Part IV Data Path and Control. Feb Computer Architecture, Data Path and Control Slide 1

Size: px
Start display at page:

Download "Part IV Data Path and Control. Feb Computer Architecture, Data Path and Control Slide 1"

Transcription

1 Part IV Path and Control Feb. Computer Architecture, Path and Control Slide

2 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 5, ISBN X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 54, Introduction to Computer Architecture, at the University of California, Santa Barbara. uctors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. Behrooz Parhami Edition Released Revised Revised Revised Revised First July 3 July 4 July 5 Mar. 6 Feb. 7 Feb. 8 Feb. 9 Feb. Feb. Computer Architecture, Path and Control Slide

3 A Few Words About Where We Are Headed Performance = Execution time simplified to CPU execution time CPU execution time = uctions CPI (Clock rate) Performance = Clock rate ( uctions CPI ) Try to achieve CPI = with clock that is as high as that for CPI > designs; is CPI < feasible? (Chap 5-6) Design memory & IO structures to support ultrahigh-speed CPUs (chap 7-4) Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8) Design hardware for CPI = ; seek improvements with CPI > (Chap 3-4) Design for arithmetic & logic ops (Chap 9-) Feb. Computer Architecture, Path and Control Slide 3

4 IV Path and Control Design a simple computer (MicroMIPS) to learn about: path part of the CPU where data signals flow Control unit guides data signals through data path Pipelining a way of achieving greater performance Topics in This Part Chapter 3 Chapter 4 Chapter 5 Chapter 6 uction Execution Steps Control Unit Synthesis Pipelined Paths Pipeline Performance Limits Feb. Computer Architecture, Path and Control Slide 4

5 3 uction Execution Steps A simple computer executes instructions one at a time Fetches an instruction from the loc pointed to by PC Interprets and executes the instruction, then repeats Topics in This Chapter 3. A Small Set of uctions 3. The uction Execution Unit 3.3 A Single-Cycle Path 3.4 Branching and Jumping 3.5 Deriving the Control Signals 3.6 Performance of the Single-Cycle Design Feb. Computer Architecture, Path and Control Slide 5

6 Fig A Small Set of uctions R 6 bits 5 bits 5 bits 5 bits I J op rs rt Opcode Source or base Source or dest n MicroMIPS instruction formats and naming of the various fields. rd jta Jump target address, 6 bits inst uction, 3 bits sh 5 bits fn 6 bits Destination Unused Opcode ext imm Operand Offset, 6 bits Seven R-format instructions (add, sub, slt, and, or, xor, nor) Six I-format instructions (lui, addi, slti, andi, ori, xori) Two I-format memory access instructions (lw, sw) Three I-format conditional branch instructions (bltz, beq, bne) Four unconditional jump instructions (j, jr, jal, syscall) Feb. Computer Architecture, Path and Control Slide 6

7 The MicroMIPS uction Set Arithmetic Logic Memory access Control transfer Table 3. Copy uction Usage Load upper immediate lui rt,imm Add add rd,rs,rt Subtract sub rd,rs,rt Set less than slt rd,rs,rt Add immediate addi rt,rs,imm Set less than immediate slti rt,rs,imm AND and rd,rs,rt OR or rd,rs,rt XOR xor rd,rs,rt NOR nor rd,rs,rt AND immediate andi rt,rs,imm OR immediate ori rt,rs,imm XOR immediate xori rt,rs,imm Load word lw rt,imm(rs) Store word sw rt,imm(rs) Jump j L Jump register jr rs Branch less than bltz rs,l Branch equal beq rs,rt,l Branch not equal bne rs,rt,l Jump and link jal L System call syscall op fn Feb. Computer Architecture, Path and Control Slide 7

8 3. The uction Execution Unit PC syscall Next addr inst beq,bne jta j,jal rs,rt,rd bltz,jr (rs) (rt) op rs rt rd sh fn R 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I J Opcode Source or base AL, lui, lw,sw Source or dest n jta Jump target address, 6 bits inst uction, 3 bits Destination Unused Opcode ext imm Operand Offset, 6 bits Address instructions imm op fn Control Harvard architecture Fig. 3. Abstract view of the instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 3.. Feb. Computer Architecture, Path and Control Slide 8

9 3.3 A Single-Cycle Path Incr PC Next PC PC Next addr (PC) inst jta rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Ovfl Func out addr in ister writeback out op fn ister input Br&Jump Dst Write Src Func Read Write InSrc uction fetch access decode operation access Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 9

10 Constant amount Variable amount Const Var 5 Amount Shift function Shifter No shift Logical left Logical right Arith right Function class lui Shift Set less Arithmetic Logic An for MicroMIPS x y 5 LSBs Shifted y 3 3 Add Sub k c Adder c c 3 3 imm x y 3 or MSB 3 3 s x Shorthand symbol for Control Func s AND OR XOR NOR Logic unit Logic function Fig..9 A multifunction with 8 control signals ( for function class, arithmetic, 3 shift, logic) specifying the operation. 3- input NOR Feb. Computer Architecture, Path and Control Slide Zero Ovfl y Ovfl Zero

11 3.4 Branching and Jumping Update options for PC (PC) 3: + (PC) 3: + + imm (PC) 3:8 jta (rs) 3: SysCallAddr Default option When instruction is branch and condition is met When instruction is j or jal When the instruction is jr Start address of an operating system routine IncrPC 3 NextPC 3 Lowest bits of PC always Adder 4 MSBs c in SysCallAddr MSBs BrTrue 3 6 SE imm Branch condition checker 3 MSBs (rt) (rs) (PC) jta 3: PCSrc BrType Fig. 3.4 Next-address logic for MicroMIPS (see top part of Fig. 3.3). Feb. Computer Architecture, Path and Control Slide

12 3.5 Deriving the Control Signals Table 3. Control signals for the single-cycle MicroMIPS implementation. Next addr Control signal 3 Write Don t write Write Dst, Dst rt rd $3 InSrc, InSrc out out IncrPC Src (rt ) imm Add Sub Add Subtract LogicFn, LogicFn AND OR XOR NOR FnClass, FnClass lui Set less Arithmetic Logic Read Don t read Read Write Don t write Write BrType, BrType No branch beq bne bltz PCSrc, PCSrc IncrPC jta (rs) SysCallAddr Feb. Computer Architecture, Path and Control Slide

13 Control Signal Settings Table 3.3 uction Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch on less than Branch on equal Branch on not equal Jump and link System call op fn Write Dst InSrc Src Add Sub LogicFn FnClass Read W rite BrType PCSrc Feb. Computer Architecture, Path and Control Slide 3

14 Control Signals in the Single-Cycle Path Incr PC Next PC Next addr jta Ovfl PC (PC) inst lui slt rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Func out addr in out PCSrc Br&Jump BrType op fn Dst Write x xx xx Src Func ister input Add Sub LogicFn FnClass Read Write InSrc Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 4

15 uction Decoding op fn 6 RtypeInst 6 bltzinst jinst 3 jalinst 4 beqinst 5 bneinst 8 jrinst syscallinst 8 addiinst op Decoder sltiinst andiinst oriinst xoriinst luiinst lwinst fn Decoder addinst subinst andinst orinst xorinst norinst sltinst 43 swinst Fig. 3.5 uction decoder for MicroMIPS built of two 6-to-64 decoders. Feb. Computer Architecture, Path and Control Slide 5

16 Control Signal Settings: Repeated for Reference Table 3.3 uction Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch on less than Branch on equal Branch on not equal Jump and link System call op fn Write Dst InSrc Src Add Sub LogicFn FnClass Read W rite BrType PCSrc Feb. Computer Architecture, Path and Control Slide 6

17 Control Signal Generation Auxiliary signals identifying instruction classes arithinst = addinst subinst sltinst addiinst sltiinst logicinst = andinst orinst xorinst norinst andiinst oriinst xoriinst imminst = luiinst addiinst sltiinst andiinst oriinst xoriinst Example logic expressions for control signals Write = luiinst arithinst logicinst lwinst jalinst Src = imminst lwinst swinst Add Sub = subinst sltinst sltiinst Read = lwinst PCSrc = jinst jalinst syscallinst... Control addinst subinst jinst... sltinst Feb. Computer Architecture, Path and Control Slide 7

18 A Putting It All Together Fig..9 Fig. 3.4 IncrPC 3 NextPC 3 PCSrc Adder c in 4 MSBs SysCallAddr MSBs BrTrue 3 6 SE imm Branch condition checker BrType 3 MSBs (rt) (rs) (PC) 3: jta Constant amount Variable amount x y Add Sub 5 5 Const Var Amount 5 k Shift function Shifter 5 LSBs Shifted y c Adder c c 3 3 x y 3 No shift Logical left Logical right Arith right imm or MSB Function class 3 3 s lui Shift Set less Arithmetic Logic x Shortha symb for AL Cont Fun Incr PC Next PC PC Next addr (PC) inst op fn jta rd 3 imm rs rt 6 (rs) (rt) 3 SE Fig. 3.3 Ovfl Ovfl Func out addr in ister input out AND OR XOR NOR Logic unit Logic function input NOR Zero Control Ovfl addinst subinst jinst... y O Zero sltinst Br&Jump Dst Write Src Func Read Write InSrc Feb. Computer Architecture, Path and Control Slide 8

19 Performance Estimation for Single-Cycle MicroMIPS -type P C Not used uction access ns ister read ns operation ns access ns ister write ns Total 8 ns Single-cycle clock = 5 MHz Load Store Branch (and jr) P C P C P C Not used Not used Not used Not used Jump (except jr & jal) P C Not used Not used Not used Not used Fig. 3.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies. Feb. Computer Architecture, Path and Control Slide 9

20 How Good is Our Single-Cycle Design? Clock rate of 5 MHz not impressive How does this compare with current processors on the market? Not bad, where latency is concerned uction access ns ister read ns operation ns access ns ister write ns Total 8 ns Single-cycle clock = 5 MHz A.5 GHz processor with or so pipeline stages has a latency of about.4 nscycle cycles = 8 ns Throughput, however, is much better for the pipelined processor: Up to times better with single issue Perhaps up to times better with multiple issue Feb. Computer Architecture, Path and Control Slide

21 4 Control Unit Synthesis The control unit for the single-cycle design is memoryless Problematic when instructions vary greatly in complexity Multiple cycles needed when resources must be reused Topics in This Chapter 4. A Multi-Cycle Implementation 4. Choosing the Clock Cycle 4.3 The Control State Machine 4.4 Performance of the Multi-Cycle Design Feb. Computer Architecture, Path and Control Slide

22 4. A Multi-Cycle Implementation Appointment book for a dentist Single-cycle Multicycle Assume longest treatment takes one hour Feb. Computer Architecture, Path and Control Slide

23 Single-Cycle vs. Multi-Cycle MicroMIPS Clock Time needed Time allotted 3 4 Clock Time needed Time allotted 3 cycles 5 cycles 3 cycles 4 cycles 3 4 Time saved Fig. 4. Single-cycle versus multi-cycle instruction execution. Feb. Computer Architecture, Path and Control Slide 3

24 A Multi-Cycle Path PC Address Inst jta rs,rt,rd (rs) x z Cache imm (rt) y op fn Control von Neumann (Princeton) architecture Fig. 4. Abstract view of a multi-cycle instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 3.. Feb. Computer Architecture, Path and Control Slide 4

25 Multi-Cycle Path with Control Signals Shown Three major changes relative to the single-cycle data path: PC. uction & data s combined Address Cache Inst 3. isters added for intercycle data jta 6 rd 3 rs rt imm 4 MSBs 6 3 (rs). performs double duty for address calculation (rt) 3 SE x y 4 4 x Mux y Mux 3 SysCallAddr Zero Ovfl Zero Ovfl Func z out op fn Inst MemWrite InSrc SrcX Func PCWrite MemRead IRWrite Dst Write SrcY JumpAddr Fig. 4.3 Key elements of the multi-cycle MicroMIPS data path. PCSrc Feb. Computer Architecture, Path and Control Slide 5

26 Execution Cycles Fetch & PC incr Decode & reg read oper & PC update write or mem access write for lw Table 4. Execution cycles for multi-cycle MicroMIPS uction Operations Signal settings Any Any Read out the instruction and write it into instruction register, increment PC Read out rs & rt into x & y registers, compute branch address and save in z register Perform operation and save the result in z register Add base and offset values, save in z register If (x reg) = < (y reg), set PC to branch target address Inst =, MemRead = IRWrite =, SrcX = SrcY =, Func = + PCSrc = 3, PCWrite = SrcX =, SrcY = 3 Func = + type SrcX =, SrcY = or Func: Varies LoadStore SrcX =, SrcY = Func = + Branch SrcX =, SrcY = Func=, PCSrc = PCWrite = Zero or Zero or Out 3 Jump Set PC to the target address jta, SysCallAddr, or (rs) JumpAddr = or, PCSrc = or, PCWrite = type Write back z reg into rd Dst =, InSrc = Write = Load Read memory into data reg Inst =, MemRead = Store Copy y reg into memory Inst =, MemWrite = Load Copy data register into rt Dst =, InSrc = Write = Feb. Computer Architecture, Path and Control Slide 6

27 4.3 The Control State Machine Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Notes for State 5: % for j or jal, for syscall, don t-care for other instr for j, jal, and syscall, for jr, for branches # for j, jr, jal, and syscall, Zero ( ) for beq (bne), bit 3 of out for bltz For jal, Dst =, InSrc =, Write = State Inst = MemRead = IRWrite = SrcX = SrcY = Func = + PCSrc = 3 PCWrite = State SrcX = SrcY = 3 Func = + Jump Branch lw sw State 5 SrcX = SrcY = Func = JumpAddr = % PCSrc PCWrite = # State SrcX = SrcY = Func = + sw lw State 6 Inst = MemWrite = State 3 Inst = MemRead = State 4 Dst = InSrc = Write = Start State 7 State 8 Note for State 7: Func is determined based on the op and fn fields type SrcX = SrcY = or Func = Varies Dst = or InSrc = Write = Fig. 4.4 The control state machine for multi-cycle MicroMIPS. Feb. Computer Architecture, Path and Control Slide 7

28 4.4 Performance of the Multi-Cycle Design R-type 44% 4 cycles Load 4% 5 cycles Store % 4 cycles Branch 8% 3 cycles Jump % 3 cycles -type Load P C P C Not used Contribution to CPI R-type.44 4 =.76 Load.4 5 =. Store. 4 =.48 Branch.8 3 =.54 Jump. 3 =.6 Average CPI 4.4 Store Branch (and jr) Jump (except jr & jal) P C P C P C Not used Not used Not used Not used Not used Not used Not used Not used Fig. 3.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies. Feb. Computer Architecture, Path and Control Slide 8

29 How Good is Our Multi-Cycle Design? Clock rate of 5 MHz better than 5 MHz of single-cycle design, but still unimpressive How does the performance compare with current processors on the market? Not bad, where latency is concerned A.5 GHz processor with or so pipeline stages has a latency of about.4 = 8 ns Throughput, however, is much better for the pipelined processor: Up to times better with single issue Perhaps up to with multiple issue Cycle time = ns Clock rate = 5 MHz R-type 44% 4 cycles Load 4% 5 cycles Store % 4 cycles Branch 8% 3 cycles Jump % 3 cycles Contribution to CPI R-type.44 4 =.76 Load.4 5 =. Store. 4 =.48 Branch.8 3 =.54 Jump. 3 =.6 Average CPI 4.4 Feb. Computer Architecture, Path and Control Slide 9

30 5 Pipelined Paths Pipelining is now used in even the simplest of processors Same principles as assembly lines in manufacturing Unlike in assembly lines, instructions not independent Topics in This Chapter 5. Pipelining Concepts 5. Pipeline Stalls or Bubbles 5.3 Pipeline Timing and Performance 5.4 Pipelined Path Design 5.5 Pipelined Control Feb. Computer Architecture, Path and Control Slide 3

31 Feb. Computer Architecture, Path and Control Slide 3

32 Single-Cycle Path of Chapter 3 Incr PC Next PC Next addr jta Ovfl Clock rate = 5 MHz CPI = (5 MIPS) PC (PC) inst rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Func out addr in out op fn ister input Br&Jump Dst Write Src Func Read Write InSrc Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 3

33 Multi-Cycle Path of Chapter 4 Clock rate = 5 MHz CPI 4 ( 5 MIPS) 6 4 MSBs 3 SysCallAddr 3 PC Address Cache Inst jta rd 3 rs rt imm 6 (rs) (rt) 3 SE x y 4 4 x Mux y Mux 3 Zero Ovfl Zero Ovfl Func z out 4 3 op Inst MemWrite PCWrite MemRead IRWrite fn InSrc Dst Write SrcX SrcY Func JumpAddr PCSrc Fig. 4.3 Key elements of the multi-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 33

34 Getting the Best of Both Worlds Single-cycle: Clock rate = 5 MHz CPI = Pipelined: Clock rate = 5 MHz CPI Multi-cycle: Clock rate = 5 MHz CPI 4 Single-cycle analogy: Doctor appointments scheduled for 6 min per patient Multi-cycle analogy: Doctor appointments scheduled in 5-min increments Feb. Computer Architecture, Path and Control Slide 34

35 5. Pipelining Concepts Strategies for improving performance Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined Start here Approval Cashier istrar ID photo Pickup Exit Fig. 5. Pipelining in the student registration process. Feb. Computer Architecture, Path and Control Slide 35

36 Pipelined uction Execution Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Time dimension Task dimension Fig. 5. Pipelining in the MicroMIPS instruction execution process. Feb. Computer Architecture, Path and Control Slide 36

37 Alternate Representations of a Pipeline Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency f r a d w Cycle f f f f f f f Cycle 3 f r f a r d a w d w 3 r r r r r r r a a a a a a a Drainage region f f = Fetch r = read a = op d = access w = Writeback uction (a) Task-time diagram r f a r f d a r f w d a r w d a w d w 4 5 Start-up region Pipeline stage d d d d d d d w w w w w w w (b) Space-time diagram Fig. 5.3 Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions). Feb. Computer Architecture, Path and Control Slide 37

38 5. Pipeline Stalls or Bubbles First type of data dependency Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 $5 = $6 + $7 forwarding $8 = $8 + $6 $9 = $8 + $ sw $9, ($3) Fig. 5.4 Read-after-write data dependency and its possible resolution through data forwarding. Feb. Computer Architecture, Path and Control Slide 38

39 Inserting Bubbles in a Pipeline Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Writes into $8 Time dimension Bubble 3 Bubble 4 Bubble 5 Task dimension Reads from $8 Without data forwarding, three bubbles are needed to resolve a read-after-write data dependency Feb. Computer Architecture, Path and Control Slide 39

40 Second Type of Dependency Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 sw $6,... mem mem Reorder? lw $8,... mem mem Insert bubble? mem mem $9 = $8 + $ mem mem Without data forwarding, three bubbles are needed to resolve a read-after-load data dependency Fig. 5.5 Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding. Feb. Computer Architecture, Path and Control Slide 4

41 Control Dependency in a Pipeline Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 $6 = $3 + $5 beq $, $,... mem mem mem mem Reorder? (delayed branch) Insert bubble? mem mem $9 = $8 + $ mem mem Assume branch resolved here Here would need - more bubbles Fig. 5.6 Control dependency due to conditional branch. Feb. Computer Architecture, Path and Control Slide 4

42 5.3 Pipeline Timing and Performance t Latching of results Function unit Stage Stage Stage 3... Stage q Stage q tq Fig. 5.7 Pipelined form of a function unit with latching overhead. Feb. Computer Architecture, Path and Control Slide 4

43 Throughput Increase in a q-stage Pipeline 8 Throughput improvement factor Ideal: t = t =.5 t =. t tq + or q + q t Number q of pipeline stages Fig. 5.8 Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads. Feb. Computer Architecture, Path and Control Slide 43

44 5.4 Pipelined Path Design PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) Func SeqInst op Br&Jump Dst Read RetAddr fn Write Src Func Write InSrc Fig. 5.9 Key elements of the pipelined MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 44

45 5.5 Pipelined Control PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Fig. 5. Pipelined control signals. Feb. Computer Architecture, Path and Control Slide 45

46 6 Pipeline Performance Limits Pipeline performance limited by data & control dependencies Hardware provisions: data forwarding, branch prediction Software remedies: delayed branch, instruction reordering Topics in This Chapter 6. Dependencies and Hazards 6. Forwarding 6.3 Pipeline Branch Hazards 6.4 Delayed Branch and Branch Prediction 6.5 Advanced Pipelining Feb. Computer Architecture, Path and Control Slide 46

47 6. Dependencies and Hazards Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 $ = $ - $3 uctions that read register $ Fig. 6. dependency in a pipeline. Feb. Computer Architecture, Path and Control Slide 47

48 Resolving Dependencies via Forwarding Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 $ = $ - $3 uctions that read register $ Fig. 6. When a previous instruction writes back a value computed by the into a register, the data dependency can always be resolved through forwarding. Feb. Computer Architecture, Path and Control Slide 48

49 Pipelined MicroMIPS Repeated for Reference PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Fig. 5. Pipelined control signals. Feb. Computer Architecture, Path and Control Slide 49

50 Certain Dependencies Lead to Bubbles Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 lw $,4($) uctions that read register $ Fig. 6.3 When the immediately preceding instruction writes a value read out from the data memory into a register, the data dependency cannot be resolved through forwarding (i.e., we cannot go back in time) and a bubble must be inserted in the pipeline. Feb. Computer Architecture, Path and Control Slide 5

51 6. Forwarding rs rt Stage SE Src (rs) (rt) s x y t x3 y3 x4 y4 x3 y3 x4 y4 Src Forw arding unit, upper Forw arding unit, lower Stage 3 Stage 4 Stage 5 d3 d4 RetAddr3, Write3, Write4 InSrc3, InSrc4 Ovfl Func x3 y3 d3 InSrc3 d3 Write3 d4 RetAddr3 RetAddr3, Write3, Write4 InSrc3, InSrc4 x4 y4 d4 InSrc4 Write4 Fig. 6.4 Forwarding unit for the pipelined MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 5

52 Design of the Forwarding Units Let s focus on designing the upper data forwarding unit Table 6. Partial truth table for the upper forwarding unit in the pipelined MicroMIPS data path. rs rt Fig. 6.4 Stage SE Src (rs) (rt) s x y t x3 y3 x4 y4 x3 y3 x4 y4 Src Forw arding unit, upper Forw arding unit, lower Stage 3 Stage 4 Stage 5 d3 d4 RetAddr3, Write3, Write4 InSrc3, InSrc4 Ovfl Func d4 InSrc4 Write4 Forwarding unit for the pipelined MicroMIPS data path. x3 y3 d3 InSrc3 d3 Write3 d4 RetAddr3 RetAddr3, Write3, Write4 InSrc3, InSrc4 x4 y4 Write3 Write4 smatchesd3 smatchesd4 RetAddr3 InSrc3 InSrc4 Choose x x x x x x x x x x x x x x x4 x x x y4 x x x3 x x y3 x x3 Incorrect in textbook Feb. Computer Architecture, Path and Control Slide 5

53 Augmentations to Pipelined Path and Control Branch predictor Next addr forwarders Hazard detector forwarders forwarder PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst Fig. 5. op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Feb. Computer Architecture, Path and Control Slide 53

54 6.3 Pipeline Branch Hazards Software-based solutions Compiler inserts a no-op after every branch (simple, but wasteful) Branch is redefined to take effect after the instruction that follows it Branch delay slot(s) are filled with useful instructions via reordering Hardware-based solutions Mechanism similar to data hazard detector to flush the pipeline Constitutes a rudimentary form of branch prediction: Always predict that the branch is not taken, flush if mistaken More elaborate branch prediction strategies possible Feb. Computer Architecture, Path and Control Slide 54

55 6.4 Branch Prediction Predicting whether a branch will be taken Always predict that the branch will not be taken Use program context to decide (backward branch is likely taken, forward branch is likely not taken) Allow programmer or compiler to supply clues Decide based on past history (maintain a small history table); to be discussed later Apply a combination of factors: modern processors use elaborate techniques due to deep pipelines Feb. Computer Architecture, Path and Control Slide 55

56 Forward and Backward Branches List A is stored in memory beginning at the address given in $s. List length is given in $s. Find the largest integer in the list and copy it into $t. Solution Example 5.5 Scan the list, holding the largest element identified thus far in $t. lw $t,($s) # initialize maximum to A[] addi $t,$zero, # initialize index i to loop: add $t,$t, # increment index i by beq $t,$s,done # if all elements examined, quit add $t,$t,$t # compute i in $t add $t,$t,$t # compute 4i in $t add $t,$t,$s # form address of A[i] in $t lw $t3,($t) # load value of A[i] into $t3 slt $t4,$t,$t3 # maximum < A[i]? beq $t4,$zero,loop # if not, repeat with no change addi $t,$t3, # if so, A[i] is the new maximum j loop # change completed; now repeat done:... # continuation of the program Feb. Computer Architecture, Path and Control Slide 56

57 Hardware Implementation of Branch Prediction Low-order bits used as index Addresses of recent branch instructions Target addresses History bit(s) Incremented PC Read-out table entry Next PC From PC Compare = Logic Fig. 6.7 Hardware elements for a branch prediction scheme. The mapping scheme used to go from PC contents to a table entry is the same as that used in direct-mapped s (Chapter 8) Feb. Computer Architecture, Path and Control Slide 57

58 6.5 Advanced Pipelining Deep pipeline = superpipeline; also, superpipelined, superpipelining Parallel instruction issue = superscalar, j-way issue (-4 is typical) Stage Stage Stage 3 Stage 4 Stage 5 Variable # of stages Stage q Stage q Stage q Function unit decode Operand prep issue Function unit Function unit 3 uction fetch Retirement & commit stages Fig. 6.8 Dynamic instruction pipeline with in-order issue, possible out-of-order completion, and in-order retirement. Feb. Computer Architecture, Path and Control Slide 58

59 CPI Variations with Architectural Features Table 6. Effect of processor architecture, branch prediction methods, and speculative execution on CPI. Architecture Methods used in practice CPI Nonpipelined, multicycle Strict in-order instruction issue and exec 5- Nonpipelined, overlapped In-order issue, with multiple function units 3-5 Pipelined, static In-order exec, simple branch prediction -3 Superpipelined, dynamic Out-of-order exec, adv branch prediction - Superscalar - to 4-way issue, interlock & speculation.5- Advanced superscalar 4- to 8-way issue, aggressive speculation inst cycle 3 Gigacycles s GIPS Feb. Computer Architecture, Path and Control Slide 59

60 The Shrinking Supercomputer Feb. Computer Architecture, Path and Control Slide 6

61 The Three Hardware Designs for MicroMIPS Incr PC Next PC PC Br&Jump Next addr (PC) inst op 5 MHz CPI = fn jta rd 3 imm rs rt 6 Dst Write PC 5 MHz CPI. (rs) (rt) 3 SE Single-cycle Ovfl Ovfl Func Src Func out addr in ister input Read Write out InSrc Multi-cycle PC PCWrite Address Inst Cache MemRead MemWrite Inst IRWrite op jta fn 6 rd 3 Dst rs rt imm 4 MSBs 6 InSrc 3 Write (rs) (rt) 3 SE x y SrcX Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Incr inst IncrPC rs rt imm SE rt rd 3 (rs) (rt) 5 Ovfl Func 3 Address x Mux y Mux SrcY SysCallAddr Zero Ovfl Zero Ovfl Func Func z out JumpAddr PCSrc 5 MHz CPI 4 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Feb. Computer Architecture, Path and Control Slide 6

62 Where Do We Go from Here? Memory Design: How to build a memory unit that responds in clock Input and Output: Peripheral devices, IO programming, interfacing, interrupts Higher Performance: Vectorarray processing Parallel processing Feb. Computer Architecture, Path and Control Slide 6

Array vs. Linked list Slowly changing size, order More often dynamically y( (rarely statically)

Array vs. Linked list Slowly changing size, order More often dynamically y( (rarely statically) Array vs. Linked list Quickly changing size, order Slowly changing size, order More often dynamically y( (rarely statically) Could be allocated dynamically or statically Could be contiguous (when static)

More information

Unpipelined Multicycle Datapath Implementation

Unpipelined Multicycle Datapath Implementation Unpipelined Multicycle Datapath Implementation Adapted from instructor s supplementary material from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 8, MK] and Computer Architecture:

More information

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instruction Execution Steps: The Single Cycle Circuit 1 The Micro Mips ISA The Instruction Format op rs rt rd sh

More information

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instructions and Addressing 1 ISA vs. Microarchitecture An ISA or Instruction Set Architecture describes the aspects

More information

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instruction Execution Steps: The Multi Cycle Circuit 1 The Micro Mips ISA The Instruction Format op rs rt rd sh fn

More information

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1 Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 Short review of the previous lecture Performance = 1/(Execution time) = Clock rate / (Average CPI

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1 Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm

More information

RISC Processor Design

RISC Processor Design RISC Processor Design Single Cycle Implementation - MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 13 SE-273: Processor Design Feb 07, 2011 SE-273@SERC 1 Courtesy:

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal

More information

Part V Memory System Design. Feb Computer Architecture, Memory System Design Slide 1

Part V Memory System Design. Feb Computer Architecture, Memory System Design Slide 1 Part V Memory System Design Feb. 2011 Computer Architecture, Memory System Design Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture:

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

Part III The Arithmetic/Logic Unit. Oct Computer Architecture, The Arithmetic/Logic Unit Slide 1

Part III The Arithmetic/Logic Unit. Oct Computer Architecture, The Arithmetic/Logic Unit Slide 1 Part III The Arithmetic/Logic Unit Oct. 214 Computer Architecture, The Arithmetic/Logic Unit Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture:

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

CS232 Final Exam May 5, 2001

CS232 Final Exam May 5, 2001 CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CSEN 601: Computer System Architecture Summer 2014

CSEN 601: Computer System Architecture Summer 2014 CSEN 601: Computer System Architecture Summer 2014 Practice Assignment 5 Solutions Exercise 5-1: (Midterm Spring 2013) a. What are the values of the control signals (except ALUOp) for each of the following

More information

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

Review: Abstract Implementation View

Review: Abstract Implementation View Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1 Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 MiniMIPS Instruction Formats op rs rt 31 25 2 15 1 5 R 6 bits 5 bits 5 bits 5 bits I J Opcode Source

More information

The overall datapath for RT, lw,sw beq instrucution

The overall datapath for RT, lw,sw beq instrucution Designing The Main Control Unit: Remember the three instruction classes {R-type, Memory, Branch}: a) R-type : Op rs rt rd shamt funct 1.src 2.src dest. 31-26 25-21 20-16 15-11 10-6 5-0 a) Memory : Op rs

More information

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4 Processor Han Wang CS3410, Spring 2012 Computer Science Cornell University See P&H Chapter 2.16 20, 4.1 4 Announcements Project 1 Available Design Document due in one week. Final Design due in three weeks.

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

EE 457 Unit 6a. Basic Pipelining Techniques

EE 457 Unit 6a. Basic Pipelining Techniques EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA CISC 662 Graduate Computer Architecture Lecture 4 - ISA Michela Taufer http://www.cis.udel.edu/~taufer/courses Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

CPE 335. Basic MIPS Architecture Part II

CPE 335. Basic MIPS Architecture Part II CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

RISC Design: Multi-Cycle Implementation

RISC Design: Multi-Cycle Implementation RISC Design: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

Concocting an Instruction Set

Concocting an Instruction Set Concocting an Instruction Set Nerd Chef at work. move flour,bowl add milk,bowl add egg,bowl move bowl,mixer rotate mixer... Read: Chapter 2.1-2.7 L03 Instruction Set 1 A General-Purpose Computer The von

More information

Major CPU Design Steps

Major CPU Design Steps Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected

More information

CPU Organization (Design)

CPU Organization (Design) ISA Requirements CPU Organization (Design) Datapath Design: Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions (e.g., Registers, ALU, Shifters, Logic

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support Components of an ISA EE 357 Unit 11 MIPS ISA 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support SUBtract instruc. vs. NEGate + ADD instrucs. 3. Registers accessible

More information

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU CS6C L9 CPU Design : Designing a Single-Cycle CPU () insteecsberkeleyedu/~cs6c CS6C : Machine Structures Lecture #9 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker Review

More information

Instruction Set Architecture part 1 (Introduction) Mehran Rezaei

Instruction Set Architecture part 1 (Introduction) Mehran Rezaei Instruction Set Architecture part 1 (Introduction) Mehran Rezaei Overview Last Lecture s Review Execution Cycle Levels of Computer Languages Stored Program Computer/Instruction Execution Cycle SPIM, a

More information

--------------------------------------------------------------------------------------------------------------------- 1. Objectives: Using the Logisim simulator Designing and testing a Pipelined 16-bit

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University The Processor (1) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CPE 335 Computer Organization. Basic MIPS Architecture Part I

CPE 335 Computer Organization. Basic MIPS Architecture Part I CPE 335 Computer Organization Basic MIPS Architecture Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s8/index.html CPE232 Basic MIPS Architecture

More information

Laboratory Exercise 6 Pipelined Processors 0.0

Laboratory Exercise 6 Pipelined Processors 0.0 Laboratory Exercise 6 Pipelined Processors 0.0 Goals After this laboratory exercise, you should understand the basic principles of how pipelining works, including the problems of data and branch hazards

More information

MIPS Assembly Programming

MIPS Assembly Programming COMP 212 Computer Organization & Architecture COMP 212 Fall 2008 Lecture 8 Cache & Disk System Review MIPS Assembly Programming Comp 212 Computer Org & Arch 1 Z. Li, 2008 Comp 212 Computer Org & Arch 2

More information

Character Is a byte quantity (00~FF or 0~255) ASCII (American Standard Code for Information Interchange) Page 91, Fig. 2.21

Character Is a byte quantity (00~FF or 0~255) ASCII (American Standard Code for Information Interchange) Page 91, Fig. 2.21 2.9 Communication with People: Byte Data & Constants Character Is a byte quantity (00~FF or 0~255) ASCII (American Standard Code for Information Interchange) Page 91, Fig. 2.21 32: space 33:! 34: 35: #...

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: A Based on P&H Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined

More information

ECE 3056: Architecture, Concurrency, and Energy of Computation. Sample Problem Sets: Pipelining

ECE 3056: Architecture, Concurrency, and Energy of Computation. Sample Problem Sets: Pipelining ECE 3056: Architecture, Concurrency, and Energy of Computation Sample Problem Sets: Pipelining 1. Consider the following code sequence used often in block copies. This produces a special form of the load

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CC 311- Computer Architecture. The Processor - Control

CC 311- Computer Architecture. The Processor - Control CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

INSTRUCTION SET COMPARISONS

INSTRUCTION SET COMPARISONS INSTRUCTION SET COMPARISONS MIPS SPARC MOTOROLA REGISTERS: INTEGER 32 FIXED WINDOWS 32 FIXED FP SEPARATE SEPARATE SHARED BRANCHES: CONDITION CODES NO YES NO COMPARE & BR. YES NO YES A=B COMP. & BR. YES

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA

More information

Lecture 2: Processor and Pipelining 1

Lecture 2: Processor and Pipelining 1 The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient

More information

What do we have so far? Multi-Cycle Datapath (Textbook Version)

What do we have so far? Multi-Cycle Datapath (Textbook Version) What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001

More information

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours. This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA MIPS ISA. In a CPU. (vonneumann) Processor Organization

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA MIPS ISA. In a CPU. (vonneumann) Processor Organization CISC 662 Graduate Computer Architecture Lecture 4 - ISA MIPS ISA Michela Taufer http://www.cis.udel.edu/~taufer/courses Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Chapter 4 The Processor

Chapter 4 The Processor Chapter 4 The Processor 4.1 Introduction 4.2 Logic Design Conventions 4.3 The Single-Cycle Design 4.4 The Pipelined Design (c) Kevin R. Burger :: Computer Science & Engineering :: Arizona State University

More information

ENE 334 Microprocessors

ENE 334 Microprocessors ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

ECE 486/586. Computer Architecture. Lecture # 7

ECE 486/586. Computer Architecture. Lecture # 7 ECE 486/586 Computer Architecture Lecture # 7 Spring 2015 Portland State University Lecture Topics Instruction Set Principles Instruction Encoding Role of Compilers The MIPS Architecture Reference: Appendix

More information

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) Microarchitecture Design of Digital Circuits 27 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) http://www.syssec.ethz.ch/education/digitaltechnik_7 Adapted from Digital

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information