Part IV Data Path and Control. Feb Computer Architecture, Data Path and Control Slide 1
|
|
- Suzanna Conley
- 6 years ago
- Views:
Transcription
1 Part IV Path and Control Feb. Computer Architecture, Path and Control Slide
2 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 5, ISBN X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 54, Introduction to Computer Architecture, at the University of California, Santa Barbara. uctors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. Behrooz Parhami Edition Released Revised Revised Revised Revised First July 3 July 4 July 5 Mar. 6 Feb. 7 Feb. 8 Feb. 9 Feb. Feb. Computer Architecture, Path and Control Slide
3 A Few Words About Where We Are Headed Performance = Execution time simplified to CPU execution time CPU execution time = uctions CPI (Clock rate) Performance = Clock rate ( uctions CPI ) Try to achieve CPI = with clock that is as high as that for CPI > designs; is CPI < feasible? (Chap 5-6) Design memory & IO structures to support ultrahigh-speed CPUs (chap 7-4) Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8) Design hardware for CPI = ; seek improvements with CPI > (Chap 3-4) Design for arithmetic & logic ops (Chap 9-) Feb. Computer Architecture, Path and Control Slide 3
4 IV Path and Control Design a simple computer (MicroMIPS) to learn about: path part of the CPU where data signals flow Control unit guides data signals through data path Pipelining a way of achieving greater performance Topics in This Part Chapter 3 Chapter 4 Chapter 5 Chapter 6 uction Execution Steps Control Unit Synthesis Pipelined Paths Pipeline Performance Limits Feb. Computer Architecture, Path and Control Slide 4
5 3 uction Execution Steps A simple computer executes instructions one at a time Fetches an instruction from the loc pointed to by PC Interprets and executes the instruction, then repeats Topics in This Chapter 3. A Small Set of uctions 3. The uction Execution Unit 3.3 A Single-Cycle Path 3.4 Branching and Jumping 3.5 Deriving the Control Signals 3.6 Performance of the Single-Cycle Design Feb. Computer Architecture, Path and Control Slide 5
6 Fig A Small Set of uctions R 6 bits 5 bits 5 bits 5 bits I J op rs rt Opcode Source or base Source or dest n MicroMIPS instruction formats and naming of the various fields. rd jta Jump target address, 6 bits inst uction, 3 bits sh 5 bits fn 6 bits Destination Unused Opcode ext imm Operand Offset, 6 bits Seven R-format instructions (add, sub, slt, and, or, xor, nor) Six I-format instructions (lui, addi, slti, andi, ori, xori) Two I-format memory access instructions (lw, sw) Three I-format conditional branch instructions (bltz, beq, bne) Four unconditional jump instructions (j, jr, jal, syscall) Feb. Computer Architecture, Path and Control Slide 6
7 The MicroMIPS uction Set Arithmetic Logic Memory access Control transfer Table 3. Copy uction Usage Load upper immediate lui rt,imm Add add rd,rs,rt Subtract sub rd,rs,rt Set less than slt rd,rs,rt Add immediate addi rt,rs,imm Set less than immediate slti rt,rs,imm AND and rd,rs,rt OR or rd,rs,rt XOR xor rd,rs,rt NOR nor rd,rs,rt AND immediate andi rt,rs,imm OR immediate ori rt,rs,imm XOR immediate xori rt,rs,imm Load word lw rt,imm(rs) Store word sw rt,imm(rs) Jump j L Jump register jr rs Branch less than bltz rs,l Branch equal beq rs,rt,l Branch not equal bne rs,rt,l Jump and link jal L System call syscall op fn Feb. Computer Architecture, Path and Control Slide 7
8 3. The uction Execution Unit PC syscall Next addr inst beq,bne jta j,jal rs,rt,rd bltz,jr (rs) (rt) op rs rt rd sh fn R 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I J Opcode Source or base AL, lui, lw,sw Source or dest n jta Jump target address, 6 bits inst uction, 3 bits Destination Unused Opcode ext imm Operand Offset, 6 bits Address instructions imm op fn Control Harvard architecture Fig. 3. Abstract view of the instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 3.. Feb. Computer Architecture, Path and Control Slide 8
9 3.3 A Single-Cycle Path Incr PC Next PC PC Next addr (PC) inst jta rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Ovfl Func out addr in ister writeback out op fn ister input Br&Jump Dst Write Src Func Read Write InSrc uction fetch access decode operation access Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 9
10 Constant amount Variable amount Const Var 5 Amount Shift function Shifter No shift Logical left Logical right Arith right Function class lui Shift Set less Arithmetic Logic An for MicroMIPS x y 5 LSBs Shifted y 3 3 Add Sub k c Adder c c 3 3 imm x y 3 or MSB 3 3 s x Shorthand symbol for Control Func s AND OR XOR NOR Logic unit Logic function Fig..9 A multifunction with 8 control signals ( for function class, arithmetic, 3 shift, logic) specifying the operation. 3- input NOR Feb. Computer Architecture, Path and Control Slide Zero Ovfl y Ovfl Zero
11 3.4 Branching and Jumping Update options for PC (PC) 3: + (PC) 3: + + imm (PC) 3:8 jta (rs) 3: SysCallAddr Default option When instruction is branch and condition is met When instruction is j or jal When the instruction is jr Start address of an operating system routine IncrPC 3 NextPC 3 Lowest bits of PC always Adder 4 MSBs c in SysCallAddr MSBs BrTrue 3 6 SE imm Branch condition checker 3 MSBs (rt) (rs) (PC) jta 3: PCSrc BrType Fig. 3.4 Next-address logic for MicroMIPS (see top part of Fig. 3.3). Feb. Computer Architecture, Path and Control Slide
12 3.5 Deriving the Control Signals Table 3. Control signals for the single-cycle MicroMIPS implementation. Next addr Control signal 3 Write Don t write Write Dst, Dst rt rd $3 InSrc, InSrc out out IncrPC Src (rt ) imm Add Sub Add Subtract LogicFn, LogicFn AND OR XOR NOR FnClass, FnClass lui Set less Arithmetic Logic Read Don t read Read Write Don t write Write BrType, BrType No branch beq bne bltz PCSrc, PCSrc IncrPC jta (rs) SysCallAddr Feb. Computer Architecture, Path and Control Slide
13 Control Signal Settings Table 3.3 uction Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch on less than Branch on equal Branch on not equal Jump and link System call op fn Write Dst InSrc Src Add Sub LogicFn FnClass Read W rite BrType PCSrc Feb. Computer Architecture, Path and Control Slide 3
14 Control Signals in the Single-Cycle Path Incr PC Next PC Next addr jta Ovfl PC (PC) inst lui slt rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Func out addr in out PCSrc Br&Jump BrType op fn Dst Write x xx xx Src Func ister input Add Sub LogicFn FnClass Read Write InSrc Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 4
15 uction Decoding op fn 6 RtypeInst 6 bltzinst jinst 3 jalinst 4 beqinst 5 bneinst 8 jrinst syscallinst 8 addiinst op Decoder sltiinst andiinst oriinst xoriinst luiinst lwinst fn Decoder addinst subinst andinst orinst xorinst norinst sltinst 43 swinst Fig. 3.5 uction decoder for MicroMIPS built of two 6-to-64 decoders. Feb. Computer Architecture, Path and Control Slide 5
16 Control Signal Settings: Repeated for Reference Table 3.3 uction Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch on less than Branch on equal Branch on not equal Jump and link System call op fn Write Dst InSrc Src Add Sub LogicFn FnClass Read W rite BrType PCSrc Feb. Computer Architecture, Path and Control Slide 6
17 Control Signal Generation Auxiliary signals identifying instruction classes arithinst = addinst subinst sltinst addiinst sltiinst logicinst = andinst orinst xorinst norinst andiinst oriinst xoriinst imminst = luiinst addiinst sltiinst andiinst oriinst xoriinst Example logic expressions for control signals Write = luiinst arithinst logicinst lwinst jalinst Src = imminst lwinst swinst Add Sub = subinst sltinst sltiinst Read = lwinst PCSrc = jinst jalinst syscallinst... Control addinst subinst jinst... sltinst Feb. Computer Architecture, Path and Control Slide 7
18 A Putting It All Together Fig..9 Fig. 3.4 IncrPC 3 NextPC 3 PCSrc Adder c in 4 MSBs SysCallAddr MSBs BrTrue 3 6 SE imm Branch condition checker BrType 3 MSBs (rt) (rs) (PC) 3: jta Constant amount Variable amount x y Add Sub 5 5 Const Var Amount 5 k Shift function Shifter 5 LSBs Shifted y c Adder c c 3 3 x y 3 No shift Logical left Logical right Arith right imm or MSB Function class 3 3 s lui Shift Set less Arithmetic Logic x Shortha symb for AL Cont Fun Incr PC Next PC PC Next addr (PC) inst op fn jta rd 3 imm rs rt 6 (rs) (rt) 3 SE Fig. 3.3 Ovfl Ovfl Func out addr in ister input out AND OR XOR NOR Logic unit Logic function input NOR Zero Control Ovfl addinst subinst jinst... y O Zero sltinst Br&Jump Dst Write Src Func Read Write InSrc Feb. Computer Architecture, Path and Control Slide 8
19 Performance Estimation for Single-Cycle MicroMIPS -type P C Not used uction access ns ister read ns operation ns access ns ister write ns Total 8 ns Single-cycle clock = 5 MHz Load Store Branch (and jr) P C P C P C Not used Not used Not used Not used Jump (except jr & jal) P C Not used Not used Not used Not used Fig. 3.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies. Feb. Computer Architecture, Path and Control Slide 9
20 How Good is Our Single-Cycle Design? Clock rate of 5 MHz not impressive How does this compare with current processors on the market? Not bad, where latency is concerned uction access ns ister read ns operation ns access ns ister write ns Total 8 ns Single-cycle clock = 5 MHz A.5 GHz processor with or so pipeline stages has a latency of about.4 nscycle cycles = 8 ns Throughput, however, is much better for the pipelined processor: Up to times better with single issue Perhaps up to times better with multiple issue Feb. Computer Architecture, Path and Control Slide
21 4 Control Unit Synthesis The control unit for the single-cycle design is memoryless Problematic when instructions vary greatly in complexity Multiple cycles needed when resources must be reused Topics in This Chapter 4. A Multi-Cycle Implementation 4. Choosing the Clock Cycle 4.3 The Control State Machine 4.4 Performance of the Multi-Cycle Design Feb. Computer Architecture, Path and Control Slide
22 4. A Multi-Cycle Implementation Appointment book for a dentist Single-cycle Multicycle Assume longest treatment takes one hour Feb. Computer Architecture, Path and Control Slide
23 Single-Cycle vs. Multi-Cycle MicroMIPS Clock Time needed Time allotted 3 4 Clock Time needed Time allotted 3 cycles 5 cycles 3 cycles 4 cycles 3 4 Time saved Fig. 4. Single-cycle versus multi-cycle instruction execution. Feb. Computer Architecture, Path and Control Slide 3
24 A Multi-Cycle Path PC Address Inst jta rs,rt,rd (rs) x z Cache imm (rt) y op fn Control von Neumann (Princeton) architecture Fig. 4. Abstract view of a multi-cycle instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 3.. Feb. Computer Architecture, Path and Control Slide 4
25 Multi-Cycle Path with Control Signals Shown Three major changes relative to the single-cycle data path: PC. uction & data s combined Address Cache Inst 3. isters added for intercycle data jta 6 rd 3 rs rt imm 4 MSBs 6 3 (rs). performs double duty for address calculation (rt) 3 SE x y 4 4 x Mux y Mux 3 SysCallAddr Zero Ovfl Zero Ovfl Func z out op fn Inst MemWrite InSrc SrcX Func PCWrite MemRead IRWrite Dst Write SrcY JumpAddr Fig. 4.3 Key elements of the multi-cycle MicroMIPS data path. PCSrc Feb. Computer Architecture, Path and Control Slide 5
26 Execution Cycles Fetch & PC incr Decode & reg read oper & PC update write or mem access write for lw Table 4. Execution cycles for multi-cycle MicroMIPS uction Operations Signal settings Any Any Read out the instruction and write it into instruction register, increment PC Read out rs & rt into x & y registers, compute branch address and save in z register Perform operation and save the result in z register Add base and offset values, save in z register If (x reg) = < (y reg), set PC to branch target address Inst =, MemRead = IRWrite =, SrcX = SrcY =, Func = + PCSrc = 3, PCWrite = SrcX =, SrcY = 3 Func = + type SrcX =, SrcY = or Func: Varies LoadStore SrcX =, SrcY = Func = + Branch SrcX =, SrcY = Func=, PCSrc = PCWrite = Zero or Zero or Out 3 Jump Set PC to the target address jta, SysCallAddr, or (rs) JumpAddr = or, PCSrc = or, PCWrite = type Write back z reg into rd Dst =, InSrc = Write = Load Read memory into data reg Inst =, MemRead = Store Copy y reg into memory Inst =, MemWrite = Load Copy data register into rt Dst =, InSrc = Write = Feb. Computer Architecture, Path and Control Slide 6
27 4.3 The Control State Machine Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Notes for State 5: % for j or jal, for syscall, don t-care for other instr for j, jal, and syscall, for jr, for branches # for j, jr, jal, and syscall, Zero ( ) for beq (bne), bit 3 of out for bltz For jal, Dst =, InSrc =, Write = State Inst = MemRead = IRWrite = SrcX = SrcY = Func = + PCSrc = 3 PCWrite = State SrcX = SrcY = 3 Func = + Jump Branch lw sw State 5 SrcX = SrcY = Func = JumpAddr = % PCSrc PCWrite = # State SrcX = SrcY = Func = + sw lw State 6 Inst = MemWrite = State 3 Inst = MemRead = State 4 Dst = InSrc = Write = Start State 7 State 8 Note for State 7: Func is determined based on the op and fn fields type SrcX = SrcY = or Func = Varies Dst = or InSrc = Write = Fig. 4.4 The control state machine for multi-cycle MicroMIPS. Feb. Computer Architecture, Path and Control Slide 7
28 4.4 Performance of the Multi-Cycle Design R-type 44% 4 cycles Load 4% 5 cycles Store % 4 cycles Branch 8% 3 cycles Jump % 3 cycles -type Load P C P C Not used Contribution to CPI R-type.44 4 =.76 Load.4 5 =. Store. 4 =.48 Branch.8 3 =.54 Jump. 3 =.6 Average CPI 4.4 Store Branch (and jr) Jump (except jr & jal) P C P C P C Not used Not used Not used Not used Not used Not used Not used Not used Fig. 3.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies. Feb. Computer Architecture, Path and Control Slide 8
29 How Good is Our Multi-Cycle Design? Clock rate of 5 MHz better than 5 MHz of single-cycle design, but still unimpressive How does the performance compare with current processors on the market? Not bad, where latency is concerned A.5 GHz processor with or so pipeline stages has a latency of about.4 = 8 ns Throughput, however, is much better for the pipelined processor: Up to times better with single issue Perhaps up to with multiple issue Cycle time = ns Clock rate = 5 MHz R-type 44% 4 cycles Load 4% 5 cycles Store % 4 cycles Branch 8% 3 cycles Jump % 3 cycles Contribution to CPI R-type.44 4 =.76 Load.4 5 =. Store. 4 =.48 Branch.8 3 =.54 Jump. 3 =.6 Average CPI 4.4 Feb. Computer Architecture, Path and Control Slide 9
30 5 Pipelined Paths Pipelining is now used in even the simplest of processors Same principles as assembly lines in manufacturing Unlike in assembly lines, instructions not independent Topics in This Chapter 5. Pipelining Concepts 5. Pipeline Stalls or Bubbles 5.3 Pipeline Timing and Performance 5.4 Pipelined Path Design 5.5 Pipelined Control Feb. Computer Architecture, Path and Control Slide 3
31 Feb. Computer Architecture, Path and Control Slide 3
32 Single-Cycle Path of Chapter 3 Incr PC Next PC Next addr jta Ovfl Clock rate = 5 MHz CPI = (5 MIPS) PC (PC) inst rd 3 imm rs rt 6 (rs) (rt) 3 SE Ovfl Func out addr in out op fn ister input Br&Jump Dst Write Src Func Read Write InSrc Fig. 3.3 Key elements of the single-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 3
33 Multi-Cycle Path of Chapter 4 Clock rate = 5 MHz CPI 4 ( 5 MIPS) 6 4 MSBs 3 SysCallAddr 3 PC Address Cache Inst jta rd 3 rs rt imm 6 (rs) (rt) 3 SE x y 4 4 x Mux y Mux 3 Zero Ovfl Zero Ovfl Func z out 4 3 op Inst MemWrite PCWrite MemRead IRWrite fn InSrc Dst Write SrcX SrcY Func JumpAddr PCSrc Fig. 4.3 Key elements of the multi-cycle MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 33
34 Getting the Best of Both Worlds Single-cycle: Clock rate = 5 MHz CPI = Pipelined: Clock rate = 5 MHz CPI Multi-cycle: Clock rate = 5 MHz CPI 4 Single-cycle analogy: Doctor appointments scheduled for 6 min per patient Multi-cycle analogy: Doctor appointments scheduled in 5-min increments Feb. Computer Architecture, Path and Control Slide 34
35 5. Pipelining Concepts Strategies for improving performance Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined Start here Approval Cashier istrar ID photo Pickup Exit Fig. 5. Pipelining in the student registration process. Feb. Computer Architecture, Path and Control Slide 35
36 Pipelined uction Execution Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Time dimension Task dimension Fig. 5. Pipelining in the MicroMIPS instruction execution process. Feb. Computer Architecture, Path and Control Slide 36
37 Alternate Representations of a Pipeline Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency f r a d w Cycle f f f f f f f Cycle 3 f r f a r d a w d w 3 r r r r r r r a a a a a a a Drainage region f f = Fetch r = read a = op d = access w = Writeback uction (a) Task-time diagram r f a r f d a r f w d a r w d a w d w 4 5 Start-up region Pipeline stage d d d d d d d w w w w w w w (b) Space-time diagram Fig. 5.3 Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions). Feb. Computer Architecture, Path and Control Slide 37
38 5. Pipeline Stalls or Bubbles First type of data dependency Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 $5 = $6 + $7 forwarding $8 = $8 + $6 $9 = $8 + $ sw $9, ($3) Fig. 5.4 Read-after-write data dependency and its possible resolution through data forwarding. Feb. Computer Architecture, Path and Control Slide 38
39 Inserting Bubbles in a Pipeline Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Writes into $8 Time dimension Bubble 3 Bubble 4 Bubble 5 Task dimension Reads from $8 Without data forwarding, three bubbles are needed to resolve a read-after-write data dependency Feb. Computer Architecture, Path and Control Slide 39
40 Second Type of Dependency Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 sw $6,... mem mem Reorder? lw $8,... mem mem Insert bubble? mem mem $9 = $8 + $ mem mem Without data forwarding, three bubbles are needed to resolve a read-after-load data dependency Fig. 5.5 Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding. Feb. Computer Architecture, Path and Control Slide 4
41 Control Dependency in a Pipeline Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 $6 = $3 + $5 beq $, $,... mem mem mem mem Reorder? (delayed branch) Insert bubble? mem mem $9 = $8 + $ mem mem Assume branch resolved here Here would need - more bubbles Fig. 5.6 Control dependency due to conditional branch. Feb. Computer Architecture, Path and Control Slide 4
42 5.3 Pipeline Timing and Performance t Latching of results Function unit Stage Stage Stage 3... Stage q Stage q tq Fig. 5.7 Pipelined form of a function unit with latching overhead. Feb. Computer Architecture, Path and Control Slide 4
43 Throughput Increase in a q-stage Pipeline 8 Throughput improvement factor Ideal: t = t =.5 t =. t tq + or q + q t Number q of pipeline stages Fig. 5.8 Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads. Feb. Computer Architecture, Path and Control Slide 43
44 5.4 Pipelined Path Design PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) Func SeqInst op Br&Jump Dst Read RetAddr fn Write Src Func Write InSrc Fig. 5.9 Key elements of the pipelined MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 44
45 5.5 Pipelined Control PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Fig. 5. Pipelined control signals. Feb. Computer Architecture, Path and Control Slide 45
46 6 Pipeline Performance Limits Pipeline performance limited by data & control dependencies Hardware provisions: data forwarding, branch prediction Software remedies: delayed branch, instruction reordering Topics in This Chapter 6. Dependencies and Hazards 6. Forwarding 6.3 Pipeline Branch Hazards 6.4 Delayed Branch and Branch Prediction 6.5 Advanced Pipelining Feb. Computer Architecture, Path and Control Slide 46
47 6. Dependencies and Hazards Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 $ = $ - $3 uctions that read register $ Fig. 6. dependency in a pipeline. Feb. Computer Architecture, Path and Control Slide 47
48 Resolving Dependencies via Forwarding Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 $ = $ - $3 uctions that read register $ Fig. 6. When a previous instruction writes back a value computed by the into a register, the data dependency can always be resolved through forwarding. Feb. Computer Architecture, Path and Control Slide 48
49 Pipelined MicroMIPS Repeated for Reference PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Fig. 5. Pipelined control signals. Feb. Computer Architecture, Path and Control Slide 49
50 Certain Dependencies Lead to Bubbles Cycle Cycle Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 lw $,4($) uctions that read register $ Fig. 6.3 When the immediately preceding instruction writes a value read out from the data memory into a register, the data dependency cannot be resolved through forwarding (i.e., we cannot go back in time) and a bubble must be inserted in the pipeline. Feb. Computer Architecture, Path and Control Slide 5
51 6. Forwarding rs rt Stage SE Src (rs) (rt) s x y t x3 y3 x4 y4 x3 y3 x4 y4 Src Forw arding unit, upper Forw arding unit, lower Stage 3 Stage 4 Stage 5 d3 d4 RetAddr3, Write3, Write4 InSrc3, InSrc4 Ovfl Func x3 y3 d3 InSrc3 d3 Write3 d4 RetAddr3 RetAddr3, Write3, Write4 InSrc3, InSrc4 x4 y4 d4 InSrc4 Write4 Fig. 6.4 Forwarding unit for the pipelined MicroMIPS data path. Feb. Computer Architecture, Path and Control Slide 5
52 Design of the Forwarding Units Let s focus on designing the upper data forwarding unit Table 6. Partial truth table for the upper forwarding unit in the pipelined MicroMIPS data path. rs rt Fig. 6.4 Stage SE Src (rs) (rt) s x y t x3 y3 x4 y4 x3 y3 x4 y4 Src Forw arding unit, upper Forw arding unit, lower Stage 3 Stage 4 Stage 5 d3 d4 RetAddr3, Write3, Write4 InSrc3, InSrc4 Ovfl Func d4 InSrc4 Write4 Forwarding unit for the pipelined MicroMIPS data path. x3 y3 d3 InSrc3 d3 Write3 d4 RetAddr3 RetAddr3, Write3, Write4 InSrc3, InSrc4 x4 y4 Write3 Write4 smatchesd3 smatchesd4 RetAddr3 InSrc3 InSrc4 Choose x x x x x x x x x x x x x x x4 x x x y4 x x x3 x x y3 x x3 Incorrect in textbook Feb. Computer Architecture, Path and Control Slide 5
53 Augmentations to Pipelined Path and Control Branch predictor Next addr forwarders Hazard detector forwarders forwarder PC Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Address inst rs (rs) Ovfl rt Incr IncrPC imm SE rt rd 3 (rt) 5 Func 3 SeqInst Fig. 5. op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Feb. Computer Architecture, Path and Control Slide 53
54 6.3 Pipeline Branch Hazards Software-based solutions Compiler inserts a no-op after every branch (simple, but wasteful) Branch is redefined to take effect after the instruction that follows it Branch delay slot(s) are filled with useful instructions via reordering Hardware-based solutions Mechanism similar to data hazard detector to flush the pipeline Constitutes a rudimentary form of branch prediction: Always predict that the branch is not taken, flush if mistaken More elaborate branch prediction strategies possible Feb. Computer Architecture, Path and Control Slide 54
55 6.4 Branch Prediction Predicting whether a branch will be taken Always predict that the branch will not be taken Use program context to decide (backward branch is likely taken, forward branch is likely not taken) Allow programmer or compiler to supply clues Decide based on past history (maintain a small history table); to be discussed later Apply a combination of factors: modern processors use elaborate techniques due to deep pipelines Feb. Computer Architecture, Path and Control Slide 55
56 Forward and Backward Branches List A is stored in memory beginning at the address given in $s. List length is given in $s. Find the largest integer in the list and copy it into $t. Solution Example 5.5 Scan the list, holding the largest element identified thus far in $t. lw $t,($s) # initialize maximum to A[] addi $t,$zero, # initialize index i to loop: add $t,$t, # increment index i by beq $t,$s,done # if all elements examined, quit add $t,$t,$t # compute i in $t add $t,$t,$t # compute 4i in $t add $t,$t,$s # form address of A[i] in $t lw $t3,($t) # load value of A[i] into $t3 slt $t4,$t,$t3 # maximum < A[i]? beq $t4,$zero,loop # if not, repeat with no change addi $t,$t3, # if so, A[i] is the new maximum j loop # change completed; now repeat done:... # continuation of the program Feb. Computer Architecture, Path and Control Slide 56
57 Hardware Implementation of Branch Prediction Low-order bits used as index Addresses of recent branch instructions Target addresses History bit(s) Incremented PC Read-out table entry Next PC From PC Compare = Logic Fig. 6.7 Hardware elements for a branch prediction scheme. The mapping scheme used to go from PC contents to a table entry is the same as that used in direct-mapped s (Chapter 8) Feb. Computer Architecture, Path and Control Slide 57
58 6.5 Advanced Pipelining Deep pipeline = superpipeline; also, superpipelined, superpipelining Parallel instruction issue = superscalar, j-way issue (-4 is typical) Stage Stage Stage 3 Stage 4 Stage 5 Variable # of stages Stage q Stage q Stage q Function unit decode Operand prep issue Function unit Function unit 3 uction fetch Retirement & commit stages Fig. 6.8 Dynamic instruction pipeline with in-order issue, possible out-of-order completion, and in-order retirement. Feb. Computer Architecture, Path and Control Slide 58
59 CPI Variations with Architectural Features Table 6. Effect of processor architecture, branch prediction methods, and speculative execution on CPI. Architecture Methods used in practice CPI Nonpipelined, multicycle Strict in-order instruction issue and exec 5- Nonpipelined, overlapped In-order issue, with multiple function units 3-5 Pipelined, static In-order exec, simple branch prediction -3 Superpipelined, dynamic Out-of-order exec, adv branch prediction - Superscalar - to 4-way issue, interlock & speculation.5- Advanced superscalar 4- to 8-way issue, aggressive speculation inst cycle 3 Gigacycles s GIPS Feb. Computer Architecture, Path and Control Slide 59
60 The Shrinking Supercomputer Feb. Computer Architecture, Path and Control Slide 6
61 The Three Hardware Designs for MicroMIPS Incr PC Next PC PC Br&Jump Next addr (PC) inst op 5 MHz CPI = fn jta rd 3 imm rs rt 6 Dst Write PC 5 MHz CPI. (rs) (rt) 3 SE Single-cycle Ovfl Ovfl Func Src Func out addr in ister input Read Write out InSrc Multi-cycle PC PCWrite Address Inst Cache MemRead MemWrite Inst IRWrite op jta fn 6 rd 3 Dst rs rt imm 4 MSBs 6 InSrc 3 Write (rs) (rt) 3 SE x y SrcX Stage Stage Stage 3 Stage 4 Stage 5 Next addr NextPC Ovfl addr Incr inst IncrPC rs rt imm SE rt rd 3 (rs) (rt) 5 Ovfl Func 3 Address x Mux y Mux SrcY SysCallAddr Zero Ovfl Zero Ovfl Func Func z out JumpAddr PCSrc 5 MHz CPI 4 SeqInst op Br&Jump Dst fn Write Src Func Read RetAddr Write InSrc Feb. Computer Architecture, Path and Control Slide 6
62 Where Do We Go from Here? Memory Design: How to build a memory unit that responds in clock Input and Output: Peripheral devices, IO programming, interfacing, interrupts Higher Performance: Vectorarray processing Parallel processing Feb. Computer Architecture, Path and Control Slide 6
Array vs. Linked list Slowly changing size, order More often dynamically y( (rarely statically)
Array vs. Linked list Quickly changing size, order Slowly changing size, order More often dynamically y( (rarely statically) Could be allocated dynamically or statically Could be contiguous (when static)
More informationUnpipelined Multicycle Datapath Implementation
Unpipelined Multicycle Datapath Implementation Adapted from instructor s supplementary material from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 8, MK] and Computer Architecture:
More informationCS31001 COMPUTER ORGANIZATION AND ARCHITECTURE
CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instruction Execution Steps: The Single Cycle Circuit 1 The Micro Mips ISA The Instruction Format op rs rt rd sh
More informationCS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing
CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instructions and Addressing 1 ISA vs. Microarchitecture An ISA or Instruction Set Architecture describes the aspects
More informationCS31001 COMPUTER ORGANIZATION AND ARCHITECTURE
CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instruction Execution Steps: The Multi Cycle Circuit 1 The Micro Mips ISA The Instruction Format op rs rt rd sh fn
More informationPart II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1
Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 Short review of the previous lecture Performance = 1/(Execution time) = Clock rate / (Average CPI
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationCENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu
CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCENG 3420 Lecture 06: Datapath
CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationPart II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1
Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationChapter 4. The Processor. Computer Architecture and IC Design Lab
Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationProcessor (I) - datapath & control. Hwansoo Han
Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two
More informationPipelined Processor Design
Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm
More informationRISC Processor Design
RISC Processor Design Single Cycle Implementation - MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 13 SE-273: Processor Design Feb 07, 2011 SE-273@SERC 1 Courtesy:
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationPipelined Processor Design
Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal
More informationPart V Memory System Design. Feb Computer Architecture, Memory System Design Slide 1
Part V Memory System Design Feb. 2011 Computer Architecture, Memory System Design Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture:
More information4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?
Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide
More informationPart III The Arithmetic/Logic Unit. Oct Computer Architecture, The Arithmetic/Logic Unit Slide 1
Part III The Arithmetic/Logic Unit Oct. 214 Computer Architecture, The Arithmetic/Logic Unit Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture:
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationCS232 Final Exam May 5, 2001
CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCSEN 601: Computer System Architecture Summer 2014
CSEN 601: Computer System Architecture Summer 2014 Practice Assignment 5 Solutions Exercise 5-1: (Midterm Spring 2013) a. What are the values of the control signals (except ALUOp) for each of the following
More informationMark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control
EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationEECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction
EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement
More informationChapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationReview: Abstract Implementation View
Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:
More informationT = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good
CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle
More informationCS3350B Computer Architecture Quiz 3 March 15, 2018
CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.
More informationPart II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1
Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 MiniMIPS Instruction Formats op rs rt 31 25 2 15 1 5 R 6 bits 5 bits 5 bits 5 bits I J Opcode Source
More informationThe overall datapath for RT, lw,sw beq instrucution
Designing The Main Control Unit: Remember the three instruction classes {R-type, Memory, Branch}: a) R-type : Op rs rt rd shamt funct 1.src 2.src dest. 31-26 25-21 20-16 15-11 10-6 5-0 a) Memory : Op rs
More informationProcessor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4
Processor Han Wang CS3410, Spring 2012 Computer Science Cornell University See P&H Chapter 2.16 20, 4.1 4 Announcements Project 1 Available Design Document due in one week. Final Design due in three weeks.
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationEE 457 Unit 6a. Basic Pipelining Techniques
EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationCISC 662 Graduate Computer Architecture. Lecture 4 - ISA
CISC 662 Graduate Computer Architecture Lecture 4 - ISA Michela Taufer http://www.cis.udel.edu/~taufer/courses Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationEECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer
EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control
ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849
More informationCPE 335. Basic MIPS Architecture Part II
CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationRISC Design: Multi-Cycle Implementation
RISC Design: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationConcocting an Instruction Set
Concocting an Instruction Set Nerd Chef at work. move flour,bowl add milk,bowl add egg,bowl move bowl,mixer rotate mixer... Read: Chapter 2.1-2.7 L03 Instruction Set 1 A General-Purpose Computer The von
More informationMajor CPU Design Steps
Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected
More informationCPU Organization (Design)
ISA Requirements CPU Organization (Design) Datapath Design: Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions (e.g., Registers, ALU, Shifters, Logic
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationMIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support
Components of an ISA EE 357 Unit 11 MIPS ISA 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support SUBtract instruc. vs. NEGate + ADD instrucs. 3. Registers accessible
More informationReview. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU
CS6C L9 CPU Design : Designing a Single-Cycle CPU () insteecsberkeleyedu/~cs6c CS6C : Machine Structures Lecture #9 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker Review
More informationInstruction Set Architecture part 1 (Introduction) Mehran Rezaei
Instruction Set Architecture part 1 (Introduction) Mehran Rezaei Overview Last Lecture s Review Execution Cycle Levels of Computer Languages Stored Program Computer/Instruction Execution Cycle SPIM, a
More information--------------------------------------------------------------------------------------------------------------------- 1. Objectives: Using the Logisim simulator Designing and testing a Pipelined 16-bit
More informationEIE/ENE 334 Microprocessors
EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationThe Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
The Processor (1) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCPE 335 Computer Organization. Basic MIPS Architecture Part I
CPE 335 Computer Organization Basic MIPS Architecture Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s8/index.html CPE232 Basic MIPS Architecture
More informationLaboratory Exercise 6 Pipelined Processors 0.0
Laboratory Exercise 6 Pipelined Processors 0.0 Goals After this laboratory exercise, you should understand the basic principles of how pipelining works, including the problems of data and branch hazards
More informationMIPS Assembly Programming
COMP 212 Computer Organization & Architecture COMP 212 Fall 2008 Lecture 8 Cache & Disk System Review MIPS Assembly Programming Comp 212 Computer Org & Arch 1 Z. Li, 2008 Comp 212 Computer Org & Arch 2
More informationCharacter Is a byte quantity (00~FF or 0~255) ASCII (American Standard Code for Information Interchange) Page 91, Fig. 2.21
2.9 Communication with People: Byte Data & Constants Character Is a byte quantity (00~FF or 0~255) ASCII (American Standard Code for Information Interchange) Page 91, Fig. 2.21 32: space 33:! 34: 35: #...
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationCOMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: A Based on P&H Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined
More informationECE 3056: Architecture, Concurrency, and Energy of Computation. Sample Problem Sets: Pipelining
ECE 3056: Architecture, Concurrency, and Energy of Computation Sample Problem Sets: Pipelining 1. Consider the following code sequence used often in block copies. This produces a special form of the load
More informationEECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCC 311- Computer Architecture. The Processor - Control
CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationINSTRUCTION SET COMPARISONS
INSTRUCTION SET COMPARISONS MIPS SPARC MOTOROLA REGISTERS: INTEGER 32 FIXED WINDOWS 32 FIXED FP SEPARATE SEPARATE SHARED BRANCHES: CONDITION CODES NO YES NO COMPARE & BR. YES NO YES A=B COMP. & BR. YES
More informationChapter 4. The Processor
Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA
More informationLecture 2: Processor and Pipelining 1
The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient
More informationWhat do we have so far? Multi-Cycle Datapath (Textbook Version)
What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001
More informationECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.
This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationCISC 662 Graduate Computer Architecture. Lecture 4 - ISA MIPS ISA. In a CPU. (vonneumann) Processor Organization
CISC 662 Graduate Computer Architecture Lecture 4 - ISA MIPS ISA Michela Taufer http://www.cis.udel.edu/~taufer/courses Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationChapter 4 The Processor
Chapter 4 The Processor 4.1 Introduction 4.2 Logic Design Conventions 4.3 The Single-Cycle Design 4.4 The Pipelined Design (c) Kevin R. Burger :: Computer Science & Engineering :: Arizona State University
More informationENE 334 Microprocessors
ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationECE 486/586. Computer Architecture. Lecture # 7
ECE 486/586 Computer Architecture Lecture # 7 Spring 2015 Portland State University Lecture Topics Instruction Set Principles Instruction Encoding Role of Compilers The MIPS Architecture Reference: Appendix
More informationDesign of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)
Microarchitecture Design of Digital Circuits 27 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) http://www.syssec.ethz.ch/education/digitaltechnik_7 Adapted from Digital
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More information