CHW 362 : Computer Architecture & Organization

Size: px
Start display at page:

Download "CHW 362 : Computer Architecture & Organization"

Transcription

1 CHW 362 : Computer Architecture & Organization Instructors: Dr Ahmed Shalaby Dr Mona Ali

2 Review: Instruction Formats R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits J-Type op addr 6 bits 26 bits Chapter 7 <2>

3 R-Type Register-type 3 register operands: rs, rt: source registers rd: destination register Other fields: op: the operation code or opcode ( for R-type instructions) funct: the function with opcode, tells computer what operation to perform shamt: the shift amount for shift instructions, otherwise it s R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <3>

4 R-Type Examples Assembly Code add $s, $s, $s2 sub $t, $t3, $t5 Field Values op rs rt rd shamt funct bits 5 bits 5 bits 5 bits 5 bits 6 bits Machine Code op rs rt rd shamt funct (x23282) (x6d422) 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Note the order of registers in the assembly code: add rd, rs, rt Chapter 7 <4>

5 I-Type Immediate-type 3 operands: rs, rt: register operands imm: 6-bit two s complement immediate Other fields: op: the opcode Simplicity favors regularity: all instructions have opcode Operation is completely determined by opcode I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <5>

6 I-Type Examples Assembly Code addi $s, $s, 5 addi $t, $s3, -2 lw $t2, 32($) Field Values op rs rt imm sw $s, 4($t) bits 5 bits 5 bits 6 bits Note the differing order of registers in assembly and machine codes: addi rt, rs, imm lw sw rt, imm(rs) rt, imm(rs) Machine Code op rs rt imm 6 bits 5 bits 5 bits 6 bits (x2235) (x2268fff4) (x8ca2) (xad34) Chapter 7 <6>

7 I-Type Examples PC-Relative Addressing x beq $t, $, else x4 addi $v, $, x8 addi $sp, $sp, i xc jr $ra x2 else: addi $a, $a, - x24 jal factorial Assembly Code beq $t, $, else (beq $t, $, 3) Field Values op rs rt imm bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <7>

8 J-Type Jump-type 26-bit address operand (addr) Used for jump instructions (j) op J-Type addr 6 bits 26 bits Chapter 7 <8>

9 J-Type Example Pseudo-direct Addressing x45c jal sum... x4a sum: add $v, $a, $a JTA 26-bit addr 2 8 (x4a) (x28) Field Values Machine Code op imm op addr 3 x28 6 bits 26 bits 6 bits 26 bits (xc28) Chapter 7 <9>

10 Chapter 7 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 7 <>

11 Chapter 7 :: Topics Introduction Performance Analysis Single-Cycle Processor Multicycle Processor Pipelined Processor Exceptions Advanced Microarchitecture Chapter 7 <>

12 Introduction Microarchitecture: how to implement an architecture in hardware Processor: Datapath: functional blocks Control: control signals Application Software Operating Systems Architecture Microarchitecture Logic Digital Circuits Analog Circuits Devices programs device drivers instructions registers datapaths controllers adders memories AND gates NOT gates amplifiers filters transistors diodes Physics electrons Chapter 7 <2>

13 Microarchitecture Multiple implementations for a single architecture: Single-cycle: Each instruction executes in a single cycle Multicycle: Each instruction is broken into series of shorter steps Pipelined: Each instruction broken up into series of steps & multiple instructions execute at once Chapter 7 <3>

14 Processor Performance Program execution time Execution Time = (#instructions)(cycles/instruction)(seconds/cycle) Definitions: CPI: Cycles/instruction clock period: seconds/cycle IPC: instructions/cycle = IPC Challenge is to satisfy constraints of: Cost Power Performance Chapter 7 <4>

15 MIPS Processor Consider subset of MIPS instructions: R-type instructions: and, or, add, sub, slt instructions: lw, sw Branch instructions: beq Chapter 7 <5>

16 Architectural State Determines everything about a processor: PC 32 registers Chapter 7 <6>

17 MIPS State Elements PC' PC A RD Instruction A A2 A3 WD3 WE3 Register File RD RD A RD Data WD WE 32 Chapter 7 <7>

18 Single-Cycle MIPS Processor Datapath Control Chapter 7 <8>

19 Single-Cycle Datapath: lw fetch STEP : Fetch instruction PC' PC A RD Instruction Instr A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <9>

20 Single-Cycle Datapath: lw Register Read STEP 2: Read source operands from PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <2>

21 Single-Cycle Datapath: lw Immediate STEP 3: Sign-extend the immediate PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE 5: Sign Extend SignImm I-Type lw rt, imm(rs) op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <2>

22 Single-Cycle Datapath: lw address STEP 4: Compute the memory address ALUControl 2: PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult A RD Data WD WE 5: Sign Extend SignImm I-Type lw rt, imm(rs) op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <22>

23 STEP 5: Read data from memory and write it back to register file PC' Single-Cycle Datapath: lw Read PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult A RD Data WD WE ReadData 5: Sign Extend SignImm lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <23>

24 Single-Cycle Datapath: lw PC Increment STEP 6: Determine address of next instruction PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult A RD Data WD WE ReadData 4 + PCPlus4 5: Sign Extend SignImm Result Chapter 7 <24>

25 Single-Cycle Datapath: sw Write data in rt to memory PC' PC A RD Instruction Instr 25:2 2:6 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult WriteData A MemWrite RD Data WD WE ReadData 4 + PCPlus4 5: Sign Extend SignImm Result lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <25>

26 Single-Cycle Datapath: R-Type Read from rs and rt Write ALUResult to register file Write to rd (instead of rt) R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits PC' PC A RD Instruction Instr 25:2 2:6 RegWrite RegDst ALUSrc ALUControl 2: MemWrite MemtoReg varies A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm Result Chapter 7 <26>

27 Single-Cycle Datapath: beq Determine whether values in rs and rt are equal Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4) PCSrc PC' PC A RD Instruction Instr 25:2 2:6 RegWrite RegDst ALUSrc ALUControl 2: Branch MemWrite MemtoReg x x A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch Assembly Code beq $t, $, else (beq $t, $, 3) Field Values op rs rt imm bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <27> Result

28 Chapter 7 <28> SignImm A RD Instruction + 4 A A3 WD3 RD2 RD WE3 A2 Sign Extend Register File A RD Data WD WE PC PC' Instr 25:2 2:6 5: 5: SrcB 2:6 5: <<2 + ALUResult ReadData WriteData SrcA PCPlus4 PCBranch WriteReg 4: Result 3:26 RegDst Branch MemWrite MemtoReg ALUSrc RegWrite Op Funct Control Unit Zero PCSrc ALUControl 2: ALU Single-Cycle Processor

29 Single-Cycle Control Control Unit Opcode 5: Main Decoder MemtoReg MemWrite Branch ALUSrc RegDst RegWrite ALUOp : Funct 5: ALU Decoder ALUControl 2: Chapter 7 <29>

30 Review: ALU A N ALU N Y B N 3 F F 2: Function A & B A B A + B not used A & ~B A ~B A - B SLT Chapter 7 <3>

31 Zero Extend 2 3 Review: ALU A N B N F 2: Function A & B N A B N F 2 A + B not used C out + [N-] S A & ~B A ~B A - B N N N N SLT 2 F : Y N Chapter 7 <3>

32 Control Unit: ALU Decoder ALUOp : Meaning Add Subtract Look at Funct Not Used R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits ALUOp : Funct ALUControl 2: X (Add) X X (Subtract) X (add) (Add) X (sub) (Subtract) X (and) (And) X (or) (Or) X (slt) (SLT) Chapter 7 <32>

33 Control Unit Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw beq 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 ALU + Zero ALUResult WriteData PCBranch WE A RD Data WD ReadData Chapter 7 <33> Result

34 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X Chapter 7 <34>

35 Single-Cycle Datapath: or 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op Funct ALUSrc RegDst PCSrc RegWrite PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 Zero ALU + ALUResult WriteData PCBranch A RD Data WD WE ReadData Result R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <35>

36 Extended Functionality: addi 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op Funct ALUSrc RegDst PCSrc RegWrite PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch No change to datapath Result Chapter 7 <36>

37 Control Unit: addi Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X addi Chapter 7 <37>

38 Control Unit: addi Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X addi Chapter 7 <38>

39 Extended Functionality: j Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 J-Type op addr 6 bits 26 bits Chapter 7 <39>

40 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : Jump R-type lw sw X X beq X X j Chapter 7 <4>

41 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : Jump R-type lw sw X X beq X X j X X X X XX Chapter 7 <4>

42 Review: Processor Performance Program Execution Time = (#instructions)(cycles/instruction)(seconds/cycle) = # instructions x CPI x T C Chapter 7 <42>

43 Single-Cycle Performance 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 Zero ALU + ALUResult WriteData PCBranch A RD Data WD WE ReadData Result T C limited by critical path (lw) Chapter 7 <43>

44 Single-Cycle Performance Single-cycle critical path: T c = t pcq_pc + t mem + max(t read, t sext + t mux ) + t ALU + t mem + t mux + t setup Typically, limiting paths are: memory, ALU, register file T c = t pcq_pc + 2t mem + t read + t mux + t ALU + t setup Chapter 7 <44>

45 Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 T c =? Chapter 7 <45>

46 Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 T c = t pcq_pc + 2t mem + t read + t mux + t ALU + t setup = [3 + 2(25) ] ps = 925 ps Chapter 7 <46>

47 Single-Cycle Performance Example Program with billion instructions: Execution Time = # instructions x CPI x T C = ( 9 )()(925-2 s) = 92.5 seconds Chapter 7 <47>

48 Multicycle MIPS Processor Single-cycle: + simple - cycle time limited by longest instruction (lw) - 2 adders/alus & 2 memories Multicycle: + higher clock speed + simpler instructions run faster + reuse expensive hardware on multiple cycles - sequencing overhead paid many times Same design steps: datapath & control Chapter 7 <48>

49 CHW 362 : Computer Architecture & Organization Instructors: Dr Ahmed Shalaby Dr Mona Ali

50 CHW 362 : Computer Architecture & Organization ما هو رأيك فيما يلى :. المحتوى العلمى للكورس موضحا العيوب من وجهة نظرك ثم اذكرنصائح لتحسين العيوب وتطوير الكورس للسنة القادمة. 2. المحاضر طريقة التدريس العيوب بالمحاضرة ثم اذكر نصائح للتطوير. 3. العملى و الهيئة المعاونة العيوب نصائح للتطوير.

51 Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 Chapter 7 <5>

52 Pipelined MIPS Processor Temporal parallelism Divide single-cycle processor into 5 stages: Fetch Decode Execute Writeback Add pipeline registers between stages Chapter 7 <52>

53 MemtoReg RegDst Main Controller FSM: Fetch Reset S: Fetch IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <53>

54 MemtoReg RegDst Main Controller FSM: Fetch S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <54>

55 MemtoReg RegDst Main Controller FSM: Decode S: Fetch S: Decode IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B X SrcA XXX Zero X XX ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <55>

56 MemtoReg RegDst Main Controller FSM: Address S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero X ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <56>

57 MemtoReg RegDst Main Controller FSM: Address S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero X ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <57>

58 Main Controller FSM: lw S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead IorD = S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <58>

59 Main Controller FSM: sw S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite IorD = IorD = MemWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <59>

60 Main Controller FSM: R-Type S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <6>

61 Main Controller FSM: beq S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <6>

62 Extended Functionality: addi S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S9: ADDI Execute Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <62>

63 Main Controller FSM: addi S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S9: ADDI Execute ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <63>

64 Main Controller FSM: j S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <64>

65 Main Controller FSM: j S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute PCSrc = PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <65>

66 Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 Chapter 7 <66>

67 Single-Cycle vs. Pipelined Single-Cycle Instr Fetch Instruction Decode Read Reg Execute ALU Read / Write Write Reg Fetch Instruction Decode Read Reg Execute ALU Read / Write Time (ps) Write Reg Pipelined Instr Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg 2 Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg 3 Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg Chapter 7 <67>

68 Pipelined Processor Abstraction Time (cycles) lw $ lw $s2, 4($) IM 4 + DM $s2 add $s3, $t, $t2 IM add $t $t2 + DM $s3 sub $s4, $s, $s5 IM sub $s $s5 - DM $s4 and $s5, $t5, $t6 IM and $t5 $t6 & DM $s5 sw $s6, 2($s) IM sw $s 2 + DM $s6 or $s7, $t3, $t4 IM or $t3 $t4 DM $s7 Chapter 7 <68>

69 Single-Cycle & Pipelined Datapath PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 Sign Extend SignImm PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE WriteDataE WriteRegE 4: ZeroM ALUOutM WriteDataM WE A RD Data WD ALUOutW ReadDataW + 4 5: Sign Extend SignImmE <<2 PCBranchM + SrcA SrcB WriteReg 4: <<2 ALU + Zero ALUResult WriteData PCBranch WE A RD Data WD ReadData Result ALU PCPlus4F PCPlus4D PCPlus4E Fetch Decode Execute Writeback ResultW Chapter 7 <69>

70 Corrected Pipelined Datapath ALUOutW PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: A RD Data WD WE ReadDataW WriteRegW 4: 4 + 5: Sign Extend SignImmE <<2 + PCBranchM PCPlus4F PCPlus4D PCPlus4E ResultW Fetch Decode Execute Writeback WriteReg must arrive at same time as Result Chapter 7 <7>

71 Pipelined Processor Control Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct BranchD ALUControlD ALUSrcD BranchE ALUControlE 2: ALUSrcE BranchM PCSrcM RegDstD RegDstE ALUOutW PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: A RD Data WD WE ReadDataW WriteRegW 4: 4 + 5: Sign Extend SignImmE <<2 + PCBranchM PCPlus4F PCPlus4D PCPlus4E Same control unit as single-cycle processor Control delayed to proper pipeline stage Chapter 7 <7> ResultW

72 Pipeline Hazards When an instruction depends on result from instruction that hasn t completed Types: Data hazard: register value not yet written back to register file Control hazard: next instruction not decided yet (caused by branches) Chapter 7 <72>

73 Data Hazard Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <73>

74 Handling Data Hazards Insert nops in code at compile time Rearrange code at compile time Forward data at run time Stall the processor at run time Chapter 7 <74>

75 Compile-Time Hazard Elimination Insert enough nops for result to be ready Or move independent useful instructions forward Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s nop IM nop DM nop IM nop DM and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <75>

76 Data Forwarding Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <76>

77 Data Forwarding Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' PCF 4 A RD Instruction + InstrD 25:2 2:6 25:2 2:6 5: 5: A A2 A3 WD3 WE3 Sign Extend Register File RD RD2 SignImmD RsD RtD RdD RsE RtE RdE SrcBE SignImmE <<2 SrcAE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: + PCPlus4F PCPlus4D PCPlus4E PCBranchM ResultW ForwardAE ForwardBE RegWriteM RegWriteW Hazard Unit Chapter 7 <77>

78 Stalling Time (cycles) lw $s, 4($) IM 4 lw $ + DM $s and $t, $s, $s IM and Trouble! $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <78>

79 Stalling Time (cycles) lw $s, 4($) IM 4 lw $ + DM $s and $t, $s, $s IM and $s $s $s $s & DM $t or $t, $s4, $s IM or IM or $s4 $s DM $t sub $t2, $s, $s5 Stall IM sub $s $s5 - DM $t2 Chapter 7 <79>

80 Stalling Hardware Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' EN PCF 4 A RD Instruction + InstrD 25:2 2:6 25:2 2:6 5: 5: A A2 A3 WD3 WE3 Sign Extend Register File RD RD2 SignImmD RsD RtD RdD RsE RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: SignImmE <<2 ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: + PCPlus4F EN PCPlus4D PCPlus4E PCBranchM ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW CLR Hazard Unit Chapter 7 <8>

81 Control Hazards beq: branch not determined until 4 th stage of pipeline Instructions after branch fetched before branch occurs These instructions must be flushed if branch happens Branch misprediction penalty number of instruction flushed when branch is taken May be reduced by determining branch earlier Chapter 7 <8>

82 EN CLR EN Control Hazards: Original Pipeline Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RsD RtD RdD RsE RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD SignImmE <<2 + PCPlus4F PCPlus4D PCPlus4E PCBranchM ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW Hazard Unit Chapter 7 <82>

83 slt Control Hazards Time (cycles) 2 $t beq $t, $t2, 4 IM $t2 lw - DM 24 and $t, $s, $s IM and $s $s & DM 28 or $t, $s4, $s IM or $s4 $s DM Flush these instructions 2C sub $t2, $s, $s5 IM sub $s $s5 - DM slt $t3, $s2, $s3 IM $s3 slt $s2 DM $t3 Chapter 7 <83>

84 EN CLR CLR EN Early Branch Resolution Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM ALUControlD 2: ALUControlE 2: 3:26 5: Op Funct ALUSrcD RegDstD ALUSrcE RegDstE BranchD PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 EqualD PCSrcD = RsD RtD RdE RsE RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD <<2 SignImmE + PCPlus4F PCPlus4D PCBranchD ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW Hazard Unit Introduced another data hazard in Decode stage Chapter 7 <84>

85 slt Early Branch Resolution Time (cycles) 2 $t beq $t, $t2, 4 IM $t2 lw - DM 24 and $t, $s, $s IM and $s $s & DM Flush this instruction 28 or $t, $s4, $s 2C sub $t2, $s, $s slt $t3, $s2, $s3 IM $s3 slt $s2 DM $t3 Chapter 7 <85>

86 EN CLR CLR EN Handling Data & Control Hazards Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM ALUControlD 2: ALUControlE 2: 3:26 5: Op Funct ALUSrcD RegDstD ALUSrcE RegDstE BranchD PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 EqualD PCSrcD = RsD RtD RdD RsE RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD <<2 SignImmE + PCPlus4F PCPlus4D PCBranchD ResultW StallF StallD BranchD ForwardAD ForwardBD FlushE ForwardAE ForwardBE MemtoRegE RegWriteE RegWriteM RegWriteW Hazard Unit Chapter 7 <86>

87 Branch Prediction Guess whether branch will be taken Backward branches are usually taken (loops) Consider history to improve guess Good prediction reduces fraction of branches requiring a flush Chapter 7 <87>

88 Pipelined Performance Example SPECINT2 benchmark: 25% loads % stores % branches 2% jumps 52% R-type Suppose: 4% of loads used by next instruction 25% of branches mispredicted All jumps flush next instruction What is the average CPI? Chapter 7 <88>

89 Pipelined Performance Example SPECINT2 benchmark: 25% loads % stores % branches 2% jumps 52% R-type Suppose: 4% of loads used by next instruction 25% of branches mispredicted All jumps flush next instruction What is the average CPI? Load/Branch CPI = when no stalling, 2 when stalling CPI lw = (.6) + 2(.4) =.4 CPI beq = (.75) + 2(.25) =.25 Average CPI = (.25)(.4) + (.)() + (.)(.25) + (.2)(2) + (.52)() =.5 Chapter 7 <89>

90 Pipelined Performance Pipelined processor critical path: T c = max { t pcq + t mem + t setup 2(t read + t mux + t eq + t AND + t mux + t setup ) t pcq + t mux + t mux + t ALU + t setup t pcq + t memwrite + t setup 2(t pcq + t mux + t write ) } Chapter 7 <9>

91 Pipelined Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 Equality comparator t eq 4 AND gate t AND 5 write t memwrite 22 Register file write t write T c = 2(t read + t mux + t eq + t AND + t mux + t setup ) = 2[ ] ps = 55 ps Chapter 7 <9>

92 Pipelined Performance Example Program with billion instructions Execution Time = (# instructions) CPI T c = ( 9 )(.5)(55-2 ) = 63 seconds Chapter 7 <92>

93 Processor Performance Comparison Processor Execution Time (seconds) Single-cycle 92.5 Multicycle 33.7 Pipelined Speedup (single-cycle as baseline) Chapter 7 <93>

94 Review: Exceptions Unscheduled function call to exception handler Caused by: Hardware, also called an interrupt, e.g. keyboard Software, also called traps, e.g. undefined instruction When exception occurs, the processor: Records cause of exception (Cause register) Jumps to exception handler (x88) Returns to program (EPC register) Chapter 7 <94>

95 Example Exception Chapter 7 <95>

96 Exception Registers Not part of register file Cause Records cause of exception Coprocessor register 3 EPC (Exception PC) Records PC where exception occurred Coprocessor register 4 Move from Coprocessor mfc $t, Cause Moves contents of Cause into $t mfc $t (8) Cause (3) 3:26 25:2 2:6 5: : Chapter 7 <96>

97 Exception Causes Exception Hardware Interrupt System Call Breakpoint / Divide by Undefined Instruction Arithmetic Overflow Cause x x2 x24 x28 x3 Extend multicycle MIPS processor to handle last two types of exceptions Chapter 7 <97>

98 Exception Hardware: EPC & Cause EPCWrite IntCause CauseWrite x3 x28 EN Cause EPC EN PCEn IorD MemWrite IRWrite RegDst MemtoReg RegWrite ALUSrcA ALUSrcB : ALUControl 2: BranchPCWrite PCSrc : PC' PC EN Adr A RD Instr / Data WD WE Instr EN Data 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 A B 3:28 4 <<2 SrcA SrcB ALU Zero ALUResult ALUOut Overflow PCJump <<2 x8 8 27: SignImm 5: Sign Extend 25: (jump) Chapter 7 <98>

99 Control FSM with Exceptions S2: Undefined PCSrc = PCWrite IntCause = CauseWrite EPCWrite S4: MFC RegDst = Memtoreg = RegWrite Op = others S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = mfc Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute PCSrc = PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead IorD = Op = SW S5: MemWrite IorD = MemWrite S7: ALU Writeback Overflow RegDst = MemtoReg = RegWrite Overflow S3: Overflow PCSrc = PCWrite IntCause = CauseWrite EPCWrite S: ADDI Writeback RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <99>

100 Exception Hardware: mfc EPCWrite IntCause x3 x28 EN CauseWrite Cause EN EPC... C... 5: IorD MemWrite IRWrite RegDst MemtoReg : RegWrite ALUSrcA PCEn ALUSrcB : ALUControl 2: BranchPCWrite PCSrc : PC' PC EN Adr A RD Instr / Data WD WE Instr EN Data 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 A B 3:28 4 <<2 SrcA SrcB ALU Zero ALUResult ALUOut Overflow PCJump <<2 x8 8 27: 5: Sign Extend SignImm 25: (jump) Chapter 7 <>

101 Advanced Microarchitecture Deep Pipelining Branch Prediction Superscalar Processors Out of Order Processors Register Renaming SIMD Multithreading Multiprocessors Chapter 7 <>

102 Deep Pipelining -2 stages typical Number of stages limited by: Pipeline hazards Sequencing overhead Power Cost Chapter 7 <2>

103 Branch Prediction Ideal pipelined processor: CPI = Branch misprediction increases CPI Static branch prediction: Check direction of branch (forward or backward) If backward, predict taken Else, predict not taken Dynamic branch prediction: Keep history of last (several hundred) branches in branch target buffer, record: Branch destination Whether branch was taken Chapter 7 <3>

104 Branch Prediction Example add $s, $, $ # sum = add $s, $, $ # i = addi $t, $, # $t = for: beq $s, $t, done # if i ==, branch add $s, $s, $s # sum = sum + i addi $s, $s, # increment i j for done: Chapter 7 <4>

105 -Bit Branch Predictor Remembers whether branch was taken the last time and does the same thing Mispredicts first and last branch of loop Chapter 7 <5>

106 2-Bit Branch Predictor strongly taken taken predict taken weakly taken weakly not taken taken predict taken predict taken taken taken taken not taken taken predict not taken strongly not taken taken Only mispredicts last branch of loop Chapter 7 <6>

107 Superscalar Multiple copies of datapath execute multiple instructions at once Dependencies make it tricky to issue multiple instructions at once PC A RD Instruction A A2 A3 A4 A5 A6 WD3 WD6 Register File RD RD4 RD2 RD5 ALUs A RD A2 RD2 Data WD WD2 Chapter 7 <7>

108 + Superscalar Example lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 2 or $t3, $s5, $s6 sw $s7, 8($t3) lw $t, 4($s) add $t, $s, $s2 IM lw add $s 4 $s $s2 + + DM $t $t Time (cycles) sub $t2, $s, $s3 and $t3, $s3, $s4 IM $s sub $t2 $s3 - DM $s3 and $t3 $s4 & or $t4, $s, $s5 sw $s5, 8($s) IM or sw $s $s5 $s 8 DM $s5 $t4 Chapter 7 <8>

109 Superscalar with Dependencies lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/5 =.2 or $t3, $s5, $s6 sw $s7, 8($t3) Time (cycles) lw $t, 4($s) IM lw $s 4 + DM $t add $t, $t, $s sub $t, $s2, $s3 IM add sub $t $s $s2 $s3 $t $s $s2 $s3 + - DM $t $t and $t2, $s4, $t or $t3, $s5, $s6 Stall and IM or IM and or $s4 $t $s5 $s6 & DM $t2 $t3 sw $s7, 8($t3) IM sw $t3 8 + $s7 DM Chapter 7 <9>

110 Out of Order Processor Looks ahead across multiple instructions Issues as many instructions as possible at once Issues instructions out of order (as long as no dependencies) Dependencies: RAW (read after write): one instruction writes, later instruction reads a register WAR (write after read): one instruction reads, later instruction writes a register WAW (write after write): one instruction writes, later instruction writes a register Chapter 7 <>

111 Out of Order Processor Instruction level parallelism (ILP): number of instruction that can be issued simultaneously (average < 3) Scoreboard: table that keeps track of: Instructions waiting to issue Available functional units Dependencies Chapter 7 <>

112 Out of Order Processor Example lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/4 =.5 or $t3, $s5, $s6 sw $s7, 8($t3) lw $t, 4($s) or $t3, $s5, $s6 RAW sw $s7, 8($t3) two cycle latency between load and RAW use of $t add $t, $t, $s WAR sub $t, $s2, $s3 RAW and $t2, $s4, $t IM lw or $s 4 $s5 $s6 + DM $t $t3 $t3 sw $s7 8 + DM IM IM add sub IM $t $s $s2 $s3 and + - $s4 $t & DM $t $t DM Time (cycles) $t2 Chapter 7 <2>

113 Register Renaming lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/3 = 2 or $t3, $s5, $s6 sw $s7, 8($t3) Time (cycles) lw $t, 4($s) sub $r, $s2, $s3 IM lw sub $s 4 $s2 $s3 + - DM $t $r 2-cycle RAW RAW and $t2, $s4, $r or $t3, $s5, $s6 IM and or $s4 $r $s5 $s6 & DM $t2 $t3 RAW add $t, $t, $s sw $s7, 8($t3) IM add sw $t $s $t DM $s7 $t Chapter 7 <3>

114 SIMD Single Instruction Multiple Data (SIMD) Single instruction acts on multiple pieces of data at once Common application: graphics Perform short arithmetic operations (also called packed arithmetic) For example, add four 8-bit elements padd8 $s2, $s, $s Bit position a 3 a 2 a a $s + b 3 b 2 b b $s a 3 + b 3 a 2 + b 2 a + b a + b $s2 Chapter 7 <4>

115 Advanced Architecture Techniques Multithreading Wordprocessor: thread for typing, spell checking, printing Multiprocessors Multiple processors (cores) on a single chip Chapter 7 <5>

116 Threading: Definitions Process: program running on a computer Multiple processes can run at once: e.g., surfing Web, playing music, writing a paper Thread: part of a program Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing Chapter 7 <6>

117 Threads in Conventional Processor One thread runs at once When one thread stalls (for example, waiting for memory): Architectural state of that thread stored Architectural state of waiting thread loaded into processor and it runs Called context switching Appears to user like all threads running simultaneously Chapter 7 <7>

118 Multithreading Multiple copies of architectural state Multiple threads active at once: When one thread stalls, another runs immediately If one thread can t keep all execution units busy, another thread can use them Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput Intel calls this hyperthreading Chapter 7 <8>

119 Multiprocessors Multiple processors (cores) with a method of communication between them Types: Homogeneous: multiple cores with shared memory Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone) Clusters: each core has own memory system Chapter 7 <9>

120 Other Resources Patterson & Hennessy s: Computer Architecture: A Quantitative Approach Conferences: ISCA (International Symposium on Computer Architecture) HPCA (International Symposium on High Performance Computer Architecture) Chapter 7 <2>

121 Study: CHW 362 : Computer Architecture & Organization Why? How? What?

122 What? Computer Architecture computer architecture defines how to command a processor. computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer system.

123 How? Course Book

124 How? Course Content Lec # Subject Week # Lec Chapter : From Zero to One Week # Lec2 Chapter 2: Combinational Logic Design Week #2 Lec 3 Chapter 3: Sequential Logic Design Week #3 Lec 4 Chapter 3 : continue Week #4 Lec 5 Chapter 4: Hardware Description Language Week #5 Lec 6 Chapter 4 : continue Week #6 Midterm Exam Week #7 Lec 7 Chapter 5: Digital Building Blocks Week #8 Lec 8 Chapter 5 continue Week #9 Lec 9 Chapter 6: Computer Architecture Week # Lec Chapter 6 : continue Week # Lec Chapter 7: Microarchitecture Week #2 Lec 2 Chapter 7 continue Week #3

125 Assessment Final-Term Examination 65 Mid-Term Examination Practical Examination 5 Oral Examination Chapter 8 is a self study - In Exam.

126 Exam Topics Timing of Sequential Logic Section 3.5 Multi Cycle Processor Section 8.4, 8.5, 8.6 Chapter 7 <26>

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues.

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues. Lecture 2: Pipelining Topics Introduction to pipelining Performance Pipelined datapath Design issues Hazards in pipeline Types Solutions Pipelining is Natural! Laundry Example Use case scenario Ann, Brian,

More information

Computer Architectures

Computer Architectures Computer Architectures Pipelined instruction execution Hazards, stages balancing, super-scalar systems Pavel Píša, Michal Štepanovský, Miroslav Šnorek Main source of inspiration: Patterson Czech Technical

More information

Design of Digital Circuits Lecture 17: Pipelining Issues. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 17: Pipelining Issues. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 17: Pipelining Issues Prof. Onur Mutlu ETH Zurich Spring 2017 28 April 2017 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) Microarchitecture Design of Digital Circuits 27 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) http://www.syssec.ethz.ch/education/digitaltechnik_7 Adapted from Digital

More information

Chapter 7. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 7 <1>

Chapter 7. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 7 <1> Chapter 7 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 7 Chapter 7 :: Topics Introduction (done) Performance Analysis (done) Single-Cycle Processor

More information

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design ENGN64: Design of Computing Systems Topic 4: Single-Cycle Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Design of Digital Circuits Lecture 16: Dependence Handling. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 16: Dependence Handling. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 16: Dependence Handling Prof. Onur Mutlu ETH Zurich Spring 2017 27 April 2017 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Slide Set 7 for Lecture Section 01

Slide Set 7 for Lecture Section 01 Slide Set 7 for Lecture Section 01 for ENCM 369 Winter 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2017 ENCM 369 Winter

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

ENCM 501 Winter 2019 Assignment 6 for the Week of March 11

ENCM 501 Winter 2019 Assignment 6 for the Week of March 11 page of 8 ENCM 5 Winter 29 Assignment 6 for the Week of March Steve Norman Department of Electrical & Computer Engineering University of Calgary February 29 Assignment instructions and other documents

More information

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM

More information

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007 Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007 Final Exam This is a closed-book take-home exam. You are permitted a calculator and two 8.5x sheets of paper with notes. The exam

More information

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Prof. Martha A. Kim December 7, 23 Name: First Last (Family) UNI (e.g., mak29) You are allowed 3 hours. You may consult your own

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

RISC Processor Design

RISC Processor Design RISC Processor Design Single Cycle Implementation - MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 13 SE-273: Processor Design Feb 07, 2011 SE-273@SERC 1 Courtesy:

More information

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 3: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring 27 6 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

CENG 5133 Computer Architecture Design Spring Sample Exam 2

CENG 5133 Computer Architecture Design Spring Sample Exam 2 CENG 533 Computer Architecture Design Spring 24 Sample Exam 2. (6 pt) Determine the propagation delay and contamination delay of the following circuit using the gate delays given below. Gate t pd (ps)

More information

ENE 334 Microprocessors

ENE 334 Microprocessors ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control معماري كامپيوتر پيشرفته معماري MIPS data path and control abbasi@basu.ac.ir Topics Building a datapath support a subset of the MIPS-I instruction-set A single cycle processor datapath all instruction actions

More information

Design of A Six-stage Pipelined MIPS Processor Based on FPGA

Design of A Six-stage Pipelined MIPS Processor Based on FPGA Design of A Six-stage Pipelined MIPS Processor Based on FPGA Qiao-Zhi Sun, De-Chun Kong, Cheng-Long Zhao, and Hui-Bin Shi Department of Computer Science and Technology, Nanjing University of Aeronautics

More information

CPE 335. Basic MIPS Architecture Part II

CPE 335. Basic MIPS Architecture Part II CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture

More information

RISC Design: Multi-Cycle Implementation

RISC Design: Multi-Cycle Implementation RISC Design: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

RISC Architecture: Multi-Cycle Implementation

RISC Architecture: Multi-Cycle Implementation RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

More information

RISC Architecture: Multi-Cycle Implementation

RISC Architecture: Multi-Cycle Implementation RISC Architecture: Multi-Cycle Implementation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

More information

Systems Architecture I

Systems Architecture I Systems Architecture I Topics A Simple Implementation of MIPS * A Multicycle Implementation of MIPS ** *This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from

More information

Digital Design and Computer Architecture

Digital Design and Computer Architecture Digital Design and Computer Architecture Lab 0: Multicycle Processor (Part ) Introduction In this lab and the next, you will design and build your own multicycle MIPS processor. You will be much more on

More information

Systems Architecture

Systems Architecture Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some or all figures from Computer Organization and Design: The Hardware/Software

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

CSE 2021 COMPUTER ORGANIZATION

CSE 2021 COMPUTER ORGANIZATION CSE 22 COMPUTER ORGANIZATION HUGH CHESSER CHESSER HUGH CSEB 2U 2U CSEB Agenda Topics:. Sample Exam/Quiz Q - Review 2. Multiple cycle implementation Patterson: Section 4.5 Reminder: Quiz #2 Next Wednesday

More information

Digital Design and Computer Architecture Harris and Harris

Digital Design and Computer Architecture Harris and Harris Digital Design and Computer Architecture Harris and Harris Lab 0: Multicycle Processor (Part ) Introduction In this lab and the next, you will design and build your own multicycle MIPS processor. You will

More information

ENCM 369 Winter 2018 Lab 9 for the Week of March 19

ENCM 369 Winter 2018 Lab 9 for the Week of March 19 page 1 of 9 ENCM 369 Winter 2018 Lab 9 for the Week of March 19 Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2018 Lab instructions and other documents for ENCM

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

Processor: Multi- Cycle Datapath & Control

Processor: Multi- Cycle Datapath & Control Processor: Multi- Cycle Datapath & Control (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann, 27) COURSE

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 5: Pipelining Prof. Onur Mutlu ETH Zurich Spring 27 3 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Topic #6. Processor Design

Topic #6. Processor Design Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between

More information

ECE369. Chapter 5 ECE369

ECE369. Chapter 5 ECE369 Chapter 5 1 State Elements Unclocked vs. Clocked Clocks used in synchronous logic Clocks are needed in sequential logic to decide when an element that contains state should be updated. State element 1

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR Microprogramming Exceptions and interrupts 9 CMPE Fall 26 A. Di Blas Fall 26 CMPE CPU Multicycle From single-cycle to Multicycle CPU with sequential control: Finite State Machine Textbook Edition: 5.4,

More information

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: A Based on P&H Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours. This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August Lecture 8: Control COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Datapath and Control Datapath The collection of state elements, computation elements,

More information

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle Boris Grot School of Informatics University of Edinburgh Previous lecture: single-cycle processor Inf2C Computer Systems - 2017-2018. Boris

More information

CC 311- Computer Architecture. The Processor - Control

CC 311- Computer Architecture. The Processor - Control CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor

More information

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination May 23, 2014 Name: Email: Student ID: Lab Section Number: Instructions: 1. This

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 5: The Processor: Datapath and Control

Chapter 5: The Processor: Datapath and Control Chapter 5: The Processor: Datapath and Control Overview Logic Design Conventions Building a Datapath and Control Unit Different Implementations of MIPS instruction set A simple implementation of a processor

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CS232 Final Exam May 5, 2001

CS232 Final Exam May 5, 2001 CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

LECTURE 6. Multi-Cycle Datapath and Control

LECTURE 6. Multi-Cycle Datapath and Control LECTURE 6 Multi-Cycle Datapath and Control SINGLE-CYCLE IMPLEMENTATION As we ve seen, single-cycle implementation, although easy to implement, could potentially be very inefficient. In single-cycle, we

More information

Chapter 5 Solutions: For More Practice

Chapter 5 Solutions: For More Practice Chapter 5 Solutions: For More Practice 1 Chapter 5 Solutions: For More Practice 5.4 Fetching, reading registers, and writing the destination register takes a total of 300ps for both floating point add/subtract

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Major CPU Design Steps

Major CPU Design Steps Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CSEN 601: Computer System Architecture Summer 2014

CSEN 601: Computer System Architecture Summer 2014 CSEN 601: Computer System Architecture Summer 2014 Practice Assignment 5 Solutions Exercise 5-1: (Midterm Spring 2013) a. What are the values of the control signals (except ALUOp) for each of the following

More information

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón ICS 152 Computer Systems Architecture Prof. Juan Luis Aragón Lecture 5 and 6 Multicycle Implementation Introduction to Microprogramming Readings: Sections 5.4 and 5.5 1 Review of Last Lecture We have seen

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions COP33 - Computer Architecture Lecture ulti-cycle Design & Exceptions Single Cycle Datapath We designed a processor that requires one cycle per instruction RegDst busw 32 Clk RegWr Rd ux imm6 Rt 5 5 Rs

More information

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University The Processor (1) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

CPE 335 Computer Organization. Basic MIPS Architecture Part I

CPE 335 Computer Organization. Basic MIPS Architecture Part I CPE 335 Computer Organization Basic MIPS Architecture Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s8/index.html CPE232 Basic MIPS Architecture

More information

The MIPS Processor Datapath

The MIPS Processor Datapath The MIPS Processor Datapath Module Outline MIPS datapath implementation Register File, Instruction memory, Data memory Instruction interpretation and execution. Combinational control Assignment: Datapath

More information

Microprogramming. Microprogramming

Microprogramming. Microprogramming Microprogramming Alternative way of specifying control FSM State -- bubble control signals in bubble next state given by signals on arc not a great language to specify when things are complex Treat as

More information

CSE 2021 COMPUTER ORGANIZATION

CSE 2021 COMPUTER ORGANIZATION CSE 2021 COMPUTER ORGANIZATION HUGH LAS CHESSER 1012U HUGH CHESSER CSEB 1012U W10-M Agenda Topics: 1. Multiple cycle implementation review 2. State Machine 3. Control Unit implementation for Multi-cycle

More information

CSEE W3827 Fundamentals of Computer Systems Homework Assignment 3 Solutions

CSEE W3827 Fundamentals of Computer Systems Homework Assignment 3 Solutions CSEE W3827 Fundamentals of Computer Systems Homework Assignment 3 Solutions 2 3 4 5 Prof. Stephen A. Edwards Columbia University Due June 26, 207 at :00 PM ame: Solutions Uni: Show your work for each problem;

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points This exam is open book and open notes. Credit for problems requiring calculation will be given only if you show your work. 1. Multicycle Processor Design 0 Points In our discussion of exceptions in the

More information

LECTURE 5. Single-Cycle Datapath and Control

LECTURE 5. Single-Cycle Datapath and Control LECTURE 5 Single-Cycle Datapath and Control PROCESSORS In lecture 1, we reminded ourselves that the datapath and control are the two components that come together to be collectively known as the processor.

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

CS 351 Exam 2 Mon. 11/2/2015

CS 351 Exam 2 Mon. 11/2/2015 CS 351 Exam 2 Mon. 11/2/2015 Name: Rules and Hints The MIPS cheat sheet and datapath diagram are attached at the end of this exam for your reference. You may use one handwritten 8.5 11 cheat sheet (front

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath COMP33 - Computer Architecture Lecture 8 Designing a Single Cycle Datapath The Big Picture The Five Classic Components of a Computer Processor Input Control Memory Datapath Output The Big Picture: The

More information

Computer Science 141 Computing Hardware

Computer Science 141 Computing Hardware Computer Science 4 Computing Hardware Fall 6 Harvard University Instructor: Prof. David Brooks dbrooks@eecs.harvard.edu Upcoming topics Mon, Nov th MIPS Basic Architecture (Part ) Wed, Nov th Basic Computer

More information

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction Multi-cycle Approach Single cycle CPU State element Combinational logic State element clock one clock cycle or instruction Multi-cycle CPU Requires state elements to hold intermediate values State Element

More information

Processor Design Pipelined Processor (II) Hung-Wei Tseng

Processor Design Pipelined Processor (II) Hung-Wei Tseng Processor Design Pipelined Processor (II) Hung-Wei Tseng Recap: Pipelining Break up the logic with pipeline registers into pipeline stages Each pipeline registers is clocked Each pipeline stage takes one

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

Lets Build a Processor

Lets Build a Processor Lets Build a Processor Almost ready to move into chapter 5 and start building a processor First, let s review Boolean Logic and build the ALU we ll need (Material from Appendix B) operation a 32 ALU result

More information

The Processor: Datapath & Control

The Processor: Datapath & Control Chapter Five 1 The Processor: Datapath & Control We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions:

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Recall. ISA? Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction Instruction Format or Encoding how is it decoded? Location of operands and

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

Multiple Cycle Data Path

Multiple Cycle Data Path Multiple Cycle Data Path CS 365 Lecture 7 Prof. Yih Huang CS365 1 Multicycle Approach Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle

More information

Processor (multi-cycle)

Processor (multi-cycle) CS359: Computer Architecture Processor (multi-cycle) Yanyan Shen Department of Computer Science and Engineering Five Instruction Steps ) Instruction Fetch ) Instruction Decode and Register Fetch 3) R-type

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor? Chapter 4 The Processor 2 Introduction We will learn How the ISA determines many aspects

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 14: One Cycle MIPs Datapath Adapted from Computer Organization and Design, Patterson & Hennessy, UCB R-Format Instructions Read two register operands Perform

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Computer Architecture 计算机体系结构. Lecture 2. Instruction Set Architecture 第二讲 指令集架构. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 2. Instruction Set Architecture 第二讲 指令集架构. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 2. Instruction Set Architecture 第二讲 指令集架构 Chao Li, PhD. 李超博士 SJTU-SE346, Spring 27 Review ENIAC (946) used decimal representation; vacuum tubes per digit; could store

More information