CHW 362 : Computer Architecture & Organization

Size: px

Start display at page:

Download "CHW 362 : Computer Architecture & Organization"

Cora Brown
6 years ago
Views:

1 CHW 362 : Computer Architecture & Organization Instructors: Dr Ahmed Shalaby Dr Mona Ali

2 Review: Instruction Formats R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits J-Type op addr 6 bits 26 bits Chapter 7 <2>

3 R-Type Register-type 3 register operands: rs, rt: source registers rd: destination register Other fields: op: the operation code or opcode ( for R-type instructions) funct: the function with opcode, tells computer what operation to perform shamt: the shift amount for shift instructions, otherwise it s R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <3>

4 R-Type Examples Assembly Code add $s, $s, $s2 sub $t, $t3, $t5 Field Values op rs rt rd shamt funct bits 5 bits 5 bits 5 bits 5 bits 6 bits Machine Code op rs rt rd shamt funct (x23282) (x6d422) 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Note the order of registers in the assembly code: add rd, rs, rt Chapter 7 <4>

5 I-Type Immediate-type 3 operands: rs, rt: register operands imm: 6-bit two s complement immediate Other fields: op: the opcode Simplicity favors regularity: all instructions have opcode Operation is completely determined by opcode I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <5>

6 I-Type Examples Assembly Code addi $s, $s, 5 addi $t, $s3, -2 lw $t2, 32($) Field Values op rs rt imm sw $s, 4($t) bits 5 bits 5 bits 6 bits Note the differing order of registers in assembly and machine codes: addi rt, rs, imm lw sw rt, imm(rs) rt, imm(rs) Machine Code op rs rt imm 6 bits 5 bits 5 bits 6 bits (x2235) (x2268fff4) (x8ca2) (xad34) Chapter 7 <6>

7 I-Type Examples PC-Relative Addressing x beq $t, $, else x4 addi $v, $, x8 addi $sp, $sp, i xc jr $ra x2 else: addi $a, $a, - x24 jal factorial Assembly Code beq $t, $, else (beq $t, $, 3) Field Values op rs rt imm bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <7>

8 J-Type Jump-type 26-bit address operand (addr) Used for jump instructions (j) op J-Type addr 6 bits 26 bits Chapter 7 <8>

9 J-Type Example Pseudo-direct Addressing x45c jal sum... x4a sum: add $v, $a, $a JTA 26-bit addr 2 8 (x4a) (x28) Field Values Machine Code op imm op addr 3 x28 6 bits 26 bits 6 bits 26 bits (xc28) Chapter 7 <9>

10 Chapter 7 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 7 <>

11 Chapter 7 :: Topics Introduction Performance Analysis Single-Cycle Processor Multicycle Processor Pipelined Processor Exceptions Advanced Microarchitecture Chapter 7 <>

12 Introduction Microarchitecture: how to implement an architecture in hardware Processor: Datapath: functional blocks Control: control signals Application Software Operating Systems Architecture Microarchitecture Logic Digital Circuits Analog Circuits Devices programs device drivers instructions registers datapaths controllers adders memories AND gates NOT gates amplifiers filters transistors diodes Physics electrons Chapter 7 <2>

13 Microarchitecture Multiple implementations for a single architecture: Single-cycle: Each instruction executes in a single cycle Multicycle: Each instruction is broken into series of shorter steps Pipelined: Each instruction broken up into series of steps & multiple instructions execute at once Chapter 7 <3>

14 Processor Performance Program execution time Execution Time = (#instructions)(cycles/instruction)(seconds/cycle) Definitions: CPI: Cycles/instruction clock period: seconds/cycle IPC: instructions/cycle = IPC Challenge is to satisfy constraints of: Cost Power Performance Chapter 7 <4>

15 MIPS Processor Consider subset of MIPS instructions: R-type instructions: and, or, add, sub, slt instructions: lw, sw Branch instructions: beq Chapter 7 <5>

16 Architectural State Determines everything about a processor: PC 32 registers Chapter 7 <6>

17 MIPS State Elements PC' PC A RD Instruction A A2 A3 WD3 WE3 Register File RD RD A RD Data WD WE 32 Chapter 7 <7>

18 Single-Cycle MIPS Processor Datapath Control Chapter 7 <8>

19 Single-Cycle Datapath: lw fetch STEP : Fetch instruction PC' PC A RD Instruction Instr A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <9>

20 Single-Cycle Datapath: lw Register Read STEP 2: Read source operands from PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <2>

21 Single-Cycle Datapath: lw Immediate STEP 3: Sign-extend the immediate PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 A RD Data WD WE 5: Sign Extend SignImm I-Type lw rt, imm(rs) op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <2>

22 Single-Cycle Datapath: lw address STEP 4: Compute the memory address ALUControl 2: PC' PC A RD Instruction Instr 25:2 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult A RD Data WD WE 5: Sign Extend SignImm I-Type lw rt, imm(rs) op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <22>

23 STEP 5: Read data from memory and write it back to register file PC' Single-Cycle Datapath: lw Read PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult A RD Data WD WE ReadData 5: Sign Extend SignImm lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <23>

24 Single-Cycle Datapath: lw PC Increment STEP 6: Determine address of next instruction PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult A RD Data WD WE ReadData 4 + PCPlus4 5: Sign Extend SignImm Result Chapter 7 <24>

25 Single-Cycle Datapath: sw Write data in rt to memory PC' PC A RD Instruction Instr 25:2 2:6 2:6 A A2 A3 WD3 RegWrite WE3 Register File RD RD2 ALUControl 2: SrcA SrcB ALU Zero ALUResult WriteData A MemWrite RD Data WD WE ReadData 4 + PCPlus4 5: Sign Extend SignImm Result lw rt, imm(rs) I-Type op rs rt imm 6 bits 5 bits 5 bits 6 bits Chapter 7 <25>

26 Single-Cycle Datapath: R-Type Read from rs and rt Write ALUResult to register file Write to rd (instead of rt) R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits PC' PC A RD Instruction Instr 25:2 2:6 RegWrite RegDst ALUSrc ALUControl 2: MemWrite MemtoReg varies A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm Result Chapter 7 <26>

27 Single-Cycle Datapath: beq Determine whether values in rs and rt are equal Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4) PCSrc PC' PC A RD Instruction Instr 25:2 2:6 RegWrite RegDst ALUSrc ALUControl 2: Branch MemWrite MemtoReg x x A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch Assembly Code beq $t, $, else (beq $t, $, 3) Field Values op rs rt imm bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <27> Result

28 Chapter 7 <28> SignImm A RD Instruction + 4 A A3 WD3 RD2 RD WE3 A2 Sign Extend Register File A RD Data WD WE PC PC' Instr 25:2 2:6 5: 5: SrcB 2:6 5: <<2 + ALUResult ReadData WriteData SrcA PCPlus4 PCBranch WriteReg 4: Result 3:26 RegDst Branch MemWrite MemtoReg ALUSrc RegWrite Op Funct Control Unit Zero PCSrc ALUControl 2: ALU Single-Cycle Processor

29 Single-Cycle Control Control Unit Opcode 5: Main Decoder MemtoReg MemWrite Branch ALUSrc RegDst RegWrite ALUOp : Funct 5: ALU Decoder ALUControl 2: Chapter 7 <29>

30 Review: ALU A N ALU N Y B N 3 F F 2: Function A & B A B A + B not used A & ~B A ~B A - B SLT Chapter 7 <3>

31 Zero Extend 2 3 Review: ALU A N B N F 2: Function A & B N A B N F 2 A + B not used C out + [N-] S A & ~B A ~B A - B N N N N SLT 2 F : Y N Chapter 7 <3>

32 Control Unit: ALU Decoder ALUOp : Meaning Add Subtract Look at Funct Not Used R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits ALUOp : Funct ALUControl 2: X (Add) X X (Subtract) X (add) (Add) X (sub) (Subtract) X (and) (And) X (or) (Or) X (slt) (SLT) Chapter 7 <32>

33 Control Unit Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw beq 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 ALU + Zero ALUResult WriteData PCBranch WE A RD Data WD ReadData Chapter 7 <33> Result

34 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X Chapter 7 <34>

35 Single-Cycle Datapath: or 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op Funct ALUSrc RegDst PCSrc RegWrite PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 Zero ALU + ALUResult WriteData PCBranch A RD Data WD WE ReadData Result R-Type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Chapter 7 <35>

36 Extended Functionality: addi 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op Funct ALUSrc RegDst PCSrc RegWrite PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch No change to datapath Result Chapter 7 <36>

37 Control Unit: addi Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X addi Chapter 7 <37>

38 Control Unit: addi Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : R-type lw sw X X beq X X addi Chapter 7 <38>

39 Extended Functionality: j Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 J-Type op addr 6 bits 26 bits Chapter 7 <39>

40 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : Jump R-type lw sw X X beq X X j Chapter 7 <4>

41 Control Unit: Main Decoder Instruction Op 5: RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp : Jump R-type lw sw X X beq X X j X X X X XX Chapter 7 <4>

42 Review: Processor Performance Program Execution Time = (#instructions)(cycles/instruction)(seconds/cycle) = # instructions x CPI x T C Chapter 7 <42>

43 Single-Cycle Performance 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 WriteReg 4: Sign Extend SignImm SrcA SrcB <<2 Zero ALU + ALUResult WriteData PCBranch A RD Data WD WE ReadData Result T C limited by critical path (lw) Chapter 7 <43>

44 Single-Cycle Performance Single-cycle critical path: T c = t pcq_pc + t mem + max(t read, t sext + t mux ) + t ALU + t mem + t mux + t setup Typically, limiting paths are: memory, ALU, register file T c = t pcq_pc + 2t mem + t read + t mux + t ALU + t setup Chapter 7 <44>

45 Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 T c =? Chapter 7 <45>

46 Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 T c = t pcq_pc + 2t mem + t read + t mux + t ALU + t setup = [3 + 2(25) ] ps = 925 ps Chapter 7 <46>

47 Single-Cycle Performance Example Program with billion instructions: Execution Time = # instructions x CPI x T C = ( 9 )()(925-2 s) = 92.5 seconds Chapter 7 <47>

48 Multicycle MIPS Processor Single-cycle: + simple - cycle time limited by longest instruction (lw) - 2 adders/alus & 2 memories Multicycle: + higher clock speed + simpler instructions run faster + reuse expensive hardware on multiple cycles - sequencing overhead paid many times Same design steps: datapath & control Chapter 7 <48>

49 CHW 362 : Computer Architecture & Organization Instructors: Dr Ahmed Shalaby Dr Mona Ali

50 CHW 362 : Computer Architecture & Organization ما هو رأيك فيما يلى :. المحتوى العلمى للكورس موضحا العيوب من وجهة نظرك ثم اذكرنصائح لتحسين العيوب وتطوير الكورس للسنة القادمة. 2. المحاضر طريقة التدريس العيوب بالمحاضرة ثم اذكر نصائح للتطوير. 3. العملى و الهيئة المعاونة العيوب نصائح للتطوير.

51 Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 Chapter 7 <5>

52 Pipelined MIPS Processor Temporal parallelism Divide single-cycle processor into 5 stages: Fetch Decode Execute Writeback Add pipeline registers between stages Chapter 7 <52>

53 MemtoReg RegDst Main Controller FSM: Fetch Reset S: Fetch IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <53>

54 MemtoReg RegDst Main Controller FSM: Fetch S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <54>

55 MemtoReg RegDst Main Controller FSM: Decode S: Fetch S: Decode IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B X SrcA XXX Zero X XX ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <55>

56 MemtoReg RegDst Main Controller FSM: Address S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero X ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <56>

57 MemtoReg RegDst Main Controller FSM: Address S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW IorD MemWrite IRWrite 3:26 5: Control Unit Op Funct PCWrite Branch PCSrc ALUControl 2: ALUSrcB : ALUSrcA RegWrite PCEn PC' X WE PC Instr Adr RD EN A EN Instr / Data WD Data 25:2 2:6 2:6 5: X X A A2 A3 WD3 WE3 Register File RD RD2 A B SrcA Zero X ALUResult ALUOut 4 SrcB <<2 ALU 5: Sign Extend SignImm Chapter 7 <57>

58 Main Controller FSM: lw S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead IorD = S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <58>

59 Main Controller FSM: sw S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite IorD = IorD = MemWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <59>

60 Main Controller FSM: R-Type S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite S: Decode S2: MemAdr Op = LW or Op = SW Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <6>

61 Main Controller FSM: beq S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <6>

62 Extended Functionality: addi S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S9: ADDI Execute Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <62>

63 Main Controller FSM: addi S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S9: ADDI Execute ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <63>

64 Main Controller FSM: j S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <64>

65 Main Controller FSM: j S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute PCSrc = PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead Op = SW S5: MemWrite S7: ALU Writeback S: ADDI Writeback IorD = IorD = MemWrite RegDst = MemtoReg = RegWrite RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <65>

66 Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control MemWrite Unit Branch ALUControl 2: Op ALUSrc Funct RegDst RegWrite PCSrc PC' PC A RD Instruction Instr 25:2 2:6 A A2 A3 WD3 WE3 Register File RD RD2 SrcA SrcB ALU Zero ALUResult WriteData A RD Data WD WE ReadData Result PCJump 4 + PCPlus4 2:6 5: 5: WriteReg 4: Sign Extend SignImm <<2 + PCBranch 27: 3:28 25: <<2 Chapter 7 <66>

67 Single-Cycle vs. Pipelined Single-Cycle Instr Fetch Instruction Decode Read Reg Execute ALU Read / Write Write Reg Fetch Instruction Decode Read Reg Execute ALU Read / Write Time (ps) Write Reg Pipelined Instr Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg 2 Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg 3 Fetch Instruction Decode Read Reg Execute ALU Read/Write Write Reg Chapter 7 <67>

68 Pipelined Processor Abstraction Time (cycles) lw $ lw $s2, 4($) IM 4 + DM $s2 add $s3, $t, $t2 IM add $t $t2 + DM $s3 sub $s4, $s, $s5 IM sub $s $s5 - DM $s4 and $s5, $t5, $t6 IM and $t5 $t6 & DM $s5 sw $s6, 2($s) IM sw $s 2 + DM $s6 or $s7, $t3, $t4 IM or $t3 $t4 DM $s7 Chapter 7 <68>

69 Single-Cycle & Pipelined Datapath PC' PC 4 A RD Instruction + PCPlus4 Instr 25:2 2:6 2:6 5: 5: A A2 A3 WD3 WE3 Register File RD RD2 Sign Extend SignImm PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE WriteDataE WriteRegE 4: ZeroM ALUOutM WriteDataM WE A RD Data WD ALUOutW ReadDataW + 4 5: Sign Extend SignImmE <<2 PCBranchM + SrcA SrcB WriteReg 4: <<2 ALU + Zero ALUResult WriteData PCBranch WE A RD Data WD ReadData Result ALU PCPlus4F PCPlus4D PCPlus4E Fetch Decode Execute Writeback ResultW Chapter 7 <69>

70 Corrected Pipelined Datapath ALUOutW PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: A RD Data WD WE ReadDataW WriteRegW 4: 4 + 5: Sign Extend SignImmE <<2 + PCBranchM PCPlus4F PCPlus4D PCPlus4E ResultW Fetch Decode Execute Writeback WriteReg must arrive at same time as Result Chapter 7 <7>

71 Pipelined Processor Control Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct BranchD ALUControlD ALUSrcD BranchE ALUControlE 2: ALUSrcE BranchM PCSrcM RegDstD RegDstE ALUOutW PC' PCF A RD Instruction InstrD 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: A RD Data WD WE ReadDataW WriteRegW 4: 4 + 5: Sign Extend SignImmE <<2 + PCBranchM PCPlus4F PCPlus4D PCPlus4E Same control unit as single-cycle processor Control delayed to proper pipeline stage Chapter 7 <7> ResultW

72 Pipeline Hazards When an instruction depends on result from instruction that hasn t completed Types: Data hazard: register value not yet written back to register file Control hazard: next instruction not decided yet (caused by branches) Chapter 7 <72>

73 Data Hazard Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <73>

74 Handling Data Hazards Insert nops in code at compile time Rearrange code at compile time Forward data at run time Stall the processor at run time Chapter 7 <74>

75 Compile-Time Hazard Elimination Insert enough nops for result to be ready Or move independent useful instructions forward Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s nop IM nop DM nop IM nop DM and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <75>

76 Data Forwarding Time (cycles) add $s2 add $s, $s2, $s3 IM $s3 + DM $s and $t, $s, $s IM and $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <76>

77 Data Forwarding Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' PCF 4 A RD Instruction + InstrD 25:2 2:6 25:2 2:6 5: 5: A A2 A3 WD3 WE3 Sign Extend Register File RD RD2 SignImmD RsD RtD RdD RsE RtE RdE SrcBE SignImmE <<2 SrcAE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: + PCPlus4F PCPlus4D PCPlus4E PCBranchM ResultW ForwardAE ForwardBE RegWriteM RegWriteW Hazard Unit Chapter 7 <77>

78 Stalling Time (cycles) lw $s, 4($) IM 4 lw $ + DM $s and $t, $s, $s IM and Trouble! $s $s & DM $t or $t, $s4, $s IM or $s4 $s DM $t sub $t2, $s, $s5 IM sub $s $s5 - DM $t2 Chapter 7 <78>

79 Stalling Time (cycles) lw $s, 4($) IM 4 lw $ + DM $s and $t, $s, $s IM and $s $s $s $s & DM $t or $t, $s4, $s IM or IM or $s4 $s DM $t sub $t2, $s, $s5 Stall IM sub $s $s5 - DM $t2 Chapter 7 <79>

80 Stalling Hardware Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' EN PCF 4 A RD Instruction + InstrD 25:2 2:6 25:2 2:6 5: 5: A A2 A3 WD3 WE3 Sign Extend Register File RD RD2 SignImmD RsD RtD RdD RsE RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: SignImmE <<2 ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: + PCPlus4F EN PCPlus4D PCPlus4E PCBranchM ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW CLR Hazard Unit Chapter 7 <8>

81 Control Hazards beq: branch not determined until 4 th stage of pipeline Instructions after branch fetched before branch occurs These instructions must be flushed if branch happens Branch misprediction penalty number of instruction flushed when branch is taken May be reduced by determining branch earlier Chapter 7 <8>

82 EN CLR EN Control Hazards: Original Pipeline Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM 3:26 5: Op Funct ALUControlD 2: ALUSrcD RegDstD BranchD ALUControlE 2: ALUSrcE RegDstE BranchE BranchM PCSrcM PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 RsD RtD RdD RsE RtE RdE SrcAE SrcBE WriteDataE ALU WriteRegE 4: ZeroM ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD SignImmE <<2 + PCPlus4F PCPlus4D PCPlus4E PCBranchM ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW Hazard Unit Chapter 7 <82>

83 slt Control Hazards Time (cycles) 2 $t beq $t, $t2, 4 IM $t2 lw - DM 24 and $t, $s, $s IM and $s $s & DM 28 or $t, $s4, $s IM or $s4 $s DM Flush these instructions 2C sub $t2, $s, $s5 IM sub $s $s5 - DM slt $t3, $s2, $s3 IM $s3 slt $s2 DM $t3 Chapter 7 <83>

84 EN CLR CLR EN Early Branch Resolution Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM ALUControlD 2: ALUControlE 2: 3:26 5: Op Funct ALUSrcD RegDstD ALUSrcE RegDstE BranchD PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 EqualD PCSrcD = RsD RtD RdE RsE RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD <<2 SignImmE + PCPlus4F PCPlus4D PCBranchD ResultW StallF StallD FlushE ForwardAE ForwardBE MemtoRegE RegWriteM RegWriteW Hazard Unit Introduced another data hazard in Decode stage Chapter 7 <84>

85 slt Early Branch Resolution Time (cycles) 2 $t beq $t, $t2, 4 IM $t2 lw - DM 24 and $t, $s, $s IM and $s $s & DM Flush this instruction 28 or $t, $s4, $s 2C sub $t2, $s, $s slt $t3, $s2, $s3 IM $s3 slt $s2 DM $t3 Chapter 7 <85>

86 EN CLR CLR EN Handling Data & Control Hazards Control Unit RegWriteD MemtoRegD MemWriteD RegWriteE RegWriteM RegWriteW MemtoRegE MemtoRegM MemtoRegW MemWriteE MemWriteM ALUControlD 2: ALUControlE 2: 3:26 5: Op Funct ALUSrcD RegDstD ALUSrcE RegDstE BranchD PC' PCF A RD Instruction InstrD 25:2 2:6 25:2 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 EqualD PCSrcD = RsD RtD RdD RsE RtE RdE SrcAE SrcBE ALU WriteDataE WriteRegE 4: ALUOutM WriteDataM WriteRegM 4: WE A RD Data WD ReadDataW ALUOutW WriteRegW 4: 4 + 5: Sign Extend SignImmD <<2 SignImmE + PCPlus4F PCPlus4D PCBranchD ResultW StallF StallD BranchD ForwardAD ForwardBD FlushE ForwardAE ForwardBE MemtoRegE RegWriteE RegWriteM RegWriteW Hazard Unit Chapter 7 <86>

87 Branch Prediction Guess whether branch will be taken Backward branches are usually taken (loops) Consider history to improve guess Good prediction reduces fraction of branches requiring a flush Chapter 7 <87>

88 Pipelined Performance Example SPECINT2 benchmark: 25% loads % stores % branches 2% jumps 52% R-type Suppose: 4% of loads used by next instruction 25% of branches mispredicted All jumps flush next instruction What is the average CPI? Chapter 7 <88>

89 Pipelined Performance Example SPECINT2 benchmark: 25% loads % stores % branches 2% jumps 52% R-type Suppose: 4% of loads used by next instruction 25% of branches mispredicted All jumps flush next instruction What is the average CPI? Load/Branch CPI = when no stalling, 2 when stalling CPI lw = (.6) + 2(.4) =.4 CPI beq = (.75) + 2(.25) =.25 Average CPI = (.25)(.4) + (.)() + (.)(.25) + (.2)(2) + (.52)() =.5 Chapter 7 <89>

90 Pipelined Performance Pipelined processor critical path: T c = max { t pcq + t mem + t setup 2(t read + t mux + t eq + t AND + t mux + t setup ) t pcq + t mux + t mux + t ALU + t setup t pcq + t memwrite + t setup 2(t pcq + t mux + t write ) } Chapter 7 <9>

91 Pipelined Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 3 Register setup t setup 2 Multiplexer t mux 25 ALU t ALU 2 read t mem 25 Register file read t read 5 Register file setup t setup 2 Equality comparator t eq 4 AND gate t AND 5 write t memwrite 22 Register file write t write T c = 2(t read + t mux + t eq + t AND + t mux + t setup ) = 2[ ] ps = 55 ps Chapter 7 <9>

92 Pipelined Performance Example Program with billion instructions Execution Time = (# instructions) CPI T c = ( 9 )(.5)(55-2 ) = 63 seconds Chapter 7 <92>

93 Processor Performance Comparison Processor Execution Time (seconds) Single-cycle 92.5 Multicycle 33.7 Pipelined Speedup (single-cycle as baseline) Chapter 7 <93>

94 Review: Exceptions Unscheduled function call to exception handler Caused by: Hardware, also called an interrupt, e.g. keyboard Software, also called traps, e.g. undefined instruction When exception occurs, the processor: Records cause of exception (Cause register) Jumps to exception handler (x88) Returns to program (EPC register) Chapter 7 <94>

95 Example Exception Chapter 7 <95>

96 Exception Registers Not part of register file Cause Records cause of exception Coprocessor register 3 EPC (Exception PC) Records PC where exception occurred Coprocessor register 4 Move from Coprocessor mfc $t, Cause Moves contents of Cause into $t mfc $t (8) Cause (3) 3:26 25:2 2:6 5: : Chapter 7 <96>

97 Exception Causes Exception Hardware Interrupt System Call Breakpoint / Divide by Undefined Instruction Arithmetic Overflow Cause x x2 x24 x28 x3 Extend multicycle MIPS processor to handle last two types of exceptions Chapter 7 <97>

98 Exception Hardware: EPC & Cause EPCWrite IntCause CauseWrite x3 x28 EN Cause EPC EN PCEn IorD MemWrite IRWrite RegDst MemtoReg RegWrite ALUSrcA ALUSrcB : ALUControl 2: BranchPCWrite PCSrc : PC' PC EN Adr A RD Instr / Data WD WE Instr EN Data 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 A B 3:28 4 <<2 SrcA SrcB ALU Zero ALUResult ALUOut Overflow PCJump <<2 x8 8 27: SignImm 5: Sign Extend 25: (jump) Chapter 7 <98>

99 Control FSM with Exceptions S2: Undefined PCSrc = PCWrite IntCause = CauseWrite EPCWrite S4: MFC RegDst = Memtoreg = RegWrite Op = others S2: MemAdr S: Fetch IorD = Reset AluSrcA = ALUSrcB = ALUOp = PCSrc = IRWrite PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW or Op = SW S: Decode ALUSrcA = ALUSrcB = ALUOp = Op = R-type S6: Execute ALUSrcA = ALUSrcB = ALUOp = Op = mfc Op = BEQ Op = J Op = ADDI S8: Branch ALUSrcA = ALUSrcB = ALUOp = PCSrc = Branch S: Jump S9: ADDI Execute PCSrc = PCWrite ALUSrcA = ALUSrcB = ALUOp = Op = LW S3: MemRead IorD = Op = SW S5: MemWrite IorD = MemWrite S7: ALU Writeback Overflow RegDst = MemtoReg = RegWrite Overflow S3: Overflow PCSrc = PCWrite IntCause = CauseWrite EPCWrite S: ADDI Writeback RegDst = MemtoReg = RegWrite S4: Mem Writeback RegDst = MemtoReg = RegWrite Chapter 7 <99>

100 Exception Hardware: mfc EPCWrite IntCause x3 x28 EN CauseWrite Cause EN EPC... C... 5: IorD MemWrite IRWrite RegDst MemtoReg : RegWrite ALUSrcA PCEn ALUSrcB : ALUControl 2: BranchPCWrite PCSrc : PC' PC EN Adr A RD Instr / Data WD WE Instr EN Data 25:2 2:6 2:6 5: A A2 A3 WD3 WE3 Register File RD RD2 A B 3:28 4 <<2 SrcA SrcB ALU Zero ALUResult ALUOut Overflow PCJump <<2 x8 8 27: 5: Sign Extend SignImm 25: (jump) Chapter 7 <>

101 Advanced Microarchitecture Deep Pipelining Branch Prediction Superscalar Processors Out of Order Processors Register Renaming SIMD Multithreading Multiprocessors Chapter 7 <>

102 Deep Pipelining -2 stages typical Number of stages limited by: Pipeline hazards Sequencing overhead Power Cost Chapter 7 <2>

103 Branch Prediction Ideal pipelined processor: CPI = Branch misprediction increases CPI Static branch prediction: Check direction of branch (forward or backward) If backward, predict taken Else, predict not taken Dynamic branch prediction: Keep history of last (several hundred) branches in branch target buffer, record: Branch destination Whether branch was taken Chapter 7 <3>

104 Branch Prediction Example add $s, $, $ # sum = add $s, $, $ # i = addi $t, $, # $t = for: beq $s, $t, done # if i ==, branch add $s, $s, $s # sum = sum + i addi $s, $s, # increment i j for done: Chapter 7 <4>

105 -Bit Branch Predictor Remembers whether branch was taken the last time and does the same thing Mispredicts first and last branch of loop Chapter 7 <5>

106 2-Bit Branch Predictor strongly taken taken predict taken weakly taken weakly not taken taken predict taken predict taken taken taken taken not taken taken predict not taken strongly not taken taken Only mispredicts last branch of loop Chapter 7 <6>

107 Superscalar Multiple copies of datapath execute multiple instructions at once Dependencies make it tricky to issue multiple instructions at once PC A RD Instruction A A2 A3 A4 A5 A6 WD3 WD6 Register File RD RD4 RD2 RD5 ALUs A RD A2 RD2 Data WD WD2 Chapter 7 <7>

108 + Superscalar Example lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 2 or $t3, $s5, $s6 sw $s7, 8($t3) lw $t, 4($s) add $t, $s, $s2 IM lw add $s 4 $s $s2 + + DM $t $t Time (cycles) sub $t2, $s, $s3 and $t3, $s3, $s4 IM $s sub $t2 $s3 - DM $s3 and $t3 $s4 & or $t4, $s, $s5 sw $s5, 8($s) IM or sw $s $s5 $s 8 DM $s5 $t4 Chapter 7 <8>

109 Superscalar with Dependencies lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/5 =.2 or $t3, $s5, $s6 sw $s7, 8($t3) Time (cycles) lw $t, 4($s) IM lw $s 4 + DM $t add $t, $t, $s sub $t, $s2, $s3 IM add sub $t $s $s2 $s3 $t $s $s2 $s3 + - DM $t $t and $t2, $s4, $t or $t3, $s5, $s6 Stall and IM or IM and or $s4 $t $s5 $s6 & DM $t2 $t3 sw $s7, 8($t3) IM sw $t3 8 + $s7 DM Chapter 7 <9>

110 Out of Order Processor Looks ahead across multiple instructions Issues as many instructions as possible at once Issues instructions out of order (as long as no dependencies) Dependencies: RAW (read after write): one instruction writes, later instruction reads a register WAR (write after read): one instruction reads, later instruction writes a register WAW (write after write): one instruction writes, later instruction writes a register Chapter 7 <>

111 Out of Order Processor Instruction level parallelism (ILP): number of instruction that can be issued simultaneously (average < 3) Scoreboard: table that keeps track of: Instructions waiting to issue Available functional units Dependencies Chapter 7 <>

112 Out of Order Processor Example lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/4 =.5 or $t3, $s5, $s6 sw $s7, 8($t3) lw $t, 4($s) or $t3, $s5, $s6 RAW sw $s7, 8($t3) two cycle latency between load and RAW use of $t add $t, $t, $s WAR sub $t, $s2, $s3 RAW and $t2, $s4, $t IM lw or $s 4 $s5 $s6 + DM $t $t3 $t3 sw $s7 8 + DM IM IM add sub IM $t $s $s2 $s3 and + - $s4 $t & DM $t $t DM Time (cycles) $t2 Chapter 7 <2>

113 Register Renaming lw $t, 4($s) add $t, $t, $s sub $t, $s2, $s3 Ideal IPC: 2 and $t2, $s4, $t Actual IPC: 6/3 = 2 or $t3, $s5, $s6 sw $s7, 8($t3) Time (cycles) lw $t, 4($s) sub $r, $s2, $s3 IM lw sub $s 4 $s2 $s3 + - DM $t $r 2-cycle RAW RAW and $t2, $s4, $r or $t3, $s5, $s6 IM and or $s4 $r $s5 $s6 & DM $t2 $t3 RAW add $t, $t, $s sw $s7, 8($t3) IM add sw $t $s $t DM $s7 $t Chapter 7 <3>

114 SIMD Single Instruction Multiple Data (SIMD) Single instruction acts on multiple pieces of data at once Common application: graphics Perform short arithmetic operations (also called packed arithmetic) For example, add four 8-bit elements padd8 $s2, $s, $s Bit position a 3 a 2 a a $s + b 3 b 2 b b $s a 3 + b 3 a 2 + b 2 a + b a + b $s2 Chapter 7 <4>

115 Advanced Architecture Techniques Multithreading Wordprocessor: thread for typing, spell checking, printing Multiprocessors Multiple processors (cores) on a single chip Chapter 7 <5>

116 Threading: Definitions Process: program running on a computer Multiple processes can run at once: e.g., surfing Web, playing music, writing a paper Thread: part of a program Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing Chapter 7 <6>

117 Threads in Conventional Processor One thread runs at once When one thread stalls (for example, waiting for memory): Architectural state of that thread stored Architectural state of waiting thread loaded into processor and it runs Called context switching Appears to user like all threads running simultaneously Chapter 7 <7>

118 Multithreading Multiple copies of architectural state Multiple threads active at once: When one thread stalls, another runs immediately If one thread can t keep all execution units busy, another thread can use them Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput Intel calls this hyperthreading Chapter 7 <8>

119 Multiprocessors Multiple processors (cores) with a method of communication between them Types: Homogeneous: multiple cores with shared memory Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone) Clusters: each core has own memory system Chapter 7 <9>

120 Other Resources Patterson & Hennessy s: Computer Architecture: A Quantitative Approach Conferences: ISCA (International Symposium on Computer Architecture) HPCA (International Symposium on High Performance Computer Architecture) Chapter 7 <2>

121 Study: CHW 362 : Computer Architecture & Organization Why? How? What?

122 What? Computer Architecture computer architecture defines how to command a processor. computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer system.

123 How? Course Book

124 How? Course Content Lec # Subject Week # Lec Chapter : From Zero to One Week # Lec2 Chapter 2: Combinational Logic Design Week #2 Lec 3 Chapter 3: Sequential Logic Design Week #3 Lec 4 Chapter 3 : continue Week #4 Lec 5 Chapter 4: Hardware Description Language Week #5 Lec 6 Chapter 4 : continue Week #6 Midterm Exam Week #7 Lec 7 Chapter 5: Digital Building Blocks Week #8 Lec 8 Chapter 5 continue Week #9 Lec 9 Chapter 6: Computer Architecture Week # Lec Chapter 6 : continue Week # Lec Chapter 7: Microarchitecture Week #2 Lec 2 Chapter 7 continue Week #3

125 Assessment Final-Term Examination 65 Mid-Term Examination Practical Examination 5 Oral Examination Chapter 8 is a self study - In Exam.

126 Exam Topics Timing of Sequential Logic Section 3.5 Multi Cycle Processor Section 8.4, 8.5, 8.6 Chapter 7 <26>

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues.

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues. Lecture 2: Pipelining Topics Introduction to pipelining Performance Pipelined datapath Design issues Hazards in pipeline Types Solutions Pipelining is Natural! Laundry Example Use case scenario Ann, Brian,