Computer Architecture


1 Computer Architecture Lecture 4: Intro to Microarchitecture: Single-Cycle. Dr. Ahmed Sallam, Suez Canal University, Spring 2015. Based on original slides by Prof. Onur Mutlu.

2 Review: Computer Architecture Today and Basics (Lecture 1); Fundamental Concepts (Lecture 2); ISA Basics and Tradeoffs (Lecture 3). Last lecture: ISA tradeoffs continued: instruction length; uniform vs. non-uniform decode; number of registers; addressing modes; aligned vs. unaligned access; RISC vs. CISC properties.

3 Microarchitecture: We will cover the following: start microarchitecture; single-cycle microarchitectures; multi-cycle microarchitectures; microprogrammed microarchitectures; pipelining; issues in pipelining: control & data dependence handling, state maintenance and recovery, …

4 Implementing the ISA: Microarchitecture Basics

5 Instruction Processing Cycle: Instructions are processed under the direction of a control unit, step by step. Instruction cycle: the sequence of steps needed to process an instruction. Fundamentally, there are six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result. Not all instructions require all six stages (see P&P Ch. 4).

6 How Does a Machine Process Instructions? What does processing an instruction mean? Remember the von Neumann model: AS = architectural (programmer-visible) state before an instruction is processed → process instruction → AS' = architectural (programmer-visible) state after an instruction is processed. Processing an instruction: transforming AS to AS' according to the ISA specification of the instruction.
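As a minimal sketch of "processing an instruction = transforming AS into AS'", the Python below models the architectural state as an immutable value and an ADD as a function that returns a new state. The names `ArchState` and `process_add` are hypothetical, memory is omitted, and the ADD operands are assumed to be already decoded.

```python
from dataclasses import dataclass, replace
from typing import Tuple

MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


@dataclass(frozen=True)
class ArchState:
    """Programmer-visible state: the PC and 32 general-purpose registers."""
    pc: int
    gpr: Tuple[int, ...]


def process_add(state: ArchState, rd: int, rs: int, rt: int) -> ArchState:
    """Return AS' for ADD rd, rs, rt without modifying AS."""
    regs = list(state.gpr)
    regs[rd] = (state.gpr[rs] + state.gpr[rt]) & MASK32
    return replace(state, pc=state.pc + 4, gpr=tuple(regs))


before = ArchState(pc=0x400000, gpr=(0, 5, 7) + (0,) * 29)
after = process_add(before, rd=3, rs=1, rt=2)
print(hex(after.pc), after.gpr[3])  # 0x400004 12
print(before.gpr[3])                # 0: AS itself is unchanged
```

Because `ArchState` is frozen, the only way to "change" state is to build a complete AS', which matches the single-cycle view in which all architectural state updates are made at the end of the instruction.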

7 Instruction Processing Cycle vs. Machine Clock Cycle. Single-cycle machine: all six phases of the instruction processing cycle take a single machine clock cycle to complete. Multi-cycle machine: all six phases of the instruction processing cycle can take multiple machine clock cycles to complete. In fact, each phase can take multiple clock cycles to complete.

8 How the Processor Functions

9 Single-Cycle vs. Multi-Cycle Machines. Single-cycle machines: each instruction takes a single clock cycle; all state updates are made at the end of an instruction's execution; big disadvantage: the slowest instruction determines the cycle time, so the clock cycle time is long. Multi-cycle machines: instruction processing is broken into multiple cycles/stages; state updates can be made during an instruction's execution, but architectural state updates are made only at the end of an instruction's execution; advantage over single-cycle: the slowest stage determines the cycle time. Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level.

10 Instruction Processing Viewed Another Way. Instructions transform Data (AS) to Data (AS'). This transformation is done by functional units: units that operate on data. These units need to be told what to do to the data. An instruction processing engine consists of two components. Datapath: consists of hardware elements that deal with and transform data signals: functional units that operate on data; hardware structures (e.g., wires and muxes) that enable the flow of data into the functional units and registers; storage units that store data (e.g., registers). Control logic: consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data.

11 Single-Cycle vs. Multi-Cycle: Control & Data. Single-cycle machine: control signals are generated in the same clock cycle as the one during which data signals are operated on; everything related to an instruction happens in one clock cycle (serialized processing). Multi-cycle machine: control signals needed in the next cycle can be generated in the current cycle; the latency of control processing can be overlapped with the latency of datapath operation (more parallelism).

12 Flash-Forward: Performance Analysis. Execution time of an instruction: $\{\text{CPI}\} \times \{\text{clock cycle time}\}$. Execution time of a program: $\sum_{\text{all instructions}} \left[\{\text{CPI}\} \times \{\text{clock cycle time}\}\right] = \{\#\text{ of instructions}\} \times \{\text{Average CPI}\} \times \{\text{clock cycle time}\}$. Single-cycle microarchitecture performance: CPI = 1; clock cycle time = long. Multi-cycle microarchitecture performance: CPI = different for each instruction; average CPI hopefully small; clock cycle time = short. Now, we have two degrees of freedom to optimize independently.
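A small worked example of the two forms of the program execution-time formula. The per-instruction cycle counts and clock periods below are made-up illustrative numbers, not measurements of any real machine.

```python
def exec_time_sum(cpis, clock_ps):
    """Sum over all instructions of CPI_i x clock cycle time."""
    return sum(cpi * clock_ps for cpi in cpis)


def exec_time_avg(num_instructions, avg_cpi, clock_ps):
    """# of instructions x average CPI x clock cycle time."""
    return num_instructions * avg_cpi * clock_ps


# Hypothetical multi-cycle machine: CPI differs per instruction, short clock.
multi_cpis = [4, 5, 4, 3]
multi_clock_ps = 200
avg_cpi = sum(multi_cpis) / len(multi_cpis)  # 4.0

print(exec_time_sum(multi_cpis, multi_clock_ps))                # 3200
print(exec_time_avg(len(multi_cpis), avg_cpi, multi_clock_ps))  # 3200.0

# Hypothetical single-cycle machine: CPI = 1 for every instruction, long clock.
print(exec_time_sum([1, 1, 1, 1], 600))                         # 2400
```

The two functions agree whenever the clock period is fixed, which is why CPI and clock cycle time can be treated as separate knobs.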

13 A Single-Cycle Microarchitecture: A Closer Look

14 Remember: Single-Cycle Machine. [Figure: sequential logic (state) holding AS feeds combinational logic that computes AS'.]

15 Let's Start with the State Elements: data and control inputs. [Figure: state elements: PC register; register file (two read-register ports, a write-register port, RegWrite); instruction memory; data memory (address, write data, MemRead, MemWrite).] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

16 For Now, We Will Assume "Magic" Memory and Register File. Combinational read: the output of the read data port is a combinational function of the register file contents and the corresponding read select port. Synchronous write: the selected register is updated on the positive-edge clock transition when write enable is asserted; a write cannot affect the read output in between clock edges. Single-cycle, synchronous memory. Contrast this with memory that tells when the data is ready, i.e., a Ready bit indicating that the read or write is done.
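A minimal Python sketch of the "magic" register file: reads are combinational (they return the current contents immediately), while a write is only committed on the rising clock edge. The class and method names are hypothetical. Two further assumptions: register $0 always reads as zero (a MIPS convention the slide does not mention), and the latched write is cleared after each edge, modeling write-port inputs that are re-driven every cycle.

```python
class MagicRegisterFile:
    """Register file with combinational reads and a synchronous (posedge) write."""

    def __init__(self, num_regs=32):
        self._regs = [0] * num_regs
        self._write = None  # (index, value) driven on the write port this cycle

    def read(self, index):
        # Combinational read: depends only on current contents and the select.
        return self._regs[index]

    def drive_write(self, index, value, write_enable):
        # Write-port inputs may change during the cycle; contents do not.
        self._write = (index, value) if write_enable else None

    def posedge(self):
        # Synchronous write: commit only on the rising clock edge.
        if self._write is not None:
            index, value = self._write
            if index != 0:  # MIPS convention: $0 always reads as zero
                self._regs[index] = value & 0xFFFFFFFF
        self._write = None  # inputs must be driven again next cycle


rf = MagicRegisterFile()
rf.drive_write(5, 42, write_enable=True)
print(rf.read(5))  # 0: not visible before the clock edge
rf.posedge()
print(rf.read(5))  # 42
```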

17 Instruction Processing: 5 generic steps (P&H book): instruction fetch (IF); instruction decode and register operand fetch (ID/RF); execute/memory address generation (EX/AG); memory operand fetch (MEM); store/writeback result (WB). [Figure: datapath overview (PC, instruction memory, register file, ALU, data memory) divided into the IF, ID/RF, EX/AG, MEM and WB steps.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

18 What Is To Come: The Full MIPS Datapath. [Figure: the full single-cycle MIPS datapath: PC, instruction memory, PC + 4 adder, register file, sign extend (16 to 32 bits), ALU and ALU control, data memory, branch-target adder with shift-left-2 (PCSrc2 = Br Taken), jump-address logic with shift-left-2 (PCSrc1 = Jump), and a main control unit producing RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] JAL, JR, JALR omitted.

19 Single-Cycle Datapath for Arithmetic and Logical Instructions

20 R-Type ALU Instructions. Assembly (e.g., register-register signed addition): ADD rd_reg rs_reg rt_reg. Machine encoding (R-type): bits 31-26: opcode (0), 6 bit; 25-21: rs, 5 bit; 20-16: rt, 5 bit; 15-11: rd, 5 bit; 10-6: shamt, 5 bit; 5-0: funct (ADD), 6 bit. Semantics: if MEM[PC] == ADD rd rs rt: GPR[rd] ← GPR[rs] + GPR[rt]; PC ← PC + 4
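The R-type field layout can be checked with a short Python sketch that splits a 32-bit word into its six fields and assembles an ADD. It assumes the standard MIPS values for ADD (opcode field 0, funct 0x20), which are not listed on the slide; `decode_rtype` and `encode_add` are hypothetical helper names.

```python
FUNCT_ADD = 0x20  # MIPS funct code for ADD; R-type opcode field is 0


def decode_rtype(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11
        "shamt": (word >> 6) & 0x1F,    # bits 10-6
        "funct": word & 0x3F,           # bits 5-0
    }


def encode_add(rd, rs, rt):
    """Assemble ADD rd, rs, rt (opcode 0, shamt 0) into its machine encoding."""
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT_ADD


word = encode_add(rd=3, rs=1, rt=2)
print(hex(word))           # 0x221820
print(decode_rtype(word))  # {'opcode': 0, 'rs': 1, 'rt': 2, 'rd': 3, 'shamt': 0, 'funct': 32}
```

Only the funct field distinguishes ADD from the other R-type ALU operations, which is why the ALU control needs the funct bits in addition to the opcode.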

21 ALU Datapath. [Figure: PC, instruction memory and a PC + 4 adder; instruction bits 25:21 and 20:16 select the two register read ports and bits 15:11 the write register (RegWrite); the ALU (3-bit ALU operation, Zero output) computes the result that is written back. The IF, ID, EX, MEM and WB steps together form the combinational state-update logic.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] if MEM[PC] == ADD rd rs rt: GPR[rd] ← GPR[rs] + GPR[rt]; PC ← PC + 4

22 Apply R-Type: if MEM[PC] == ADD rd rs rt: GPR[rd] ← GPR[rs] + GPR[rt]; PC ← PC + 4. [Figure: the R-type path highlighted on a datapath with PC, PC + 4 adder, instruction memory, register file (RS, RT, RD), sign extend (16 to 32 bits), ALU (ALUop), mux and data memory.]

23 I-Type ALU Instructions. Assembly (e.g., register-immediate signed additions): ADDI rt_reg rs_reg immediate_16. Machine encoding (I-type): bits 31-26: ADDI, 6-bit; 25-21: rs, 5-bit; 20-16: rt, 5-bit; 15-0: immediate, 16-bit. Semantics: if MEM[PC] == ADDI rt rs immediate: GPR[rt] ← GPR[rs] + sign-extend(immediate); PC ← PC + 4
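A minimal Python sketch of the ADDI semantics, including the 16-to-32-bit sign extension. It assumes the standard MIPS opcode 0x08 for ADDI (not given on the slide); `execute_addi` is a hypothetical name. Real MIPS ADDI also traps on signed overflow; this sketch simply wraps to 32 bits.

```python
OPCODE_ADDI = 0x08  # standard MIPS opcode for ADDI (assumed; not on the slide)
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_addi(gpr, pc, word):
    """Return (new GPR list, new PC) for one ADDI instruction word."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if opcode != OPCODE_ADDI:
        raise ValueError("not an ADDI instruction")
    regs = list(gpr)
    regs[rt] = (gpr[rs] + sign_extend16(imm)) & MASK32  # overflow trap ignored
    return regs, pc + 4


gpr = [0] * 32
gpr[1] = 10
word = (OPCODE_ADDI << 26) | (1 << 21) | (2 << 16) | 0xFFFF  # ADDI $2, $1, -1
regs, pc = execute_addi(gpr, 0x400000, word)
print(regs[2], hex(pc))  # 9 0x400004
```

Note that the destination is rt, not rd: this is why the datapath needs a RegDest mux once I-type instructions are supported.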

24 Apply I-Type: if MEM[PC] == ADDI rt rs immediate: GPR[rt] ← GPR[rs] + sign-extend(immediate); PC ← PC + 4

25 Datapath for R- and I-Type ALU Insts. [Figure: the ALU datapath extended with a RegDest mux (isItype) that selects bits 20:16 or 15:11 as the write register, a sign extender (16 to 32 bits), and an ALUSrc mux (isItype) that selects the register operand or the sign-extended immediate as the second ALU input.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] if MEM[PC] == ADDI rt rs immediate: GPR[rt] ← GPR[rs] + sign-extend(immediate); PC ← PC + 4

26 Single-Cycle Datapath for Data Movement Instructions

27 Load Instructions. Assembly (e.g., load 4-byte word): LW rt_reg offset_16 (base_reg). Machine encoding (I-type): LW, 6 bit | base, 5 bit | rt, 5 bit | offset, 16 bit. Semantics: if MEM[PC] == LW rt offset_16 (base): EA = sign-extend(offset) + GPR[base]; GPR[rt] ← MEM[translate(EA)]; PC ← PC + 4
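The LW semantics can be sketched in Python as follows. Assumptions: `translate` is a hypothetical identity mapping (no virtual memory), memory is a dict keyed by word-aligned byte address, and unwritten words read as 0. The alignment check reflects the MIPS rule that LW requires a 4-byte-aligned address.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def translate(ea):
    """Hypothetical address translation: identity (no virtual memory)."""
    return ea


def execute_lw(gpr, mem, pc, base, rt, offset16):
    """LW rt, offset16(base): return (new GPR list, new PC)."""
    ea = (sign_extend16(offset16) + gpr[base]) & MASK32
    pa = translate(ea)
    if pa % 4 != 0:
        raise ValueError("LW requires a 4-byte-aligned address")
    regs = list(gpr)
    regs[rt] = mem.get(pa, 0)  # word-addressed dict; unwritten words read as 0
    return regs, pc + 4


gpr = [0] * 32
gpr[4] = 0x1000                  # base register
mem = {0x0FFC: 123}              # one word stored just below the base
regs, pc = execute_lw(gpr, mem, 0x400000, base=4, rt=5, offset16=0xFFFC)
print(regs[5], hex(pc))          # 123 0x400004
```

The negative offset (0xFFFC sign-extends to -4) shows why the offset must be sign-extended before the ALU adds it to the base register.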

28 LW Datapath. [Figure: the R/I-type datapath extended with data memory: the ALU (operation: add) computes base + sign-extended offset, which addresses data memory (MemRead); the loaded data is written to register rt (RegDest and ALUSrc driven by isItype). The IF, ID, EX, MEM and WB steps form the combinational state-update logic.] if MEM[PC] == LW rt offset_16 (base): EA = sign-extend(offset) + GPR[base]; GPR[rt] ← MEM[translate(EA)]; PC ← PC + 4

29 Apply LW: if MEM[PC] == LW rt offset_16 (base): EA = sign-extend(offset) + GPR[base]; GPR[rt] ← MEM[translate(EA)]; PC ← PC + 4. [Figure: the LW path highlighted on a datapath with PC, PC + 4 adder, instruction memory, register file (RS, RT, RD), sign extend (16 to 32 bits), ALU (ALUop), mux and data memory.]

30 Store Instructions. Assembly (e.g., store 4-byte word): SW rt_reg offset_16 (base_reg). Machine encoding (I-type): SW, 6 bit | base, 5 bit | rt, 5 bit | offset, 16 bit. Semantics: if MEM[PC] == SW rt offset_16 (base): EA = sign-extend(offset) + GPR[base]; MEM[translate(EA)] ← GPR[rt]; PC ← PC + 4
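A Python sketch of the SW semantics, under the same illustrative assumptions as a simple load model: translation is taken to be the identity, memory is a dict keyed by word-aligned byte address, and `execute_sw` is a hypothetical name. The point it shows is that SW changes memory, not the register file, and that rt is read as a source operand.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_sw(gpr, mem, pc, base, rt, offset16):
    """SW rt, offset16(base): return (new memory dict, new PC); GPRs are untouched."""
    ea = (sign_extend16(offset16) + gpr[base]) & MASK32  # translate(EA) assumed identity
    if ea % 4 != 0:
        raise ValueError("SW requires a 4-byte-aligned address")
    new_mem = dict(mem)
    new_mem[ea] = gpr[rt] & MASK32  # rt is a source operand here, not a destination
    return new_mem, pc + 4


gpr = [0] * 32
gpr[4], gpr[7] = 0x2000, 0xCAFE
new_mem, pc = execute_sw(gpr, {}, 0x400000, base=4, rt=7, offset16=8)
print(hex(new_mem[0x2008]), hex(pc))  # 0xcafe 0x400004
```

In datapath terms, this is why a store needs RegWrite deasserted (!isStore) and MemWrite asserted.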

31 SW Datapath. [Figure: the ALU (operation: add) computes base + sign-extended offset as the data memory address; the second register operand (rt) drives the data memory write data input (MemWrite); RegDest and ALUSrc driven by isItype. The IF, ID, EX, MEM and WB steps form the combinational state-update logic.] if MEM[PC] == SW rt offset_16 (base): EA = sign-extend(offset) + GPR[base]; MEM[translate(EA)] ← GPR[rt]; PC ← PC + 4

32 Apply SW. [Figure: the SW path highlighted on a datapath with PC, PC + 4 adder, instruction memory, register file (RS, RT, RD), sign extend (16 to 32 bits), ALU (ALUop), mux and data memory.]

33 Load-Store Datapath. [Figure: combined load/store datapath: RegWrite = !isStore; RegDest and ALUSrc = isItype; ALU operation = add; MemWrite = isStore; MemRead = isLoad.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

34 Datapath for (Non-Control-Flow) Insts. [Figure: the load/store datapath plus a MemtoReg mux (MemtoReg = isLoad) that selects between the ALU result and the loaded data for register write-back; RegWrite = !isStore, MemWrite = isStore, MemRead = isLoad, RegDest and ALUSrc = isItype.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

35 Single-Cycle Datapath for Control Flow Instructions

36 Unconditional Jump Instructions. Assembly: J immediate_26. Machine encoding (J-type): J, 6 bit | immediate, 26 bit. Semantics: if MEM[PC] == J immediate_26: target = {PC[31:28], immediate_26, 2'b00}; PC ← target
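The jump-target concatenation is plain bit manipulation; the Python sketch below follows the formula exactly as written on the slide, using PC[31:28]. The function name is hypothetical. In the actual MIPS ISA the upper 4 bits come from the address of the delay slot (PC + 4); the two differ only when the jump sits in the last word of a 256 MB region.

```python
def jump_target(pc, imm26):
    """target = {PC[31:28], immediate26, 2'b00}, as written on the slide."""
    upper4 = pc & 0xF0000000                      # keep PC[31:28]
    return upper4 | ((imm26 & 0x03FFFFFF) << 2)   # append imm26, then two zero bits


print(hex(jump_target(0x00400018, 0x0100000)))  # 0x400000
print(hex(jump_target(0x90000000, 0x3FFFFFF)))  # 0x9ffffffc
```

Since the two low bits are always zero, a jump can only reach word-aligned addresses within the current 256 MB region.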

37 Unconditional Jump Datapath. [Figure: isJ drives PCSrc, selecting the concatenated jump target (PC[31:28], immediate, 2'b00) instead of PC + 4; ALUSrc and the ALU operation are marked as don't-cares (X).] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] if MEM[PC] == J immediate_26: PC ← {PC[31:28], immediate_26, 2'b00}. What about JR, JAL, JALR?

38 Aside: MIPS Cheat Sheet: media=mips_reference_.pdf Looks like!

39 Conditional Branch Instructions. Assembly (e.g., branch if equal): BEQ rs_reg rt_reg immediate_16. Machine encoding (I-type): BEQ, 6 bit | rs, 5 bit | rt, 5 bit | immediate, 16 bit. Semantics (assuming no branch delay slot): if MEM[PC] == BEQ rs rt immediate_16: target = PC + 4 + sign-extend(immediate) × 4; if GPR[rs] == GPR[rt] then PC ← target else PC ← PC + 4
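A Python sketch of the BEQ next-PC computation, assuming no branch delay slot and taking the branch target relative to PC + 4, as the branch datapath does (PC + 4 plus the sign-extended immediate shifted left by 2). The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for BEQ, assuming no branch delay slot."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + sign_extend16(imm16) * 4) & MASK32  # offset counts words
    return target if rs_val == rt_val else pc_plus_4


print(hex(beq_next_pc(0x1000, 3, 3, 2)))       # 0x100c (taken, forward)
print(hex(beq_next_pc(0x1000, 3, 3, 0xFFFF)))  # 0x1000 (taken, backward)
print(hex(beq_next_pc(0x1000, 3, 4, 2)))       # 0x1004 (not taken)
```

Multiplying by 4 (a left shift by 2) lets a 16-bit immediate span ±32K instructions rather than ±32K bytes.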

40 Conditional Branch Datapath (for you to finish). [Figure: PC + 4 from the instruction datapath is added to the sign-extended immediate shifted left by 2 to form the branch target; the ALU (3-bit ALU operation: sub) compares the two register operands, and its Zero output (bcond) goes to the branch control logic, which drives PCSrc; annotated "watch out".] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] How to uphold the delayed branch semantics?

41 Putting It All Together. [Figure: the full single-cycle MIPS datapath with control, as on slide 18.] **Based on original figure from [P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.] JAL, JR, JALR omitted.

42 Single-Cycle Control Logic

43 Single-Cycle Hardwired Control: control signals as a combinational function of Inst = MEM[PC]. [Figure: instruction formats. R-type: opcode (6 bit, bits 31-26), rs (5 bit, 25-21), rt (5 bit, 20-16), rd (5 bit, 15-11), shamt (5 bit, 10-6), funct (6 bit, 5-0). I-type: opcode (6 bit), rs (5 bit), rt (5 bit), immediate (16 bit, 15-0). J-type: opcode (6 bit), immediate (26 bit, 25-0).] Consider: all R-type and I-type ALU instructions; LW and SW; BEQ, BNE, BLEZ, BGTZ; J, JR, JAL, JALR.


45 Single-Bit Control Signals. JAL and JALR require additional RegDest and MemtoReg options.
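As a sketch, the main control unit can be written as a pure combinational function of the 6-bit opcode. The version below covers only a small subset (R-type, ADDI, LW, SW, BEQ, J), uses the standard MIPS opcodes and the P&H signal names, and follows the usual textbook assignment of signal values; all of that is an assumption rather than the slide's own table. Don't-care signals are resolved to 0, while RegWrite and MemWrite are never left as don't-cares.

```python
OPCODE_KIND = {0x00: "R-type", 0x08: "ADDI", 0x23: "LW", 0x2B: "SW", 0x04: "BEQ", 0x02: "J"}


def control_signals(opcode):
    """Main control as a combinational function of the opcode (no state)."""
    kind = OPCODE_KIND[opcode]
    # Default: nothing is written; don't-cares are resolved to 0 here.
    sig = dict(RegDst=0, ALUSrc=0, MemtoReg=0, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=0, Jump=0, ALUOp=0b00)
    if kind == "R-type":
        sig.update(RegDst=1, RegWrite=1, ALUOp=0b10)  # write rd; ALU op from funct
    elif kind == "ADDI":
        sig.update(ALUSrc=1, RegWrite=1)              # write rt; ALU adds immediate
    elif kind == "LW":
        sig.update(ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1)
    elif kind == "SW":
        sig.update(ALUSrc=1, MemWrite=1)              # RegWrite must stay 0
    elif kind == "BEQ":
        sig.update(Branch=1, ALUOp=0b01)              # ALU subtracts to test equality
    elif kind == "J":
        sig.update(Jump=1)
    return sig


for op in (0x00, 0x23, 0x2B, 0x04):
    print(OPCODE_KIND[op], control_signals(op))
```

Because the function has no internal state, it corresponds to hardwired control: the same opcode always yields the same signals within the one cycle.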

46 ALU Control

47 ALU Control
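A sketch of a two-level ALU control decode: the main control produces a 2-bit ALUOp, and a small ALU control block combines it with the funct field to produce the 3-bit ALU operation seen in the datapath figures. The encodings below follow the common P&H convention and the standard MIPS funct codes; that these are exactly the encodings used on these slides is an assumption.

```python
# 3-bit ALU operation encodings (P&H convention, assumed)
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# funct field -> ALU operation for R-type instructions (standard MIPS funct codes)
FUNCT_TO_ALU = {0x20: ALU_ADD, 0x22: ALU_SUB, 0x24: ALU_AND, 0x25: ALU_OR, 0x2A: ALU_SLT}


def alu_control(alu_op, funct):
    """Combine the 2-bit ALUOp from main control with the funct field."""
    if alu_op == 0b00:
        return ALU_ADD              # LW/SW address or ADDI: add
    if alu_op == 0b01:
        return ALU_SUB              # BEQ: subtract, then check Zero
    if alu_op == 0b10:
        return FUNCT_TO_ALU[funct]  # R-type: the funct field decides
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")


print(bin(alu_control(0b10, 0x22)))  # 0b110 (SUB)
```

Splitting the decode this way keeps the main control table small: it only needs to know the instruction class, not every R-type function.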

48 R-Type ALU ADD rs rt rd

49 I-Type ALU ADD rs rt imm

50 LW lw base(rs), rt, imm

51 BEQ beq rs, rt, imm



55 ALU Control

56 What is in That Control Box? Combinational logic: hardwired control. Idea: control signals are generated combinationally based on the instruction; this is necessary in a single-cycle microarchitecture. Sequential logic: sequential/microprogrammed control. Idea: a memory structure (the control store) contains the control signals associated with an instruction.

57 Evaluating the Single-Cycle Microarchitecture

58 A Single-Cycle Microarchitecture. Is this a good idea/design? When is this a good design? When is this a bad design? How can we design a better microarchitecture?

59 A Single-Cycle Microarchitecture: Analysis. Every instruction takes 1 cycle to execute: CPI (cycles per instruction) is strictly 1. How long each instruction takes is determined by how long the slowest instruction takes to execute, even though many instructions do not need that long. The clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction. The critical path of the design is determined by the processing time of the slowest instruction.

60 What is the Slowest Instruction to Process? Let's go back to the basics: all six phases of the instruction processing cycle take a single machine clock cycle to complete: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result. 1. Instruction fetch (IF); 2. instruction decode and register operand fetch (ID/RF); 3. execute/evaluate memory address (EX/AG); 4. memory operand fetch (MEM); 5. store/writeback result (WB). Does each of the above phases take the same time (latency) for all instructions?

61 Single-Cycle Datapath Analysis. Assume: memory units (read or write): 200 ps; ALU and adders: 100 ps; register file (read or write): 50 ps; other combinational logic: 0 ps.

Steps       IF    ID    EX    MEM   WB    Delay (ps)
Resources   mem   RF    ALU   mem   RF
R-type      200   50    100         50    400
I-type      200   50    100         50    400
LW          200   50    100   200   50    600
SW          200   50    100   200         550
Branch      200   50    100               350
Jump        200                           200
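The delay column follows directly from the stated unit delays, and the single-cycle clock is then set by the slowest instruction. The Python below is a minimal sketch of that arithmetic under the same simplifying assumptions: step delays add up serially, the PC + 4 adder runs in parallel with fetch, and control logic adds no delay. The dictionary names are hypothetical.

```python
UNIT_DELAY_PS = {"mem": 200, "ALU": 100, "RF": 50}  # other logic: 0 ps
STEP_RESOURCE = {"IF": "mem", "ID": "RF", "EX": "ALU", "MEM": "mem", "WB": "RF"}

STEPS_USED = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "I-type": ["IF", "ID", "EX", "WB"],
    "LW":     ["IF", "ID", "EX", "MEM", "WB"],
    "SW":     ["IF", "ID", "EX", "MEM"],
    "Branch": ["IF", "ID", "EX"],
    "Jump":   ["IF"],
}


def instruction_delay_ps(kind):
    """Serial sum of the resource delays for the steps this instruction uses."""
    return sum(UNIT_DELAY_PS[STEP_RESOURCE[step]] for step in STEPS_USED[kind])


delays = {kind: instruction_delay_ps(kind) for kind in STEPS_USED}
clock_ps = max(delays.values())  # single cycle: the slowest instruction sets the clock

print(delays)    # {'R-type': 400, 'I-type': 400, 'LW': 600, 'SW': 550, 'Branch': 350, 'Jump': 200}
print(clock_ps)  # 600
for kind, d in delays.items():
    print(f"{kind}: {clock_ps - d} ps of the cycle unused")
```

With a 600 ps clock, a jump leaves 400 ps of every cycle idle, which is the inefficiency the following slides discuss.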

62 Let's Find the Critical Path. [Figure: the full single-cycle MIPS datapath with control, as on slide 18.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

63 R-Type and I-Type ALU. [Figure: the slide 18 datapath annotated with cumulative delays: PC + 4 at 100 ps, instruction read at 200 ps, register operands read at 250 ps, ALU result at 350 ps, register write complete at 400 ps.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

64 LW. [Figure: the slide 18 datapath annotated with cumulative delays: PC + 4 at 100 ps, instruction read at 200 ps, register operands read at 250 ps, address computed at 350 ps, data memory read at 550 ps, register write complete at 600 ps.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

65 SW. [Figure: the slide 18 datapath annotated with cumulative delays: PC + 4 at 100 ps, instruction read at 200 ps, register operands read at 250 ps, address computed at 350 ps, data memory write complete at 550 ps.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

66 Branch Taken. [Figure: the slide 18 datapath annotated with cumulative delays: PC + 4 at 100 ps, instruction read at 200 ps, register operands read at 250 ps, comparison (Zero) at 350 ps; the PC is updated at 350 ps.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

67 Jump. [Figure: the slide 18 datapath annotated with cumulative delays: PC + 4 at 100 ps, instruction read at 200 ps, jump target available at 200 ps.] [Based on original figure from P&H CO&D, COPYRIGHT 2014 Elsevier. ALL RIGHTS RESERVED.]

68 What About Control Logic? How does that affect the critical path? Think about it: can control logic be on the critical path? A note on CDC 56: control store access too long.

69 What is the Slowest Instruction to Process? Memory is not magic. What if memory sometimes takes ms to access? Does it make sense to have a simple register-to-register add or jump take {ms + all else needed to do a memory operation}? And what if you need to access memory more than once to process an instruction? Which instructions need this? Do you provide multiple ports to memory?

70 Single-Cycle Arch: Complexity. Contrived: all instructions run as slow as the slowest instruction. Inefficient: all instructions run as slow as the slowest instruction; must provide worst-case combinational resources in parallel as required by any instruction; need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle. Not necessarily the simplest way to implement an ISA: single-cycle implementation of REP MOVS (x86) or INDEX (VAX)? Not easy to optimize/improve performance: optimizing the common case does not work (e.g., common instructions); need to optimize the worst case all the time.

71 (Micro)architecture Design Principles. Critical path design: find and decrease the maximum combinational logic delay; break a path into multiple cycles if it takes too long. Bread and butter (common case) design: spend time and resources where it matters most, i.e., improve what the machine is really designed to do; common case vs. uncommon case. Balanced design: balance instruction/data flow through hardware components; design to eliminate bottlenecks: balance the hardware for the work.

72 Single-Cycle Design vs. Design Principles: critical path design; bread and butter (common case) design; balanced design. How does a single-cycle microarchitecture fare in light of these principles?

73 Multi-Cycle Microarchitectures
