Lecture 7. Building A Simple Processor

1 Lecture 7: Building A Simple Processor
Christos Kozyrakis
Stanford University
C. Kozyrakis EE108b Lecture 7

2 Announcements
Upcoming deadlines: the current lab is due today (demo by 5pm, report by midnight); HW2 is due on Thursday; PA is due on Thursday 2/8.
Lab #2 is out. Learn from the previous lab: start early.
Quiz: Tue 2/6, 7pm to 9pm (location TBD).
Catch up with the reading material.

3 Review: Translation Hierarchy
High-level language → assembly → machine code:
C program (.c) → compiler → assembly program (.s) → assembler → object modules (.o) → linker → executable → loader → memory

4 Review: Code Optimization
Goal: improve performance by removing redundant work (unreachable code, common-subexpression elimination, induction variable elimination), by creating simpler operations (dealing with constants in the compiler, strength reduction), and by managing registers well.
Execution Time = Instructions × CPI × Clock Cycle Time
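
As a rough illustration of the execution-time equation, the short Python sketch below uses hypothetical numbers (the instruction count, CPI, and clock period are not from the lecture). It shows why removing redundant work pays off directly: with CPI and cycle time unchanged, execution time scales with the instruction count.

```python
def execution_time(instructions: int, cpi: float, cycle_time_s: float) -> float:
    """Execution Time = Instructions x CPI x Clock Cycle Time (in seconds)."""
    return instructions * cpi * cycle_time_s


# Hypothetical workload: 1e9 instructions, CPI of 1.5, 1 ns clock (1 GHz).
baseline = execution_time(1_000_000_000, 1.5, 1e-9)

# Suppose optimization removes 20% of the instructions; CPI and cycle time stay the same.
optimized = execution_time(800_000_000, 1.5, 1e-9)

print(f"baseline: {baseline:.2f} s, optimized: {optimized:.2f} s")
```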

5 Assembler
Expands macros and pseudoinstructions, and converts constants.
Its primary purpose is to produce an object file containing machine language instructions, application data, and information for memory organization.

6 Object File
Includes:
Object header: describes the file organization
Text segment: machine code
Data segment: static (initialized) data
Relocation information: identifies instructions and data that depend on absolute addresses when the program is loaded
Symbol table: list of labels that are not defined (e.g., external references)
Debugging information: describes the relationship between source code and machine instructions

7 Linker
The linker combines multiple object modules: it identifies where code/data will be placed in memory, resolves code/data cross references, and produces an executable if all references are found.
Steps:
1. Place code and data modules in memory
2. Determine the addresses of data and instruction labels
3. Patch both the internal and external references
The separation between compiler and linker makes standard libraries an efficient solution for maintaining modular code.
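
The three linker steps can be sketched as a toy Python model. The `ObjectModule` record, the module sizes, and the 0x00400000 base address are assumptions for illustration (that base is the conventional MIPS text-segment start, but nothing here models a real object-file format). Relocations are reduced to (offset, label) pairs, and "patching" just records the absolute address each reference should receive.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical, simplified object module (not a real object-file format)."""
    name: str
    size: int                                   # bytes of code/data in this module
    symbols: dict[str, int]                     # labels defined here -> offset in module
    relocations: list[tuple[int, str]] = field(default_factory=list)  # (offset, label used)


def link(modules: list[ObjectModule], base: int = 0x00400000):
    # Step 1: place the modules in memory, one after another.
    module_base = {}
    addr = base
    for m in modules:
        module_base[m.name] = addr
        addr += m.size

    # Step 2: determine the absolute address of every label.
    addresses = {}
    for m in modules:
        for label, offset in m.symbols.items():
            if label in addresses:
                raise ValueError(f"duplicate definition of {label}")
            addresses[label] = module_base[m.name] + offset

    # Step 3: patch internal and external references; fail if any are unresolved.
    patches = {}
    for m in modules:
        for offset, label in m.relocations:
            if label not in addresses:
                raise ValueError(f"undefined reference to {label}")
            patches[module_base[m.name] + offset] = addresses[label]
    return addresses, patches


main = ObjectModule("main", 16, {"main": 0, "loop": 8}, [(4, "printf"), (12, "loop")])
lib = ObjectModule("lib", 32, {"printf": 0})
addresses, patches = link([main, lib])
```

Here `printf` is an external reference from `main` and `loop` is an internal one; linking `main` alone would raise an error because `printf` is never defined, which mirrors "produces an executable if all references are found".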

8 Loader
The loader is used at run time:
1. Read the executable file header for the size of the text/data segments
2. Create a sufficiently large address space
3. Copy the program from the executable on disk into memory
4. Copy arguments to the main program's stack
5. Initialize machine registers and set the stack pointer
6. Jump to the start-up routine
7. Terminate the program when execution completes

9 What Do We Really Want?
To go from some goal like "Find N!"

10 How to Solve a Complex Problem
Specification: specify the goal in executable form; break the problem into simpler steps.
Algorithm: code it in a high-level language (like C), translating it into even simpler steps that follow a set of conventions.
Compiler: translate into simpler instructions.
Processor: execute instructions as fast as possible.

11 How To Build A Processor (or any complex hardware)
Break the operation down into steps that even gates can understand.
Generally, decompose the task into two kinds of operations: things that deal with the data (datapath) and things that control the hardware operating on the data (control).
Find a decomposition that is simple and efficient; some are obvious, others can be more subtle.
We will start simple and add features to improve performance.

12 How to Execute Instructions
1. Fetch the instruction.
2. Decode the instruction and fetch the register operands.
3. Do the operation.
4. Write the result into the register file.
5. Calculate the next instruction address.
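
The five steps map directly onto an interpreter loop. The Python sketch below is hypothetical: it handles only add and sub, represents each instruction as an already-decoded tuple (op, rd, rs, rt) instead of a 32-bit word, and assumes the MIPS convention that register 0 always holds zero.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits (overflow is ignored)


def run(program: list[tuple[str, int, int, int]], regs: list[int]) -> list[int]:
    pc = 0
    while pc // 4 < len(program):
        op, rd, rs, rt = program[pc // 4]        # 1. fetch (Instr = Mem[PC]) and decode
        a, b = regs[rs], regs[rt]                # 2. fetch register operands
        if op == "add":                          # 3. do the operation
            result = (a + b) & MASK32
        elif op == "sub":
            result = (a - b) & MASK32
        else:
            raise ValueError(f"unsupported op {op}")
        if rd != 0:                              # 4. write the result (register 0 stays 0)
            regs[rd] = result
        pc += 4                                  # 5. compute the next instruction address
    return regs


regs = [0] * 32
regs[1], regs[2] = 5, 3
run([("add", 3, 1, 2), ("sub", 4, 3, 2)], regs)
```

The PC is kept as a byte address and advanced by 4, so indexing the program needs `pc // 4`.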

14 Subset of Instructions
To simplify our study of processor design, we will focus on a subset of the MIPS instructions:
Memory: lw and sw
Arithmetic: add, sub, and, ori, and slt
Branch: beq and j
The method for implementing other instructions should follow naturally from these.

15 Starting The Design
Think of the steps needed to execute each instruction. It quickly becomes clear that we need to create a sequential process; instructions could take multiple cycles or one cycle.
Steps that occur for each instruction: fetch the instruction from memory (the address is specified by the PC); read one or two registers; do add/sub/etc. using the ALU (see Appendix B.5); fetch a value from memory; store results to the register file/memory.
State needed for a single-cycle machine: the instruction pointer (PC), 32 registers, and memory (data values).

16 Starting Dataflow
[Figure: major functional units and major connections]

17 Logic Design Review
Combinational logic: the output depends only on the inputs. If you wait long enough, you will get the right answer.
To build a processor, we also need sequential logic. It separates signals across clock cycles and also provides temporary storage. This is usually done with flip-flops (or latches); in this class we will talk about flop-based design.

18 Combinational Elements
[Figure: an adder (Sum output), a multiplexer (MUX, Select input), and an ALU (ALU control input; Zero and ALU result outputs)]

19 D Flip-Flops
Samples its input on the rising edge of the clock.
Holds the value it samples until the next rising edge.
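
A behavioral Python sketch (a simulation model, not a gate-level design; the `step` method and the sampled-waveform inputs are our own illustration) captures the rule: Q changes only when the clock goes from 0 to 1, and changes on D at any other time are ignored.

```python
class DFlipFlop:
    """Behavioral model of a positive-edge-triggered D flip-flop."""

    def __init__(self, q: int = 0) -> None:
        self.q = q
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        # Sample D only on a rising edge (clock goes 0 -> 1); otherwise hold Q.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
clk = [0, 1, 1, 0, 0, 1]
d = [1, 1, 0, 0, 1, 0]
q = [ff.step(c, x) for c, x in zip(clk, d)]
```

D drops to 0 while the clock is high and rises to 1 while it is low, yet Q stays 1 until the next rising edge samples D = 0.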

20 Critical Timing Issues
Flops work great as long as the input is stable when the clock rises; this requirement is called the setup and hold window.
Clock skew can cause some nasty problems, such as hold-time violations (we won't worry about these in this class).
Cycle Time = Longest Prop Delay + Setup + Clock Skew
[Figure: Clk waveform with setup/hold windows around each rising edge and don't-care regions between them]
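
To make the cycle-time equation concrete, here is a small Python calculation. The path delays, setup time, and skew are made-up numbers, and the formula is exactly the slide's (it omits other terms such as clock-to-Q delay). Only the slowest path matters; faster paths simply finish early and wait for the clock.

```python
def min_cycle_time_ns(path_delays_ns: list[float], setup_ns: float, skew_ns: float) -> float:
    """Cycle Time = Longest Prop Delay + Setup + Clock Skew (all in ns)."""
    return max(path_delays_ns) + setup_ns + skew_ns


# Hypothetical combinational paths between flops, plus setup time and skew.
cycle_ns = min_cycle_time_ns([2.0, 3.5, 1.2], setup_ns=0.2, skew_ns=0.3)
max_freq_mhz = 1000.0 / cycle_ns  # 1 / (cycle time in ns) gives GHz; x1000 gives MHz
print(f"cycle time {cycle_ns:.1f} ns, max clock {max_freq_mhz:.0f} MHz")
```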

21 Memory Structure
Memory structures are generally specially designed. We could build them from flops or latches, but they would be big, slow, and power hungry. So circuit designers create the basic design, then create a module generator for logic designers to use.

22 Memory Diagram

23 Reading from/Writing to Memory
The interface to memory can be combinational (asynchronous) or clocked (synchronous).
Combinational memory: data is valid some delay after the address lines settle. There is no clock. Writes are tricky: you must supply a write pulse in the middle of your address and data valid times.
Clocked memory (most common): memory looks like a standard synchronous device. Address and control signals are sampled on the rising edge of the clock, and data is valid some number of cycles later.

24 Memory Timing
[Figure: timing diagrams for combinational/asynchronous and synchronous memory]

25 Memories In This Design
They will be combinational; otherwise we can't complete an instruction in one cycle!
The interface is simple. Inputs: Address, DataIn, WriteEn (WriteEn must be a pulse). Output: DataOut.
Register file: it has three addresses, two for reads and one for writes. It is called 3-ported, since it can perform three accesses per cycle.
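
A behavioral Python sketch of the 3-ported register file: two combinational read ports and one write port gated by a write enable. The class and method names are our own, and treating register 0 as hard-wired to zero follows the MIPS convention rather than anything stated on this slide.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Read ports are combinational: outputs depend only on the two addresses.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        # The write port updates state only when the write enable (RegWrite) is set.
        # Register 0 always reads as zero (MIPS convention), so writes to it are dropped.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(8, 5, reg_write=True)
rf.write(9, 7, reg_write=False)   # ignored: write enable is off
rf.write(0, 3, reg_write=True)    # ignored: register 0 is hard-wired to zero
print(rf.read(8, 9))
```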

26 The First Task: Fetching The Instruction (IF)
Not that complex: Instr = Mem[PC]
Fetch the instruction from memory.
Update the program counter for the next cycle. What is the address of the next instruction?

27 Datapath: IF Unit
[Figure: the PC supplies the read address to the instruction memory, which outputs Instruction[31:0]; an adder computes PC + 4]

28 What Did We Fetch?
R-format: OP=0 | rs | rt | rd | sa | funct (6, 5, 5, 5, 5, 6 bits). rs: first source register; rt: second source register; rd: result register; sa: shift amount; funct: function code.
I-format: OP | rs | rt | imm (6, 5, 5, 16 bits). rs: first source register; rt: second source register; imm: immediate.
J-format: OP | target (6, 26 bits). target: jump target address.
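
Because every field sits at a fixed bit position, all of them can be extracted in parallel before the format is even known; the hardware simply wires the bits out. This Python sketch (the function name is our own) slices a 32-bit word into every field. For an R-format word the imm and target values are meaningless and are just ignored, like unused wires. The test word 0x01095020 is the standard MIPS encoding of add $t2, $t0, $t1 (rd = 10, rs = 8, rt = 9, funct = 0x20).

```python
def decode_fields(instr: int) -> dict[str, int]:
    """Slice a 32-bit MIPS instruction into all R/I/J fields at once."""
    return {
        "op":     (instr >> 26) & 0x3F,       # bits 31-26
        "rs":     (instr >> 21) & 0x1F,       # bits 25-21
        "rt":     (instr >> 16) & 0x1F,       # bits 20-16
        "rd":     (instr >> 11) & 0x1F,       # bits 15-11
        "sa":     (instr >> 6) & 0x1F,        # bits 10-6 (shift amount)
        "funct":  instr & 0x3F,               # bits 5-0
        "imm":    instr & 0xFFFF,             # bits 15-0 (I-format)
        "target": instr & 0x03FFFFFF,         # bits 25-0 (J-format)
    }


fields = decode_fields(0x01095020)   # add $t2, $t0, $t1
```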

29 Nice Characteristics of MIPS Machine Code
Instructions are fixed length: we don't need to decode the first instruction to find the next one; always add 4 bytes to the instruction pointer.
Register specifiers are always in the same place: the destination moves around some, but source registers are always in the same place (or you don't need that register). So we can fetch the registers BEFORE we decode the instruction, feeding bits directly from the instruction memory.

30 Register to Register Operations
In our subset these are only add and sub; I did not want to worry about overflow.
add rd, rs, rt: R[rd] <- R[rs] + R[rt];  (add operation)
sub rd, rs, rt: R[rd] <- R[rs] - R[rt];  (subtract operation)
Bits (R-format): OP=0 | rs: first source register | rt: second source register | rd: result register | sa: shift amount | funct: function code

31 Datapath: Reg/Reg Ops
R[rd] <- R[rs] op R[rt];
The ALU operation and RegWrite are set based on the decoded instruction.
Read register 1, Read register 2, and Write register come from the rs, rt, and rd fields of the instruction.
[Figure: Instruction[25:21], Instruction[20:16], and Instruction[15:11] drive Read register 1, Read register 2, and Write register; the two read data outputs feed the ALU, which is controlled by ALUOp and produces Zero and ALU result; RegWrite enables the register write]

32 OR Immediate RTL
OR immediate instruction: ori rt, rs, imm
R[rt] <- R[rs] OR ZeroExt(imm);
This means we need to get instr[15:0] into the data path, on the RT path.
Bits (I-format): OP | rs: first source register | rt: second source register | imm: immediate

33 Datapath: Immediate Ops
Extend the data path to support immediate operations:
The write register is rt or rd, based on the instruction (RegDst).
Read data 2 is ignored for immediates.
Immediates can be sign or zero extended (16 to 32 bits).
ALUSrc and the ALU operation are set based on the instruction.
[Figure: Instruction[15:0] passes through a 16-to-32-bit sign/zero extender into an ALUSrc MUX at the ALU's second input; a RegDst MUX selects Instruction[20:16] or Instruction[15:11] as the write register]
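
The extender's two modes differ only in what fills the upper 16 bits. In this Python sketch, values are 32-bit unsigned bit patterns, and `ext_op` is a hypothetical control input (1 for sign extension, as lw and sw need; 0 for zero extension, as ori needs).

```python
def extend16(imm: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to a 32-bit pattern: sign-extend if ext_op else zero-extend."""
    imm &= 0xFFFF
    if ext_op and (imm & 0x8000):       # negative in two's complement: copy bit 15 upward
        return imm | 0xFFFF0000
    return imm                          # zero extension (or a non-negative value)


print(hex(extend16(0xFFFC, ext_op=1)))  # sign-extended: -4 as a 32-bit pattern
print(hex(extend16(0xFFFC, ext_op=0)))  # zero-extended: 65532
```

The same 16-bit field means 65532 to ori but -4 to lw, which is why the extension mode has to come from the decoded instruction.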

34 Load
Load instruction: lw rt, rs, imm
Addr <- R[rs] + SignExt(imm);    compute memory address
R[rt] <- Mem[Addr];              load into register
Notice this will use the immediate path as well.
Bits (I-format): OP | rs: first source register | rt: second source register | imm: immediate

35 Datapath: Load
Extend the data path to support other immediate operations:
The extender handles either sign or zero extension.
A MUX selects between the ALU result and the memory output (MemtoReg).
[Figure: the ALU result drives the data memory Address input; Read data 2 drives Write data; MemWrite and MemRead control the data memory; a MemtoReg MUX selects the value written back to the register file]

36 Store
Store instruction: sw rt, rs, imm
Addr <- R[rs] + SignExt(imm);    compute memory address
Mem[Addr] <- R[rt];              store register into memory
Bits (I-format): OP | rs: first source register | rt: second source register | imm: immediate

37 Datapath: Store
Read data 2 (register rt) is passed on to memory as the write data.
The memory address is calculated just as in the lw case.

38 Branch
Branch instruction: beq rs, rt, imm
Cond <- R[rs] - R[rt];                  calculate branch condition
if (Cond eq 0)                          test if equal
    PC <- PC + 4 + SignExt(imm)*4
else
    PC <- PC + 4;                       calculate next address
Bits (I-format): OP | rs: first source | rt: second source | imm: immediate

39 The Next Address
The PC is byte-addressed into instruction memory:
Sequential: PC[31:0] = PC[31:0] + 4
Branch operation: PC[31:0] = PC[31:0] + 4 + SignExt(imm) × 4
Instruction addresses: the PC is byte addressed, but instructions are 4 bytes long, so the low two bits are always 00. Simplify the hardware by using a 30-bit PC:
Sequential: PC[31:2] = PC[31:2] + 1
Branch operation: PC[31:2] = PC[31:2] + 1 + SignExt(imm)
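
The two formulations compute the same address; the word-addressed one just drops the always-zero low two bits. The Python sketch below (the function names are our own) checks that equivalence for a taken backward branch.

```python
def sext16(imm: int) -> int:
    """Interpret a 16-bit field as a signed Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_bytes(pc: int, imm: int, taken: bool) -> int:
    # Byte addresses: PC + 4, or PC + 4 + SignExt(imm) * 4 for a taken branch.
    seq = (pc + 4) & 0xFFFFFFFF
    return (seq + sext16(imm) * 4) & 0xFFFFFFFF if taken else seq


def next_pc_words(pc_word: int, imm: int, taken: bool) -> int:
    # 30-bit word addresses (PC[31:2]): PC + 1, or PC + 1 + SignExt(imm).
    seq = (pc_word + 1) & 0x3FFFFFFF
    return (seq + sext16(imm)) & 0x3FFFFFFF if taken else seq


pc = 0x00400010
target = next_pc_bytes(pc, 0xFFFE, taken=True)   # imm = -2 words
assert target == next_pc_words(pc >> 2, 0xFFFE, taken=True) << 2
print(hex(target))
```

Note that the branch offset counts words relative to PC + 4, not relative to the branch itself.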

40 Datapath: IF
[Figure: 30-bit PC and instruction memory; Instruction[15:0] is sign-extended from 16 to 30 bits and added to the incremented PC, and Branch together with the ALU's Zero output selects the next PC]

41 Jump RTL
Jump instruction: j target
PC[31:2] <- PC[31:28] || target[25:0];
Bits (J-format): OP (6 bits) | target (26 bits): jump target address
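
Concatenation in the RTL is just shifting and OR-ing. In this Python sketch (a hypothetical helper), the upper four bits are taken from PC+4, as the MIPS architecture specifies. The slide's RTL writes PC[31:28]; the two differ only when the jump sits in the last word of a 256 MB region. The opcode value 2 for j is the standard MIPS encoding.

```python
def jump_target(pc: int, instr: int) -> int:
    """Pseudodirect jump: PC+4[31:28] || instr[25:0] || 00."""
    target26 = instr & 0x03FFFFFF          # instruction bits 25-0 (word address)
    upper4 = (pc + 4) & 0xF0000000         # top 4 bits of the incremented PC
    return upper4 | (target26 << 2)        # shift left 2: word -> byte address


j_instr = (0x02 << 26) | 0x0100003        # j with op = 2 and target field 0x0100003
print(hex(jump_target(0x00400020, j_instr)))
```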

42 Datapath: IFU with Jump
A MUX selects the pseudodirect jump target.
[Figure: Instruction[25:0] is shifted left 2 and combined with PC bits [31:28] to form a 32-bit jump address; a Jump-controlled MUX chooses it over the sequential/branch address]

43 Putting it All Together
[Figure: the complete single-cycle datapath, combining the instruction fetch unit (with branch and jump), the register file, the extender, the ALU, and the data memory, with control points Jump, Branch, RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemRead, and MemtoReg]

44 Control
Since every instruction takes one cycle, control is state-free! It is just the decoded instruction bits.
There are also few control points: control on the multiplexers, the operation type for the ALU, and write control on the instruction and data memories.
The first part of the cycle does not have any control, which is good, since we don't have the instruction yet.
Let's look at the setting of the control points for different instructions.

45 At Beginning Of Clock Cycle
[Figure: full datapath in which every control signal (Jump, Branch, RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemRead, MemtoReg, extender mode) still holds its <prev> value from the previous instruction]

46 Control for Arithmetic
[Figure: control settings for arithmetic (add/sub) instructions; ALUOp = <op>, and the extender is don't-care (X)]

47 Instruction Fetch at End
[Figure: control settings for arithmetic instructions at the end of the cycle, when the next PC is selected and the next instruction is fetched; ALUOp = <op>, don't-care signals marked X]

48 Arithmetic Immediate (ori)
[Figure: control settings for ori; ALUOp = Or]

49 Control for Load
[Figure: control settings for lw]

50 Control for Store
[Figure: control settings for sw; RegDst and MemtoReg are don't-care (X)]

51 Control for Branch (beq)
[Figure: control settings for beq; ALUOp = Sub; RegDst, the extender, and MemtoReg are don't-care (X)]

52 Control for Jump (j)
[Figure: control settings for j; RegDst, the extender, ALUSrc, ALUOp, and MemtoReg are don't-care (X)]

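53 Summary of Control Signals
The summary table lists, for each instruction in the subset (add, sub, ori, lw, sw, beq, jump), its func field and the settings of RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, and ALUctr<2:0>. The func field is not important for instructions other than add and sub. ALUctr selects Add for add, lw, and sw; Subtract for sub and beq; and Or for ori.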
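
Because control is state-free, the whole table is just a lookup from instruction to signal values. The Python dictionary below encodes the settings of the standard single-cycle MIPS design, which agree with the don't-cares visible on the individual control slides; the 0/1 values are our reconstruction rather than a copy of the lecture's table. `None` stands for X (don't care), and ExtOp = 1 is assumed to mean sign extension.

```python
X = None  # don't care

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemWrite",
           "Branch", "Jump", "ExtOp", "ALUctr")

CONTROL = {
    #       RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUctr
    "add": (1,     0,     0,       1,       0,       0,     0,   X,    "Add"),
    "sub": (1,     0,     0,       1,       0,       0,     0,   X,    "Subtract"),
    "ori": (0,     1,     0,       1,       0,       0,     0,   0,    "Or"),
    "lw":  (0,     1,     1,       1,       0,       0,     0,   1,    "Add"),
    "sw":  (X,     1,     X,       0,       1,       0,     0,   1,    "Add"),
    "beq": (X,     0,     X,       0,       0,       1,     0,   X,    "Subtract"),
    "j":   (X,     X,     X,       0,       0,       0,     1,   X,    X),
}


def control(instr: str) -> dict:
    """Return the control-signal settings for one instruction as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL[instr]))
```

For example, `control("lw")["MemtoReg"]` is 1 (write back the memory output), while stores, branches, and jumps all keep RegWrite at 0.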

54 Multilevel Decoding
Since only the ALU needs the func field, pass it to the ALU unit and have a local decoder there.
[Figure: op (6 bits) → Main Control → ALUop (N bits); ALUop and func (6 bits) → ALU Control (Local) → ALUctr (3 bits) → ALU]
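
Multilevel decoding splits one large decoder into two small ones. In this Python sketch the ALUop names follow the lecture's table (R-type, Or, Add, Subtract), but the values are symbolic strings rather than the real N-bit encodings; the funct values are the standard MIPS function codes for add, sub, and, and slt.

```python
# Main control: op-level decision only (symbolic ALUop values, not real bit encodings).
MAIN_ALUOP = {"R-type": "R-type", "ori": "Or", "lw": "Add", "sw": "Add", "beq": "Subtract"}

# Standard MIPS funct codes for the R-type instructions in the subset.
FUNCT_TO_ALUCTR = {0x20: "Add", 0x22: "Subtract", 0x24: "And", 0x2A: "SetLessThan"}


def alu_control(alu_op: str, funct: int) -> str:
    """Local ALU decoder: consults funct only when the main control says R-type."""
    if alu_op == "R-type":
        return FUNCT_TO_ALUCTR[funct]
    return alu_op  # for other instructions, the main control already chose the operation


print(alu_control(MAIN_ALUOP["R-type"], 0x22))  # prints Subtract
print(alu_control(MAIN_ALUOP["lw"], 0x22))      # prints Add: funct bits are ignored for lw
```

The main decoder never sees the 6-bit funct field, and the local decoder never sees the opcode.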

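55 Multilevel Decoding (cont.)
The main control derives RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, and ALUop<N:0> from op alone, for R-type, ori, lw, sw, beq, and jump. ALUop is R-type for R-type instructions (the local ALU control then decodes func), Or for ori, Add for lw and sw, and Subtract for beq.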

56 Putting It All Together
[Figure: the complete datapath with a main Control unit that decodes Instruction[31:26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite, and a separate ALU control unit driven by ALUOp and Instruction[5:0]]

57 Single Cycle Processor
Advantages: a single cycle per instruction makes the logic and clock simple.
Disadvantages: memory and functional units are used inefficiently, since different instructions take different lengths of time; the ALU only computes values a small amount of the time. The cycle time is set by the worst-case path (the load instruction), which means long cycle times. All machines would have a CPI of 1.
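
A back-of-the-envelope Python calculation shows why the load sets the clock. The unit delays are hypothetical round numbers, and the model ignores multiplexers, the PC adders, and register setup; each instruction's path lists only the major units it uses in sequence.

```python
# Hypothetical delays (picoseconds) for the major units; real values vary by technology.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Major units each instruction passes through, in order.
PATHS = {
    "add/sub": ["imem", "reg_read", "alu", "reg_write"],
    "ori":     ["imem", "reg_read", "alu", "reg_write"],
    "lw":      ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":      ["imem", "reg_read", "alu", "dmem"],
    "beq":     ["imem", "reg_read", "alu"],
    "j":       ["imem"],
}


def path_delay_ps(instr: str) -> int:
    return sum(DELAY_PS[unit] for unit in PATHS[instr])


# A single-cycle clock must fit the slowest instruction.
cycle_ps = max(path_delay_ps(i) for i in PATHS)
slowest = max(PATHS, key=path_delay_ps)
for instr in PATHS:
    print(f"{instr:8s} {path_delay_ps(instr):4d} ps, idle {cycle_ps - path_delay_ps(instr):4d} ps")
print(f"cycle time {cycle_ps} ps (set by {slowest})")
```

With these numbers an add/sub finishes 200 ps early but still pays for a full 800 ps cycle; that idle time is the inefficiency the slide describes.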

58 Increasing Parallelism
Problem: each functional unit is used once per cycle. Most of the time it is sitting, waiting for its turn; well, it is calculating all the time, but it is waiting for valid data. There is no parallelism in this arrangement.
Making instructions take more cycles makes the machine faster! It increases the parallelism going on in the machine.
We will look at a 5-stage pipeline. Modern machines (Pentium 4) take on the order of 20 cycles per instruction.
