Arquitectura de Computadores

1 Computer Architecture (Arquitectura de Computadores)
Chapter 2. Pipelined Processors
Based on the original material of the book: D.A. Patterson and J.L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 4th edition.
Escuela Politécnica Superior, Universidad Autónoma de Madrid
Instructors: G130 and G131: Iván González Martínez; G136: Francisco Javier Gómez Arribas

2 Agenda
The Processor: A Basic MIPS Implementation
Building a Datapath
Designing the Control Unit (single cycle)
An Overview of Pipelining
Pipeline performance
MIPS five-stage pipeline
Hazards: Structure, Data and Control
MIPS Pipelined Datapath and Control
Data Hazards: Forwarding vs. Stalling
Control Hazards: Branch prediction

3 Introduction
CPU performance factors
  Instruction count: determined by ISA and compiler
  CPI and cycle time: determined by CPU hardware
We will examine two MIPS implementations
  A simplified version
  A more realistic pipelined version
Simple subset, shows most aspects
  Memory reference: lw, sw
  Arithmetic/logical: add, sub, and, or, slt
  Control transfer: beq, j
(Textbook section 4.1, Introduction; chapter: The Processor)

4 Introduction (2)
We will study a simple RISC processor called MIPS (Microprocessor without Interlocked Pipeline Stages)
  32-bit processor (data, memory)
  32 general-purpose registers
  Separate data and code memories (Harvard architecture)

5 CPU Overview
Instruction execution
  PC → instruction memory, fetch instruction
  Register numbers → register file, read registers
  Depending on instruction class
    Use ALU to calculate
      Arithmetic result
      Memory address for load/store
      Branch target address
    Access data memory for load/store
    PC ← target address or PC + 4

6 Datapath & control design
Datapath: elements that process data and addresses in the CPU
  Registers, ALUs, muxes, memories, ...
  We will build a MIPS datapath incrementally
Control Unit: information comes from the 32 bits of the instruction, and the control lines select:
  Registers to be read (always read two)
  The operation to be performed by the ALU
  Whether data memory is to be read or written
  What is written and where in the register file
  What goes in the PC
Combinational single-cycle implementation
(Textbook section 4.3, Building a Datapath)

7 Full Datapath

8 The Main Control Unit
Control signals derived from instruction

R-type       0         rs     rt     rd     shamt  funct
             31:26     25:21  20:16  15:11  10:6   5:0
Load/Store   35 or 43  rs     rt     address
             31:26     25:21  20:16  15:0
Branch       4         rs     rt     address
             31:26     25:21  20:16  15:0

Field notes: bits 31:26 are the opcode; rs is always read; rt is read, except for load; the destination (rd or rt) is written for R-type and load; the address field is sign-extended and added.

9 ALU Control
ALU used for
  Load/Store: F = add
  Branch: F = subtract
  R-type: F depends on funct field

ALU control  Function
0000         AND
0001         OR
0010         add
0110         subtract
0111         set-on-less-than
1100         NOR
(Textbook section 4.4, A Simple Implementation Scheme)

10 ALU Control
Assume 2-bit ALUOp derived from opcode
  Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
               subtract          100010  subtract          0110
               AND               100100  AND               0000
               OR                100101  OR                0001
               set-on-less-than  101010  set-on-less-than  0111
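
The ALU control table can be read as a small decoding function. The following is a minimal Python sketch of that logic, not the hardware itself: the function name `alu_control` and the dictionary `FUNCT_TO_ALU` are illustrative, and the funct encodings are the standard MIPS values, which are assumed here.

```python
# Standard MIPS funct field values (assumed) mapped to the 4-bit ALU control.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and a funct field."""
    if alu_op == 0b00:   # lw / sw: add base register and offset
        return 0b0010
    if alu_op == 0b01:   # beq: subtract and test the result for zero
        return 0b0110
    if alu_op == 0b10:   # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

For lw and sw the funct argument is ignored, which matches the don't-care (XXXXXX) entries in the table.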

11 Datapath With Control

12 R-Type Instruction

13 Load Instruction

14 Branch-on-Equal Instruction

15 Implementing Jumps
Jump   2      address
       31:26  25:0
Jump uses word address
Update PC with concatenation of
  Top 4 bits of old PC
  26-bit jump address
  00
Need an extra control signal decoded from opcode
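
The concatenation that forms the jump target can be sketched in a few lines of Python. This is an illustration under one assumption: the "old PC" whose top 4 bits are used is taken to be PC + 4, the address of the instruction after the jump. The function name `jump_target` is hypothetical.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Build the 32-bit jump target from the PC and a J-format instruction."""
    addr26 = instruction & 0x03FF_FFFF        # bits 25:0 of the instruction
    pc_plus_4 = (pc + 4) & 0xFFFF_FFFF        # assumed source of the top 4 bits
    upper4 = pc_plus_4 & 0xF000_0000          # bits 31:28
    return upper4 | (addr26 << 2)             # append 00 (word address to byte address)
```

For example, a jump at address 0x00400000 with a 26-bit field of 0x100004 yields the byte address 0x00400010.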

16 Datapath With Jumps Added

17 Performance Issues
Longest delay determines clock period
  Critical path: load instruction
  Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
  Making the common case fast
We will improve performance by pipelining

19 Pipelining Analogy
Pipelined laundry: overlapping execution
  Parallelism improves performance
Four loads: speedup = 8/3.5 = 2.3
Non-stop (n loads): speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
(Textbook section 4.5, An Overview of Pipelining)

20 MIPS Pipeline
Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

21 Pipeline Performance
Assume time for stages is
  100ps for register read or write
  200ps for other stages
Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps

22 Pipeline Performance
Single-cycle (Tc = 800ps)
Pipelined (Tc = 200ps)

23 Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If not balanced, speedup is less
Speedup is due to increased throughput
  Latency (time for each instruction) does not decrease
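
The effect of unbalanced stages can be checked numerically with the stage times assumed on the Pipeline Performance slide. The sketch below is a simple timing model (no hazards, no stalls); the names `STAGE_PS`, `single_cycle_time`, `pipelined_time` and `speedup` are illustrative.

```python
# Stage latencies in picoseconds, as assumed on the Pipeline Performance slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(n: int) -> int:
    """Every instruction takes one long cycle sized for lw (sum of all stages)."""
    return n * sum(STAGE_PS.values())


def pipelined_time(n: int) -> int:
    """The clock is set by the slowest stage; the pipeline fills, then one instruction finishes per cycle."""
    clock = max(STAGE_PS.values())
    return (len(STAGE_PS) + n - 1) * clock


def speedup(n: int) -> float:
    return single_cycle_time(n) / pipelined_time(n)
```

With these numbers, three instructions take 2400ps single-cycle and 1400ps pipelined, and for long instruction streams the speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced.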

24 Pipelining and ISA Design
MIPS ISA designed for pipelining
  All instructions are 32 bits
    Easier to fetch and decode in one cycle
    c.f. x86: 1- to 17-byte instructions
  Few and regular instruction formats
    Can decode and read registers in one step
  Load/store addressing
    Can calculate address in 3rd stage, access memory in 4th stage
  Alignment of memory operands
    Memory access takes only one cycle

25 Hazards
Situations that prevent starting the next instruction in the next cycle
  Structure hazards
    A required resource is busy
  Data hazard
    Need to wait for previous instruction to complete its data read/write
  Control hazard
    Deciding on control action depends on previous instruction

26 Structure Hazards
Conflict for use of a resource
In MIPS pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
    Would cause a pipeline "bubble"
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches

27 Data Hazards
An instruction depends on completion of data access by a previous instruction
  add $s0, $t0, $t1
  sub $t2, $s0, $t3

28 Forwarding (aka Bypassing)
Use result when it is computed
  Don't wait for it to be stored in a register
  Requires extra connections in the datapath

29 Load-Use Data Hazard
Can't always avoid stalls by forwarding
  If value not computed when needed
  Can't forward backward in time!

30 Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles):
         lw   $t1, 0($t0)
         lw   $t2, 4($t0)
  stall  add  $t3, $t1, $t2
         sw   $t3, 12($t0)
         lw   $t4, 8($t0)
  stall  add  $t5, $t1, $t4
         sw   $t5, 16($t0)

Reordered (11 cycles):
         lw   $t1, 0($t0)
         lw   $t2, 4($t0)
         lw   $t4, 8($t0)
         add  $t3, $t1, $t2
         sw   $t3, 12($t0)
         add  $t5, $t1, $t4
         sw   $t5, 16($t0)
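
The two cycle counts can be reproduced with a small model. The sketch below assumes a five-stage pipeline with full forwarding, so the only stall is the one-cycle bubble when an instruction uses the result of the immediately preceding lw. The tuple encoding of instructions and the function name `total_cycles` are illustrative.

```python
# Each instruction: (opcode, destination register or None, source registers).
ORIGINAL = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]
# Same instructions with the third load hoisted above the first add.
REORDERED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]


def total_cycles(program, stages=5):
    """Cycles to run the program: pipeline fill + one per instruction + load-use stalls."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )
    return len(program) + (stages - 1) + stalls
```

Seven instructions need 7 + 4 = 11 cycles without stalls; the original order adds two load-use stalls, which gives 13.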

31 Control Hazards
Branch determines flow of control
  Fetching next instruction depends on branch outcome
  Pipeline can't always fetch correct instruction
    Still working on ID stage of branch
In MIPS pipeline
  Need to compare registers and compute target early in the pipeline
  Add hardware to do it in ID stage

32 Stall on Branch
Wait until branch outcome determined before fetching next instruction

33 Branch Prediction
Longer pipelines can't readily determine branch outcome early
  Stall penalty becomes unacceptable
Predict outcome of branch
  Only stall if prediction is wrong
In MIPS pipeline
  Can predict branches not taken
  Fetch instruction after branch, with no delay

34 MIPS with Predict Not Taken
Figure panels: prediction correct; prediction incorrect

35 More-Realistic Branch Prediction
Static branch prediction
  Based on typical branch behavior
  Example: loop and if-statement branches
    Predict backward branches taken
    Predict forward branches not taken
Dynamic branch prediction
  Hardware measures actual branch behavior
    e.g., record recent history of each branch
  Assume future behavior will continue the trend
    When wrong, stall while re-fetching, and update history

36 Pipeline Summary (The BIG Picture)
Pipelining improves performance by increasing instruction throughput
  Executes multiple instructions in parallel
  Each instruction has the same latency
Subject to hazards
  Structure, data, control
Instruction set design affects complexity of pipeline implementation

38 MIPS Pipelined Datapath
Right-to-left flow (in the MEM and WB stages) leads to hazards
(Textbook section 4.6, Pipelined Datapath and Control)

39 Pipeline registers
Need registers between stages
  To hold information produced in previous cycle

40 Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
  Single-clock-cycle pipeline diagram
    Shows pipeline usage in a single cycle
    Highlight resources used
  c.f. multi-clock-cycle diagram
    Graph of operation over time
We'll look at single-clock-cycle diagrams for load & store

41 IF for Load, Store, ...

42 ID for Load, Store, ...

43 EX for Load

44 MEM for Load

45 WB for Load
Wrong register number

46 Corrected Datapath for Load

47 EX for Store

48 MEM for Store

49 WB for Store

50 Multi-Cycle Pipeline Diagram
Form showing resource usage

51 Multi-Cycle Pipeline Diagram
Traditional form

52 Single-Cycle Pipeline Diagram
State of pipeline in a given cycle

53 Pipelined Control (Simplified)

54 Pipelined Control
Control signals derived from instruction
  As in single-cycle implementation

55 Pipelined Control

56 Data Hazards in ALU Instructions
Consider this sequence:
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
We can resolve hazards with forwarding
  How do we detect when to forward?
(Textbook section 4.7, Data Hazards: Forwarding vs. Stalling)

57 Dependencies & Forwarding

58 Detecting the Need to Forward
Pass register numbers along pipeline
  e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
ALU operand register numbers in EX stage are given by
  ID/EX.RegisterRs, ID/EX.RegisterRt
Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs   (forward from EX/MEM pipeline register)
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt   (forward from EX/MEM pipeline register)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs   (forward from MEM/WB pipeline register)
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt   (forward from MEM/WB pipeline register)

59 Detecting the Need to Forward
But only if forwarding instruction will write to a register!
  EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
  EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

60 Forwarding Paths & Conditions
EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

61 Double Data Hazard
Consider the sequence:
  add $1, $1, $2
  add $1, $1, $3
  add $1, $1, $4
Both hazards occur
  Want to use the most recent
Revise MEM hazard condition
  Only forward if EX hazard condition isn't true

62 Revised Forwarding Condition
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
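
The EX and MEM hazard conditions, including the double-hazard revision, can be collected into one function. The following Python sketch models the forwarding unit as pure combinational logic; the `PipeReg` class and the function name `forwarding_unit` are illustrative, and only the fields named in the conditions are modeled.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def forwarding_unit(ex_mem: PipeReg, mem_wb: PipeReg, id_ex_rs: int, id_ex_rt: int):
    """Return (ForwardA, ForwardB) as 2-bit values: 10 = EX/MEM, 01 = MEM/WB, 00 = register file."""

    def select(src: int) -> int:
        ex_hazard = ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src
        mem_hazard = mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src
        if ex_hazard:        # most recent result wins (double data hazard)
            return 0b10
        if mem_hazard:       # only reached when the EX hazard condition is false
            return 0b01
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)
```

Checking the EX hazard first is what implements the "not (EX hazard)" term of the revised MEM condition.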

63 Datapath with Forwarding

64 Load-Use Data Hazard
Need to stall for one cycle

65 Load-Use Hazard Detection
Check when using instruction is decoded in ID stage
ALU operand register numbers in ID stage are given by
  IF/ID.RegisterRs, IF/ID.RegisterRt
Load-use hazard when
  ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble

66 How to Stall the Pipeline
Force control values in ID/EX register to 0
  EX, MEM and WB do nop (no-operation)
Prevent update of PC and IF/ID register
  Using instruction is decoded again
  Following instruction is fetched again
  1-cycle stall allows MEM to read data for lw
    Can subsequently forward to EX stage
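
The detection condition and the resulting stall actions can be sketched together. In the Python sketch below, the function names `load_use_hazard` and `stall_controls` and the dictionary keys are illustrative; they stand for the PC write enable, the IF/ID write enable, and the zeroing of the control values entering ID/EX.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the register that the load in EX will write."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def stall_controls(hazard: bool) -> dict:
    """Control actions for one cycle: freeze PC and IF/ID, and inject a bubble into ID/EX."""
    return {
        "pc_write": not hazard,       # hold the PC so the following instruction is fetched again
        "if_id_write": not hazard,    # hold IF/ID so the using instruction is decoded again
        "id_ex_bubble": hazard,       # force ID/EX control values to 0 (nop)
    }
```

For example, a lw that writes $2 followed by an instruction that reads $2 gives `load_use_hazard(True, 2, 2, 5) == True`, so the pipeline stalls for one cycle.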

67 Stall/Bubble in the Pipeline
Stall inserted here

68 Stall/Bubble in the Pipeline
Or, more accurately...

69 Datapath with Hazard Detection

70 Stalls and Performance (The BIG Picture)
Stalls reduce performance
  But are required to get correct results
Compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure

71 Branch Hazards
If branch outcome determined in MEM
  Flush these instructions (set control values to 0)
(Textbook section 4.8, Control Hazards)

72 Reducing Branch Delay
Move hardware to determine outcome to ID stage
  Target address adder
  Register comparator
Example: branch taken
  36: sub $10, $4, $8
  40: beq $1, $3, 7
  44: and $12, $2, $5
  48: or  $13, $2, $6
  52: add $14, $4, $2
  56: slt $15, $6, $7
  ...
  72: lw  $4, 50($7)
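
The target address adder computes PC + 4 plus the sign-extended 16-bit offset shifted left by two. A minimal Python sketch of that calculation follows; the function names `sign_extend16` and `branch_target` are illustrative.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x1_0000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Target of a beq at address pc whose immediate field is offset16 (a word offset)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFF_FFFF
```

For the beq at address 40 with offset 7 in the example, the target is 40 + 4 + 7 × 4 = 72.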

73 Example: Branch Taken

74 Example: Branch Taken

75 Data Hazards for Branches
If a comparison register is a destination of 2nd or 3rd preceding ALU instruction
  add $1, $2, $3        IF  ID  EX  MEM WB
  add $4, $5, $6            IF  ID  EX  MEM WB
  ...                           IF  ID  EX  MEM WB
  beq $1, $4, target                IF  ID  EX  MEM WB
Can resolve using forwarding

76 Data Hazards for Branches
If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
  Need 1 stall cycle
  lw  $1, addr          IF  ID  EX  MEM WB
  add $4, $5, $6            IF  ID  EX  MEM WB
  beq stalled                   IF  ID
  beq $1, $4, target                ID  EX  MEM WB

77 Data Hazards for Branches
If a comparison register is a destination of immediately preceding load instruction
  Need 2 stall cycles
  lw  $1, addr          IF  ID  EX  MEM WB
  beq stalled               IF  ID
  beq stalled                   ID
  beq $1, $0, target                ID  EX  MEM WB

78 Branch Hazard and Prediction
Static prediction: always predict the same outcome
  Execution continues speculatively until the condition is resolved
  On a misprediction, the speculative results are discarded
  Effective prediction (E): the branch is taken
  Not-effective prediction (NE): the branch is NOT taken
  Predict NE if the branch is forward and E if it is backward
Dynamic prediction: change the prediction according to the branch history
  Use a small memory with an entry per branch address (BHT, Branch History Table)
Figure: the PC (branch instruction address) indexes the BHT; state diagrams for 1-bit prediction (states E, NE) and 2-bit prediction (states Ef, Ed, NEd, NEf), with transitions labeled T and F

79 Dynamic Branch Prediction
In deeper and superscalar pipelines, branch penalty is more significant
Use dynamic prediction
  Branch prediction buffer (aka branch history table)
  Indexed by recent branch instruction addresses
  Stores outcome (taken/not taken)
  To execute a branch
    Check table, expect the same outcome
    Start fetching from fall-through or target
    If wrong, flush pipeline and flip prediction
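
The difference between 1-bit and 2-bit prediction shows up on a loop branch. The Python sketch below counts mispredictions for a single branch; it assumes a saturating 2-bit counter, a common variant that may differ in detail from the state diagram in the slides, and the function names are illustrative.

```python
def mispredictions_1bit(outcomes, predict_taken=True):
    """1-bit scheme: predict the last outcome, flip the prediction on every miss."""
    misses = 0
    for taken in outcomes:
        if taken != predict_taken:
            misses += 1
            predict_taken = taken
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter (assumed variant): 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch taken 9 times and then not taken, executed twice.
LOOP = ([True] * 9 + [False]) * 2
```

On `LOOP`, starting from a "taken" state, the 1-bit scheme misses 3 times (each loop exit, plus the first iteration of the second run), while the 2-bit counter misses only on the 2 loop exits.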

80 Calculating the Branch Target
Even with predictor, still need to calculate the target address
  1-cycle penalty for a taken branch
Branch target buffer
  Cache of target addresses
  Indexed by PC when instruction fetched
    If hit and instruction is branch predicted taken, can fetch target immediately

81 Branch Target Buffer (BTB)
BTB look-up table, fully associative
  Entry fields: instruction address, target address, history bits
Figure: during instruction fetch the program counter address is looked up in the BTB; if a target address is found, it is loaded into the program counter. Pipeline stages shown: Fetch, Decode, ...
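
A branch target buffer can be modeled as a look-up table keyed by the instruction address. The Python sketch below is a simplification: a dictionary stands in for the fully associative table, a single "taken" flag stands in for the history bits, and the class and method names are illustrative.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch instruction address to its target and a taken/not-taken flag."""

    def __init__(self):
        self.entries = {}

    def next_fetch(self, pc: int) -> int:
        """Address to fetch after pc: the cached target on a predicted-taken hit, else pc + 4."""
        entry = self.entries.get(pc)
        if entry is not None and entry["taken"]:
            return entry["target"]
        return pc + 4

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record the resolved outcome of the branch at pc."""
        self.entries[pc] = {"target": target, "taken": taken}
```

Usage: after `btb.update(40, 72, True)`, the call `btb.next_fetch(40)` returns 72, so the target can be fetched without waiting for the address calculation.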

82 Fallacies
Pipelining is easy (!)
  The basic idea is easy
  The devil is in the details
    e.g., detecting data hazards
Pipelining is independent of technology
  So why haven't we always done pipelining?
  More transistors make more advanced techniques feasible
  Pipeline-related ISA design needs to take account of technology trends
    e.g., predicated instructions
(Textbook section 4.13, Fallacies and Pitfalls)

83 Pitfalls
Poor ISA design can make pipelining harder
  e.g., complex instruction sets (VAX, IA-32)
    Significant overhead to make pipelining work
    IA-32 micro-op approach
  e.g., complex addressing modes
    Register update side effects, memory indirection
  e.g., delayed branches
    Advanced pipelines have long delay slots

84 Concluding Remarks
ISA influences design of datapath and control
Datapath and control influence design of ISA
Pipelining improves instruction throughput using parallelism
  More instructions completed per second
  Latency for each instruction not reduced
Hazards: structural, data, control
(Textbook section 4.14, Concluding Remarks)

85 Additional Information
Additional information for the problems of chapter 2

86 Types of data-dependence hazards
Dependences that arise between two instructions i and j, with i executing before j.
  RAW (Read After Write): the later instruction j tries to read a source before the earlier instruction i has modified it.
  WAR (Write After Read): instruction j tries to modify a destination before instruction i has read it as a source.
  WAW (Write After Write): instruction j tries to modify a destination before instruction i has done so (the normal write order is changed).
Examples:
  RAW               WAR               WAW
  ADD r1, r2, r3    ADD r1, r2, r3    DIV r1, r2, r3
  SUB r5, r1, r6    OR  r3, r4, r5    AND r1, r4, r5
  AND r6, r5, r1    ADD r4, r1, r3    SW  r10, 100(r1)
In pipelined processors with in-order execution, ONLY RAW hazards need to be handled.
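
The three definitions can be turned into a small classifier for a pair of instructions. In the Python sketch below, each instruction is reduced to a destination register and a list of source registers; this encoding and the function name `dependences` are illustrative.

```python
def dependences(i, j):
    """Classify the data dependences between instruction i and a later instruction j.

    Each instruction is a (destination, sources) pair; destination is None
    for instructions that write no register (e.g. a store).
    """
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    found = set()
    if i_dest is not None and i_dest in j_srcs:
        found.add("RAW")   # j reads what i writes
    if j_dest is not None and j_dest in i_srcs:
        found.add("WAR")   # j writes what i reads
    if i_dest is not None and i_dest == j_dest:
        found.add("WAW")   # both write the same register
    return found
```

Applied to the first two rows of the examples, ADD r1, r2, r3 followed by SUB r5, r1, r6 is classified as RAW, ADD r1, r2, r3 followed by OR r3, r4, r5 as WAR, and DIV r1, r2, r3 followed by AND r1, r4, r5 as WAW.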
