COMPUTER ORGANIZATION AND DESIGN


The Hardware/Software Interface, 5th Edition

Review: Instruction Set Architecture

Chapter 2 Instructions: Language of the Computer

2.1 Introduction

Instruction Set
- The repertoire of instructions of a computer
- Different computers have different instruction sets
  - But with many aspects in common
- Early computers had very simple instruction sets
  - Simplified implementation
- Many modern computers also have simple instruction sets

2.3 Operands of the Computer Hardware

Register Operands
- Arithmetic instructions use register operands
- MIPS has a 32 × 32-bit register file
  - Use for frequently accessed data
  - Numbered 0 to 31
  - 32-bit data called a "word"
- Assembler names
  - $t0, $t1, …, $t9 for temporary values
  - $s0, $s1, …, $s7 for saved variables
- Design Principle 2: Smaller is faster
  - c.f. main memory: millions of locations

Register Operand Example
- C code:
    f = (g + h) - (i + j);
  - f, …, j in $s0, …, $s4
- Compiled MIPS code:
    add $t0, $s1, $s2
    add $t1, $s3, $s4
    sub $s0, $t0, $t1

Memory Operand Example 1
- C code:
    g = h + A[8];
  - g in $s1, h in $s2, base address of A in $s3
- Compiled MIPS code:
  - Index 8 requires offset of 32 (4 bytes per word)
    lw  $t0, 32($s3)    # load word: offset 32, base register $s3
    add $s1, $s2, $t0

Registers vs. Memory
- Registers are faster to access than memory
- Operating on memory data requires loads and stores
  - More instructions to be executed
- Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!
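
The following is a minimal C sketch of the address arithmetic behind lw $t0, 32($s3). For illustration only, it models memory as an array of 32-bit words starting at address 0. The helpers word_offset and load_word are hypothetical names, and load_word rejects misaligned or out-of-range addresses:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of 32-bit words:
   MIPS memory is byte-addressed and each word occupies 4 bytes. */
static int32_t word_offset(int32_t index) {
    return index * 4;
}

/* Hypothetical model of lw rt, offset(base): memory is an array of
   32-bit words starting at address 0. The effective address is
   base + offset and must be word-aligned and in range.
   Returns 0 on success, -1 otherwise. */
static int load_word(const uint32_t *mem, size_t mem_words,
                     uint32_t base, int32_t offset, uint32_t *rt) {
    uint32_t addr = base + (uint32_t)offset;  /* wraps like hardware */
    if (addr % 4 != 0 || addr / 4 >= mem_words) {
        return -1;
    }
    *rt = mem[addr / 4];
    return 0;
}
```

With A's base address at 0, word_offset(8) gives the 32 used in the compiled code.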

Immediate Operands
- Constant data specified in an instruction
    addi $s3, $s3, 4
- No subtract immediate instruction
  - Just use a negative constant
    addi $s2, $s1, -1
- Design Principle 3: Make the common case fast
  - Small constants are common
  - Immediate operand avoids a load instruction

MIPS R-format Instructions
    op     | rs     | rt     | rd     | shamt  | funct
    6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
- Instruction fields
  - op: operation code (opcode)
  - rs: first source register number
  - rt: second source register number
  - rd: destination register number
  - shamt: shift amount (00000 for now)
  - funct: function code (extends opcode)

R-format Example
    add $t0, $s1, $s2

    op      | rs     | rt     | rd     | shamt  | funct
    6 bits  | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
    special | $s1    | $s2    | $t0    | 0      | add

MIPS I-format Instructions
    op     | rs     | rt     | constant or address
    6 bits | 5 bits | 5 bits | 16 bits
- Immediate arithmetic and load/store instructions
  - rt: destination or source register number
  - Constant: $-2^{15}$ to $+2^{15} - 1$
  - Address: offset added to base address in rs
- Design Principle 4: Good design demands good compromises
  - Different formats complicate decoding, but allow 32-bit instructions uniformly
  - Keep formats as similar as possible
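
The R-format and I-format field layouts can be checked with a short C sketch that packs the fields into one instruction word. The helper names encode_r and encode_i are hypothetical. The numeric values in the usage lines are the standard MIPS ones:
- $s1 = 17, $s2 = 18, $s3 = 19, $t0 = 8
- add: opcode 0, funct 32
- lw: opcode 35
- addi: opcode 8

```c
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit word:
   op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0].
   Each field is masked to its width before shifting. */
static uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) | ((rd & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6) | (funct & 0x3Fu);
}

/* Pack the I-format fields: op[31:26] rs[25:21] rt[20:16] imm[15:0].
   A negative constant is kept as its low 16 two's-complement bits. */
static uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt,
                         int32_t imm) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

For example:
- encode_r(0, 17, 18, 8, 0, 32) gives 0x02324020 for add $t0, $s1, $s2.
- encode_i(35, 19, 8, 32) gives 0x8E680020 for lw $t0, 32($s3).
- encode_i(8, 17, 18, -1) gives 0x2232FFFF for addi $s2, $s1, -1.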

2.7 Instructions for Making Decisions

Conditional Operations
- Branch to a labeled instruction if a condition is true
  - Otherwise, continue sequentially
- beq rs, rt, L1
  - if (rs == rt) branch to instruction labeled L1;
- bne rs, rt, L1
  - if (rs != rt) branch to instruction labeled L1;
- j L1
  - unconditional jump to instruction labeled L1

Compiling If Statements
- C code:
    if (i == j) f = g + h;
    else f = g - h;
  - f, g, … in $s0, $s1, …
- Compiled MIPS code:
          bne $s3, $s4, Else
          add $s0, $s1, $s2
          j   Exit
    Else: sub $s0, $s1, $s2
    Exit: …
  - Assembler calculates addresses

Compiling Loop Statements
- C code:
    while (save[i] == k) i += 1;
  - i in $s3, k in $s5, address of save in $s6
- Compiled MIPS code:
    Loop: sll  $t1, $s3, 2
          add  $t1, $t1, $s6
          lw   $t0, 0($t1)
          bne  $t0, $s5, Exit
          addi $s3, $s3, 1
          j    Loop
    Exit: …

2.8 Supporting Procedures in Computer Hardware

Procedure Calling
- Steps required
  1. Place parameters in registers
  2. Transfer control to procedure
  3. Acquire storage for procedure
  4. Perform procedure's operations
  5. Place result in register for caller
  6. Return to place of call

Register Usage
- $a0-$a3: arguments (registers 4-7)
- $v0, $v1: result values (registers 2 and 3)
- $t0-$t9: temporaries
  - Can be overwritten by callee
- $s0-$s7: saved
  - Must be saved/restored by callee
- $gp: global pointer for static data (register 28)
- $sp: stack pointer (register 29)
- $fp: frame pointer (register 30)
- $ra: return address (register 31)

Procedure Call Instructions
- Procedure call: jump and link
    jal ProcedureLabel
  - Address of following instruction put in $ra
  - Jumps to target address
- Procedure return: jump register
    jr $ra
  - Copies $ra to program counter
  - Can also be used for computed jumps
    - e.g., for case/switch statements

Leaf Procedure Example
- C code:
    int leaf_example (int g, int h, int i, int j)
    {
      int f;
      f = (g + h) - (i + j);
      return f;
    }
  - Arguments g, …, j in $a0, …, $a3
  - f in $s0 (hence, need to save $s0 on stack)
  - Result in $v0

Leaf Procedure Example
- MIPS code:
    leaf_example:
      addi $sp, $sp, -4     # save $s0 on stack
      sw   $s0, 0($sp)
      add  $t0, $a0, $a1    # procedure body
      add  $t1, $a2, $a3
      sub  $s0, $t0, $t1
      add  $v0, $s0, $zero  # result
      lw   $s0, 0($sp)      # restore $s0
      addi $sp, $sp, 4
      jr   $ra              # return

Non-Leaf Procedures
- Procedures that call other procedures
- For nested call, caller needs to save on the stack:
  - Its return address
  - Any arguments and temporaries needed after the call
- Restore from the stack after the call

Non-Leaf Procedure Example
- C code:
    int fact (int n)
    {
      if (n < 1) return 1;
      else return n * fact(n - 1);
    }
  - Argument n in $a0
  - Result in $v0

Non-Leaf Procedure Example
- MIPS code:
    fact:
        addi $sp, $sp, -8    # adjust stack for 2 items
        sw   $ra, 4($sp)     # save return address
        sw   $a0, 0($sp)     # save argument
        slti $t0, $a0, 1     # test for n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1   # if so, result is 1
        addi $sp, $sp, 8     #   pop 2 items from stack
        jr   $ra             #   and return
    L1: addi $a0, $a0, -1    # else decrement n
        jal  fact            # recursive call
        lw   $a0, 0($sp)     # restore original n
        lw   $ra, 4($sp)     #   and return address
        addi $sp, $sp, 8     # pop 2 items from stack
        mul  $v0, $a0, $v0   # multiply to get result
        jr   $ra             # and return

Local Data on the Stack
- Local data allocated by callee
  - e.g., C automatic variables
- Procedure frame (activation record)
  - Used by some compilers to manage stack storage

Branch Addressing
- Branch instructions specify
  - Opcode, two registers, target address
- Most branch targets are near branch
  - Forward or backward
    op     | rs     | rt     | constant or address
    6 bits | 5 bits | 5 bits | 16 bits
- PC-relative addressing
  - $\text{Target address} = \text{PC} + \text{offset} \times 4$
  - PC already incremented by 4 by this time

Jump Addressing
- Jump (j and jal) targets could be anywhere in text segment
  - Encode full address in instruction
    op     | address
    6 bits | 26 bits
- (Pseudo)Direct jump addressing
  - $\text{Target address} = \text{PC}_{31\ldots28} : (\text{address} \times 4)$

Target Addressing Example
- Loop code from earlier example
  - Assume Loop at location …
    Loop: sll  $t1, $s3, 2
          add  $t1, $t1, $s6
          lw   $t0, 0($t1)
          bne  $t0, $s5, Exit
          addi $s3, $s3, 1
          j    Loop
    Exit: …

Addressing Mode Summary
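
Here is a C sketch of the target calculations for both addressing modes. It assumes, purely as an example, that Loop starts at address 80000. That places bne at 80012, j at 80020 and Exit at 80024. The function names are hypothetical. The jump computation takes the top 4 bits from PC + 4. That differs from taking them from the PC itself only when the jump is the last word of a 256 MB region.

```c
#include <stdint.h>

/* PC-relative branch: the signed 16-bit offset counts words from PC + 4. */
static uint32_t branch_target(uint32_t branch_pc, int16_t offset) {
    return branch_pc + 4 + (uint32_t)((int32_t)offset * 4);
}

/* Offset an assembler would encode so that a branch at branch_pc
   reaches target (both word-aligned, both below 2^31 here). */
static int32_t branch_offset(uint32_t branch_pc, uint32_t target) {
    return ((int32_t)target - (int32_t)(branch_pc + 4)) / 4;
}

/* Pseudodirect jump: top 4 bits of PC + 4, concatenated with the
   26-bit word address shifted left by 2. */
static uint32_t jump_target(uint32_t jump_pc, uint32_t addr26) {
    return ((jump_pc + 4) & 0xF0000000u) | ((addr26 & 0x03FFFFFFu) << 2);
}
```

Under the 80000 assumption:
- bne would encode offset 2.
- The j instruction's 26-bit field would be 80000 / 4 = 20000.

The same branch_target arithmetic gives 44 + 7 × 4 = 72 for a beq at address 40 with offset 7.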

Instruction Encoding

Review: The Processor

Chapter 4 The Processor

Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

CPU Overview

Multiplexers
- Can't just join wires together
  - Use multiplexers

Control

4.2 Logic Design Conventions

Logic Design Basics
- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information

Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Multiplexer: Y = S ? I1 : I0
- Arithmetic/Logic Unit: Y = F(A, B)

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period

4.3 Building a Datapath

Building a Datapath
- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, muxes, memories, …
- We will build a MIPS datapath incrementally
  - Refining the overview design

Instruction Fetch
- PC: 32-bit register
- Increment by 4 for next instruction

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

Branch Instructions
- Shift left 2: just re-routes wires
- Sign extension: sign-bit wire replicated

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath

Full Datapath

4.4 A Simple Implementation Scheme

ALU Control
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

    ALU control | Function
    0000        | AND
    0001        | OR
    0010        | add
    0110        | subtract
    0111        | set-on-less-than
    1100        | NOR

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

    opcode | ALUOp | Operation        | funct  | ALU function     | ALU control
    lw     | 00    | load word        | XXXXXX | add              | 0010
    sw     | 00    | store word       | XXXXXX | add              | 0010
    beq    | 01    | branch equal     | XXXXXX | subtract         | 0110
    R-type | 10    | add              | …      | add              | 0010
           |       | subtract         | …      | subtract         | 0110
           |       | AND              | …      | AND              | 0000
           |       | OR               | …      | OR               | 0001
           |       | set-on-less-than | …      | set-on-less-than | 0111
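
The ALU control table is pure combinational logic, so it maps naturally onto a switch statement. This C sketch uses the standard MIPS funct encodings:
- add: 100000
- sub: 100010
- and: 100100
- or: 100101
- slt: 101010

The function name alu_control and its -1 "not covered" result are illustrative choices, not part of the hardware description.

```c
/* 4-bit ALU control codes from the ALU control table. */
enum { ALU_AND = 0x0, ALU_OR = 0x1, ALU_ADD = 0x2,
       ALU_SUB = 0x6, ALU_SLT = 0x7, ALU_NOR = 0xC };

/* ALUOp (2 bits) comes from the main control unit; funct (6 bits)
   is only consulted for R-type instructions (ALUOp = 10).
   Returns -1 for combinations the table does not cover. */
static int alu_control(unsigned alu_op, unsigned funct) {
    switch (alu_op) {
    case 0x0: return ALU_ADD;          /* lw, sw: address calculation */
    case 0x1: return ALU_SUB;          /* beq: subtract, test Zero    */
    case 0x2:                          /* R-type: decode funct        */
        switch (funct) {
        case 0x20: return ALU_ADD;     /* add 100000 */
        case 0x22: return ALU_SUB;     /* sub 100010 */
        case 0x24: return ALU_AND;     /* and 100100 */
        case 0x25: return ALU_OR;      /* or  100101 */
        case 0x2A: return ALU_SLT;     /* slt 101010 */
        default:   return -1;
        }
    default: return -1;
    }
}
```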

The Main Control Unit
- Control signals derived from instruction

    Format     | 31:26    | 25:21 | 20:16 | 15:11 | 10:6  | 5:0
    R-type     | 0        | rs    | rt    | rd    | shamt | funct
    Load/Store | 35 or 43 | rs    | rt    | address (15:0)
    Branch     | 4        | rs    | rt    | address (15:0)

  - Bits 31:26: opcode
  - rs: always read
  - rt: read, except for load
  - Destination register: written for R-type and load
  - address: sign-extend and add

Datapath With Control

R-Type Instruction

Load Instruction

Branch-on-Equal Instruction

Implementing Jumps
    Jump:  2     | address
           31:26 | 25:0
- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode

Datapath With Jumps Added

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

4.5 An Overview of Pipelining

Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads:
  - Speedup = 8/3.5 = 2.3
- Non-stop (n loads):
  - Speedup = $\frac{2n}{0.5n + 1.5} \approx 4$ = number of stages

MIPS Pipeline
- Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- Compare pipelined datapath with single-cycle datapath

    Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
    lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
    sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
    R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
    beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps

Pipeline Performance
- Single-cycle (Tc = 800 ps)
- Pipelined (Tc = 200 ps)
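
The speedups quoted for the laundry and for MIPS follow from a simple timing model. This C sketch assumes balanced stages and no hazards or stalls, and the helper names are hypothetical. The laundry has four stages of 0.5 hours each, consistent with 2 hours per load and 8 hours for four unpipelined loads. MIPS has five 200 ps stages versus an 800 ps single cycle.

```c
/* Ideal pipeline: the first item needs `stages` steps, and each later
   item finishes one step after the previous one (no stalls assumed). */
static double pipelined_time(int n, int stages, double stage_time) {
    return (double)(n + stages - 1) * stage_time;
}

/* Without pipelining, every item takes the full item time. */
static double unpipelined_time(int n, double item_time) {
    return (double)n * item_time;
}
```

Three lw instructions take 3 × 800 = 2400 ps single-cycle versus (3 + 4) × 200 = 1400 ps pipelined. For the laundry, the ratio 2n / (0.5n + 1.5) approaches 4 as n grows.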

Pipeline Speedup
- If all stages are balanced
  - i.e., all take the same time
  - $\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - c.f. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
    - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches

Data Hazards
- An instruction depends on completion of data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Forwarding (aka Bypassing)
- Use result when it is computed
  - Don't wait for it to be stored in a register
  - Requires extra connections in the datapath

Load-Use Data Hazard
- Can't always avoid stalls by forwarding
  - If value not computed when needed
  - Can't forward backward in time!

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

  Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
          (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
          (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

  Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
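
The cycle counts for the two schedules can be reproduced with a small C model. It is a sketch under simplifying assumptions:
- The pipeline has 5 stages and full forwarding, so the only stall is the one-cycle load-use case.
- Registers are given by their MIPS numbers ($t0 = 8 … $t5 = 13).

The Instr struct and the function name are hypothetical.

```c
#include <stdbool.h>

/* An instruction reduced to what the hazard check needs: the register
   it writes (-1 if none), the registers it reads (-1 if unused), and
   whether it is a load. */
typedef struct {
    int dest;
    int src1, src2;
    bool is_load;
} Instr;

/* Total cycles on a 5-stage pipeline: n + 4 fill cycles, plus one stall
   whenever an instruction reads the result of the load just before it. */
static int cycles_with_load_use_stalls(const Instr *code, int n) {
    int stalls = 0;
    for (int i = 1; i < n; i++) {
        const Instr *prev = &code[i - 1];
        if (prev->is_load && prev->dest >= 0 &&
            (code[i].src1 == prev->dest || code[i].src2 == prev->dest)) {
            stalls++;
        }
    }
    return n + 4 + stalls;
}
```

A store reads both its value register and its base register, so sw $t3, 12($t0) is modeled as src1 = 11, src2 = 8 with no destination.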

Control Hazards
- Branch determines flow of control
  - Fetching next instruction depends on branch outcome
  - Pipeline can't always fetch correct instruction
    - Still working on ID stage of branch
- In MIPS pipeline
  - Need to compare registers and compute target early in the pipeline
  - Add hardware to do it in ID stage

Stall on Branch
- Wait until branch outcome determined before fetching next instruction

Branch Prediction
- Longer pipelines can't readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only stall if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction after branch, with no delay

MIPS with Predict Not Taken
- Prediction correct
- Prediction incorrect

More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    - When wrong, stall while re-fetching, and update history

Pipeline Summary (The BIG Picture)
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects complexity of pipeline implementation
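
The static "backward taken, forward not taken" rule needs only the branch address and its target. A one-function C sketch follows. The name is hypothetical, and it treats a target at or before the branch itself as backward:

```c
#include <stdbool.h>
#include <stdint.h>

/* Backward branches (typically loop-closing) are predicted taken;
   forward branches (typical of if-statements) are predicted not taken. */
static bool predict_taken_btfn(uint32_t branch_pc, uint32_t target) {
    return target <= branch_pc;
}
```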

4.6 Pipelined Datapath and Control

MIPS Pipelined Datapath
- Right-to-left flow (from the MEM and WB stages) leads to hazards

Pipeline registers
- Need registers between stages
  - To hold information produced in previous cycle

Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - "Single-clock-cycle" pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - c.f. "multi-clock-cycle" diagram
    - Graph of operation over time
- We'll look at "single-clock-cycle" diagrams for load & store

IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load
- Wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram
- Form showing resource usage

Multi-Cycle Pipeline Diagram
- Traditional form

Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle

Pipelined Control (Simplified)

Pipelined Control
- Control signals derived from instruction
  - As in single-cycle implementation

Pipelined Control

4.7 Data Hazards: Forwarding vs. Stalling

Data Hazards in ALU Instructions
- Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- We can resolve hazards with forwarding
  - How do we detect when to forward?

Dependencies & Forwarding

Detecting the Need to Forward
- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
- Data hazards when
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs (forward from EX/MEM pipeline reg)
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt (forward from EX/MEM pipeline reg)
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs (forward from MEM/WB pipeline reg)
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt (forward from MEM/WB pipeline reg)

Detecting the Need to Forward
- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Forwarding Paths

Forwarding Conditions
- EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
- MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

Double Data Hazard
- Consider the sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
- Both hazards occur
  - Want to use the most recent
- Revise MEM hazard condition
  - Only forward if EX hazard condition isn't true

Revised Forwarding Condition
- MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
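
This C sketch computes the mux select for one ALU operand, checking the EX hazard first. That ordering is equivalent to the revised MEM hazard condition, because MEM/WB is chosen only when the EX/MEM condition is false. The struct, enum and function names are hypothetical. The 10/01/00 select values follow the slides.

```c
#include <stdbool.h>

/* The pipeline-register fields the forwarding unit inspects. */
typedef struct {
    bool reg_write;  /* EX/MEM.RegWrite or MEM/WB.RegWrite */
    int rd;          /* EX/MEM.RegisterRd or MEM/WB.RegisterRd */
} WriteInfo;

/* Mux selects: 00 = register file, 10 = EX/MEM, 01 = MEM/WB. */
enum { FWD_NONE = 0x0, FWD_MEM_WB = 0x1, FWD_EX_MEM = 0x2 };

/* Forwarding decision for one operand (ID/EX.RegisterRs or Rt).
   Testing EX/MEM first makes the most recent result win when both
   hazards occur (the double data hazard case). */
static int forward_select(WriteInfo ex_mem, WriteInfo mem_wb, int id_ex_reg) {
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == id_ex_reg) {
        return FWD_EX_MEM;
    }
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == id_ex_reg) {
        return FWD_MEM_WB;
    }
    return FWD_NONE;
}
```

Take the third add $1, $1, $4 in the sequence above. Both EX/MEM and MEM/WB hold a write to $1, and the sketch selects 10 (EX/MEM), the most recent value.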

Datapath with Forwarding

Load-Use Data Hazard
- Need to stall for one cycle

Load-Use Hazard Detection
- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- Load-use hazard when
    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))
- If detected, stall and insert bubble

How to Stall the Pipeline
- Force control values in ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for lw
    - Can subsequently forward to EX stage
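
The detection condition and the resulting stall actions can be sketched in C as follows. All names are hypothetical. Like the condition on the slide, it does not special-case $zero.

```c
#include <stdbool.h>

/* Stall when the instruction in EX is a load whose destination (Rt)
   is a source register of the instruction being decoded in ID. */
static bool load_use_hazard(bool id_ex_mem_read, int id_ex_rt,
                            int if_id_rs, int if_id_rt) {
    return id_ex_mem_read &&
           (id_ex_rt == if_id_rs || id_ex_rt == if_id_rt);
}

/* Effect of a stall on the pipeline for one cycle. */
typedef struct {
    bool pc_write;       /* false: PC held, following instr refetched */
    bool if_id_write;    /* false: IF/ID held, using instr decoded again */
    bool zero_controls;  /* true: ID/EX control values forced to 0 */
} StallControl;

static StallControl stall_control(bool hazard) {
    StallControl c = { !hazard, !hazard, hazard };
    return c;
}
```

For example, lw $2, 20($1) followed by and $4, $2, $5 has ID/EX.RegisterRt = IF/ID.RegisterRs = 2, so one bubble is inserted.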

Stall/Bubble in the Pipeline
- Stall inserted here

Stall/Bubble in the Pipeline
- Or, more accurately…

Datapath with Hazard Detection

Stalls and Performance (The BIG Picture)
- Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

4.8 Control Hazards

Branch Hazards
- If branch outcome determined in MEM
  - Flush these instructions (set control values to 0)

Reducing Branch Delay
- Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator
- Example: branch taken
    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $…
    …
    72: lw  $4, 50($7)

Example: Branch Taken

Example: Branch Taken

Data Hazards for Branches
- If a comparison register is a destination of 2nd or 3rd preceding ALU instruction
    add $1, $2, $3      IF  ID  EX  MEM WB
    add $4, $5, $6          IF  ID  EX  MEM WB
    …                           IF  ID  EX  MEM WB
    beq $1, $4, target              IF  ID  EX  MEM WB
  - Can resolve using forwarding

Data Hazards for Branches
- If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
  - Need 1 stall cycle
    lw  $1, addr        IF  ID  EX  MEM WB
    add $4, $5, $6          IF  ID  EX  MEM WB
    beq stalled                 IF  ID
    beq $1, $4, target                  ID  EX  MEM WB

Data Hazards for Branches
- If a comparison register is a destination of immediately preceding load instruction
  - Need 2 stall cycles
    lw  $1, addr        IF  ID  EX  MEM WB
    beq stalled             IF  ID
    beq stalled                     ID
    beq $1, $0, target                  ID  EX  MEM WB

Dynamic Branch Prediction
- In deeper and superscalar pipelines, branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (aka branch history table)
  - Indexed by recent branch instruction addresses
  - Stores outcome (taken/not taken)
  - To execute a branch
    - Check table, expect the same outcome
    - Start fetching from fall-through or target
    - If wrong, flush pipeline and flip prediction

1-Bit Predictor: Shortcoming
- Inner loop branches mispredicted twice!
    outer: …
           …
    inner: …
           …
           beq …, …, inner
           …
           beq …, …, outer
  - Mispredict as taken on last iteration of inner loop
  - Then mispredict as not taken on first iteration of inner loop next time around

2-Bit Predictor
- Only change prediction on two successive mispredictions
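
A C simulation makes the difference concrete. The sketch rests on these assumptions:
- The inner loop runs four iterations, so its branch is taken three times and then not taken.
- Both predictors start in the "taken" state.
- The 2-bit scheme is a saturating counter, one common way to implement the "change only after two mispredictions" rule.

The function name is hypothetical.

```c
#include <stdbool.h>

/* Count mispredictions for one branch given its actual outcomes.
   bits == 1: the state is just the last outcome (0 or 1).
   bits == 2: saturating counter 0..3, predict taken when >= 2, so a
   single misprediction does not flip a strongly held prediction. */
static int count_mispredictions(const bool *taken, int n, int bits) {
    int max = (bits == 1) ? 1 : 3;
    int threshold = (bits == 1) ? 1 : 2;
    int state = max;  /* start predicting taken */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        bool predict = state >= threshold;
        if (predict != taken[i]) {
            misses++;
        }
        if (taken[i]) {
            if (state < max) state++;
        } else {
            if (state > 0) state--;
        }
    }
    return misses;
}
```

Over ten runs of such an inner loop, the 1-bit predictor misses 19 times: once in the first run, then twice in every later run. The 2-bit counter misses 10 times, once per run.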

Calculating the Branch Target
- Even with predictor, still need to calculate the target address
  - 1-cycle penalty for a taken branch
- Branch target buffer
  - Cache of target addresses
  - Indexed by PC when instruction fetched
    - If hit and instruction is branch predicted taken, can fetch target immediately

4.9 Exceptions

Exceptions and Interrupts
- "Unexpected" events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, …
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard
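
Here is a minimal direct-mapped branch target buffer in C. The 64-entry size, the use of the full PC as the tag, and all names are assumptions for illustration. Real BTBs differ in size, associativity and tag width.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 64  /* hypothetical size; must be a power of two */

typedef struct {
    bool valid;
    uint32_t tag;     /* full branch PC, to reject aliasing entries */
    uint32_t target;  /* predicted target address */
} BtbEntry;

static BtbEntry btb[BTB_ENTRIES];  /* zero-initialized: all invalid */

/* Index with the word-address bits of the PC. */
static unsigned btb_index(uint32_t pc) {
    return (pc >> 2) & (BTB_ENTRIES - 1);
}

/* Record the target of a taken branch at pc once it is resolved. */
static void btb_update(uint32_t pc, uint32_t target) {
    BtbEntry *e = &btb[btb_index(pc)];
    e->valid = true;
    e->tag = pc;
    e->target = target;
}

/* Looked up during IF: on a hit, fetch can continue at *target. */
static bool btb_lookup(uint32_t pc, uint32_t *target) {
    const BtbEntry *e = &btb[btb_index(pc)];
    if (e->valid && e->tag == pc) {
        *target = e->target;
        return true;
    }
    return false;
}
```

An address 64 words away maps to the same entry but has a different tag, so it misses instead of returning the wrong target.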

Handling Exceptions
- In MIPS, exceptions managed by a System Control Coprocessor (CP0)
- Save PC of offending (or interrupted) instruction
  - In MIPS: Exception Program Counter (EPC)
- Save indication of the problem
  - In MIPS: Cause register
  - We'll assume 1-bit
    - 0 for undefined opcode, 1 for overflow
- Jump to handler at …

An Alternate Mechanism
- Vectored Interrupts
  - Handler address determined by the cause
- Example:
  - Undefined opcode: C…
  - Overflow: C…
  - …: C…
- Instructions either
  - Deal with the interrupt, or
  - Jump to real handler

Handler Actions
- Read cause, and transfer to relevant handler
- Determine action required
- If restartable
  - Take corrective action
  - Use EPC to return to program
- Otherwise
  - Terminate program
  - Report error using EPC, cause, …

Exceptions in a Pipeline
- Another form of control hazard
- Consider overflow on add in EX stage
    add $1, $2, $1
  - Prevent $1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set Cause and EPC register values
  - Transfer control to handler
- Similar to mispredicted branch
  - Use much of the same hardware

Pipeline with Exceptions

Exception Properties
- Restartable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    - Refetched and executed from scratch
- PC saved in EPC register
  - Identifies causing instruction
  - Actually PC + 4 is saved
    - Handler must adjust

Exception Example
- Exception on add in
    40  sub $11, $2, $4
    44  and $12, $2, $5
    48  or  $13, $2, $6
    4C  add $1, $2, $1
    50  slt $15, $6, $7
    54  lw  $16, 50($7)
    …
- Handler
    sw $25, 1000($0)
    sw $26, 1004($0)
    …

Exception Example

Exception Example

Multiple Exceptions
- Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
- Simple approach: deal with exception from earliest instruction
  - Flush subsequent instructions
  - "Precise" exceptions
- In complex pipelines
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!

Imprecise Exceptions
- Just stop pipeline and save state
  - Including exception cause(s)
- Let the handler work out
  - Which instruction(s) had exceptions
  - Which to complete or flush
    - May require "manual" completion
- Simplifies hardware, but more complex handler software
- Not feasible for complex multiple-issue out-of-order pipelines

4.10 Parallelism via Instructions

Instruction-Level Parallelism (ILP)
- Pipelining: executing multiple instructions in parallel
- To increase ILP
  - Deeper pipeline
    - Less work per stage ⇒ shorter clock cycle
  - Multiple issue
    - Replicate pipeline stages ⇒ multiple pipelines
    - Start multiple instructions per clock cycle
    - CPI < 1, so use Instructions Per Cycle (IPC)
    - E.g., 4 GHz 4-way multiple-issue
      - 16 BIPS, peak CPI = 0.25, peak IPC = 4
    - But dependencies reduce this in practice
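
The peak figures in the 4 GHz example follow directly from the clock rate and the issue width. Here is a small C sketch with hypothetical names, assuming every issue slot is filled every cycle:

```c
/* Peak values for a multiple-issue CPU with no stalls or dependencies. */
static double peak_ipc(int issue_width) {
    return (double)issue_width;
}

static double peak_cpi(int issue_width) {
    return 1.0 / issue_width;
}

/* 1 GHz = 10^9 cycles/s, so billions of instructions/s = GHz * IPC. */
static double peak_bips(double clock_ghz, int issue_width) {
    return clock_ghz * peak_ipc(issue_width);
}
```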

Multiple Issue
- Static multiple issue
  - Compiler groups instructions to be issued together
  - Packages them into "issue slots"
  - Compiler detects and avoids hazards
- Dynamic multiple issue
  - CPU examines instruction stream and chooses instructions to issue each cycle
  - Compiler can help by reordering instructions
  - CPU resolves hazards using advanced techniques at runtime

Speculation
- "Guess" what to do with an instruction
  - Start operation as soon as possible
  - Check whether guess was right
    - If so, complete the operation
    - If not, roll back and do the right thing
- Common to static and dynamic multiple issue
- Examples
  - Speculate on branch outcome
    - Roll back if path taken is different
  - Speculate on load
    - Roll back if location is updated

Compiler/Hardware Speculation
- Compiler can reorder instructions
  - e.g., move load before branch
  - Can include "fix-up" instructions to recover from incorrect guess
- Hardware can look ahead for instructions to execute
  - Buffer results until it determines they are actually needed
  - Flush buffers on incorrect speculation

Speculation and Exceptions
- What if exception occurs on a speculatively executed instruction?
  - e.g., speculative load before null-pointer check
- Static speculation
  - Can add ISA support for deferring exceptions
- Dynamic speculation
  - Can buffer exceptions until instruction completion (which may not occur)

Static Multiple Issue
- Compiler groups instructions into "issue packets"
  - Group of instructions that can be issued on a single cycle
  - Determined by pipeline resources required
- Think of an issue packet as a very long instruction
  - Specifies multiple concurrent operations
  - ⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue
- Compiler must remove some/all hazards
  - Reorder instructions into issue packets
  - No dependencies within a packet
  - Possibly some dependencies between packets
    - Varies between ISAs; compiler must know!
  - Pad with nop if necessary

MIPS with Static Dual Issue
- Two-issue packets
  - One ALU/branch instruction
  - One load/store instruction
  - 64-bit aligned
    - ALU/branch, then load/store
    - Pad an unused instruction with nop

    Address | Instruction type | Pipeline Stages
    n       | ALU/branch       | IF  ID  EX  MEM WB
    n + 4   | Load/store       | IF  ID  EX  MEM WB
    n + 8   | ALU/branch       |     IF  ID  EX  MEM WB
    n + 12  | Load/store       |     IF  ID  EX  MEM WB
    n + 16  | ALU/branch       |         IF  ID  EX  MEM WB
    n + 20  | Load/store       |         IF  ID  EX  MEM WB

MIPS with Static Dual Issue

Hazards in the Dual-Issue MIPS
- More instructions executing in parallel
- EX data hazard
  - Forwarding avoided stalls with single-issue
  - Now can't use ALU result in load/store in same packet
      add $t0, $s0, $s1
      lw  $s2, 0($t0)
    - Split into two packets, effectively a stall
- Load-use hazard
  - Still one cycle use latency, but now two instructions
- More aggressive scheduling required

Scheduling Example
- Schedule this for dual-issue MIPS
    Loop: lw   $t0, 0($s1)       # $t0 = array element
          addu $t0, $t0, $s2     # add scalar in $s2
          sw   $t0, 0($s1)       # store result
          addi $s1, $s1, -4      # decrement pointer
          bne  $s1, $zero, Loop  # branch if $s1 != 0

          | ALU/branch            | Load/store     | cycle
    Loop: | nop                   | lw $t0, 0($s1) | 1
          | addi $s1, $s1, -4     | nop            | 2
          | addu $t0, $t0, $s2    | nop            | 3
          | bne  $s1, $zero, Loop | sw $t0, 4($s1) | 4

- IPC = 5/4 = 1.25 (c.f. peak IPC = 2)
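
The packet-formation rule for this dual-issue MIPS can be sketched in C. The model is an assumption: each instruction is reduced to its slot kind plus its destination and source registers. Registers use MIPS numbers ($t0 = 8, $s0 = 16, $s1 = 17, $s2 = 18, $s3 = 19), and the names are hypothetical.

```c
#include <stdbool.h>

typedef enum { SLOT_ALU_BRANCH, SLOT_LOAD_STORE } SlotKind;

/* Simplified instruction for packet formation. */
typedef struct {
    SlotKind kind;
    int dest;        /* register written, -1 if none */
    int src1, src2;  /* registers read, -1 if unused */
} DualInstr;

/* Can `alu` and `mem` share one issue packet? The packet needs one
   instruction of each kind, and the load/store must not read the ALU
   result, because there is no forwarding within a packet. */
static bool can_pair(DualInstr alu, DualInstr mem) {
    if (alu.kind != SLOT_ALU_BRANCH || mem.kind != SLOT_LOAD_STORE) {
        return false;
    }
    if (alu.dest >= 0 && (mem.src1 == alu.dest || mem.src2 == alu.dest)) {
        return false;
    }
    return true;
}
```

The sketch rejects the add/lw pair from the EX data hazard example, because the load's base register $t0 is the add's result.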

Loop Unrolling
- Replicate loop body to expose more parallelism
  - Reduces loop-control overhead
- Use different registers per replication
  - Called "register renaming"
  - Avoid loop-carried "anti-dependencies"
    - Store followed by a load of the same register
    - Aka "name dependence"
      - Reuse of a register name

Loop Unrolling Example
          | ALU/branch            | Load/store      | cycle
    Loop: | addi $s1, $s1, -16    | lw $t0, 0($s1)  | 1
          | nop                   | lw $t1, 12($s1) | 2
          | addu $t0, $t0, $s2    | lw $t2, 8($s1)  | 3
          | addu $t1, $t1, $s2    | lw $t3, 4($s1)  | 4
          | addu $t2, $t2, $s2    | sw $t0, 16($s1) | 5
          | addu $t3, $t3, $s2    | sw $t1, 12($s1) | 6
          | nop                   | sw $t2, 8($s1)  | 7
          | bne  $s1, $zero, Loop | sw $t3, 4($s1)  | 8

- IPC = 14/8 = 1.75
  - Closer to 2, but at cost of registers and code size
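
At the source level, the same transformation looks like this C sketch. It shows the original loop and a version unrolled by four with four distinct temporaries, the C counterpart of using $t0 to $t3. It assumes the element count is a multiple of 4, as the unrolled MIPS loop implicitly does. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Original loop: add scalar s to each of the n words of a, walking
   down from the end as the MIPS code does. */
static void add_scalar(int32_t *a, size_t n, int32_t s) {
    for (size_t i = n; i > 0; i--) {
        a[i - 1] += s;
    }
}

/* Unrolled by 4: one loop-control update per four elements and four
   separate temporaries, so the loads, adds and stores of different
   elements do not reuse the same name. Requires n % 4 == 0. */
static void add_scalar_unrolled4(int32_t *a, size_t n, int32_t s) {
    for (size_t i = n; i >= 4; i -= 4) {
        int32_t t0 = a[i - 1];
        int32_t t1 = a[i - 2];
        int32_t t2 = a[i - 3];
        int32_t t3 = a[i - 4];
        a[i - 1] = t0 + s;
        a[i - 2] = t1 + s;
        a[i - 3] = t2 + s;
        a[i - 4] = t3 + s;
    }
}
```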

Dynamic Multiple Issue
- "Superscalar" processors
- CPU decides whether to issue 0, 1, 2, … each cycle
  - Avoiding structural and data hazards
- Avoids the need for compiler scheduling
  - Though it may still help
  - Code semantics ensured by the CPU

Dynamic Pipeline Scheduling
- Allow the CPU to execute instructions out of order to avoid stalls
  - But commit result to registers in order
- Example
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
  - Can start sub while addu is waiting for lw

Dynamically Scheduled CPU
[Figure: dynamically scheduled CPU, with the annotations:]
- Preserves dependencies
- Hold pending operands
- Results also sent to any waiting reservation stations
- Reorder buffer for register writes
- Can supply operands for issued instructions

Register Renaming
- Reservation stations and the reorder buffer effectively provide register renaming
- On instruction issue to a reservation station
  - If the operand is available in the register file or reorder buffer
    - It is copied to the reservation station
    - It is no longer required in the register; the register can be overwritten
  - If the operand is not yet available
    - It will be provided to the reservation station by a function unit
    - A register update may not be required
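
The slide describes renaming through reservation stations and the reorder buffer. Many processors instead use an explicit rename table and a pool of physical registers; the sketch below uses that alternative scheme only because it is short to write down, and it is not the mechanism the slide describes. Every new destination gets a fresh physical register, so a later write to $t0 cannot clobber a value an earlier reader still needs. The sizes and names are hypothetical, and returning physical registers to the free list at commit is omitted.

```c
#include <assert.h>

#define ARCH_REGS 32
#define PHYS_REGS 64

typedef struct {
    int map[ARCH_REGS];        /* architectural -> physical register */
    int free_list[PHYS_REGS];  /* stack of unused physical registers */
    int free_count;
} RenameTable;

void rename_init(RenameTable *rt) {
    /* Initially architectural register r lives in physical register r. */
    for (int r = 0; r < ARCH_REGS; r++)
        rt->map[r] = r;
    rt->free_count = 0;
    for (int p = PHYS_REGS - 1; p >= ARCH_REGS; p--)
        rt->free_list[rt->free_count++] = p;
}

/* A source operand reads whatever physical register currently holds arch. */
int rename_src(const RenameTable *rt, int arch) {
    return rt->map[arch];
}

/* Each new destination is given a fresh physical register. */
int rename_dest(RenameTable *rt, int arch) {
    assert(rt->free_count > 0);  /* real hardware would stall issue here */
    int p = rt->free_list[--rt->free_count];
    rt->map[arch] = p;
    return p;
}
```

For example, renaming `lw $t0` maps $t0 to one physical register, a following `addu` reads that register, and a second write to $t0 gets a different physical register, so the two writes never conflict.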

Speculation
- Predict the branch and continue issuing
  - Don't commit until the branch outcome is determined
- Load speculation
  - Avoid load and cache miss delay
    - Predict the effective address
    - Predict the loaded value
    - Load before completing outstanding stores
    - Bypass stored values to the load unit
  - Don't commit the load until speculation is cleared

Why Do Dynamic Scheduling?
- Why not just let the compiler schedule code?
- Not all stalls are predictable
  - e.g., cache misses
- Can't always schedule around branches
  - Branch outcome is dynamically determined
- Different implementations of an ISA have different latencies and hazards
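
"Load before completing outstanding stores" and "bypass stored values to the load unit" can be pictured with a store buffer. In this minimal sketch (types, sizes, and names are hypothetical), stores wait in a buffer in program order; a load searches the buffer from youngest to oldest and forwards a matching value, and only otherwise reads memory. Draining the buffer commits the stores in order. Speculation recovery and partial-word overlaps are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16
#define SB_SIZE   8

typedef struct { int addr; int32_t value; } StoreEntry;

typedef struct {
    int32_t mem[MEM_WORDS];
    StoreEntry sb[SB_SIZE];  /* pending stores, oldest at index 0 */
    int sb_count;
} MemSystem;

/* Buffer a store instead of writing memory immediately. */
void store(MemSystem *m, int addr, int32_t v) {
    assert(addr >= 0 && addr < MEM_WORDS && m->sb_count < SB_SIZE);
    m->sb[m->sb_count++] = (StoreEntry){addr, v};
}

/* Youngest matching buffered store wins; otherwise read memory. */
int32_t load(const MemSystem *m, int addr) {
    assert(addr >= 0 && addr < MEM_WORDS);
    for (int i = m->sb_count - 1; i >= 0; i--)
        if (m->sb[i].addr == addr)
            return m->sb[i].value;
    return m->mem[addr];
}

/* Commit buffered stores to memory in program order. */
void drain(MemSystem *m) {
    for (int i = 0; i < m->sb_count; i++)
        m->mem[m->sb[i].addr] = m->sb[i].value;
    m->sb_count = 0;
}
```

After `store(&m, 3, 5)` and `store(&m, 3, 7)`, `load(&m, 3)` already returns 7 although memory still holds the old value; only `drain` makes the stores visible in memory.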

Does Multiple Issue Work?
The BIG Picture
- Yes, but not as much as we'd like
- Programs have real dependencies that limit ILP
- Some dependencies are hard to eliminate
  - e.g., pointer aliasing
- Some parallelism is hard to expose
  - Limited window size during instruction issue
- Memory delays and limited bandwidth
  - Hard to keep pipelines full
- Speculation can help if done well

Power Efficiency
- Complexity of dynamic scheduling and speculation requires power
- Multiple simpler cores may be better

    Microprocessor   Pipeline stages   Issue width   Out-of-order/Speculation   Cores   Power
    i486             5                 1             No                         1       5 W
    Pentium          5                 2             No                         1       10 W
    Pentium Pro      10                3             Yes                        1       29 W
    P4 Willamette    22                3             Yes                        1       75 W
    P4 Prescott      31                3             Yes                        1       103 W
    Core             14                4             Yes                        2       75 W
    UltraSparc III   14                4             No                         1       90 W
    UltraSparc T1    6                 1             No                         8       70 W
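
Why pointer aliasing limits ILP can be shown in a few lines of C. The sketch below (function names hypothetical) performs the same work twice: once in program order, and once with every load hoisted above every store, as an aggressive scheduler would like to do so that the loads overlap. When `dst` and `src` overlap, the two versions give different answers, so neither the compiler nor the hardware may perform that reordering unless it can prove there is no overlap; in C99 the programmer can promise this with `restrict`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 16

/* Program order: each iteration's load follows the previous iteration's store. */
void add_one_in_order(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* Same work with all loads hoisted above all stores. This reordering
   is only correct if dst and src do not overlap. */
void add_one_loads_hoisted(int *dst, const int *src, size_t n) {
    int t[MAX_N];
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; i++)
        t[i] = src[i];          /* all loads first */
    for (size_t i = 0; i < n; i++)
        dst[i] = t[i] + 1;      /* then all stores */
}
```

With `int a[4] = {0}` and the call `add_one_in_order(a + 1, a, 3)`, each store feeds the next load and `a` becomes {0, 1, 2, 3}; the hoisted version on the same overlapping arrays yields {0, 1, 1, 1}. With separate arrays the two agree.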

Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines

Cortex A8 and Intel i7

    Processor                       ARM A8                   Intel Core i7 920
    Market                          Personal Mobile Device   Server, cloud
    Thermal design power            2 Watts                  130 Watts
    Clock rate                      1 GHz                    2.66 GHz
    Cores/Chip                      1                        4
    Floating point?                 No                       Yes
    Multiple issue?                 Dynamic                  Dynamic
    Peak instructions/clock cycle   2                        4
    Pipeline stages                 …                        …
    Pipeline schedule               Static in-order          Dynamic out-of-order with speculation
    Branch prediction               2-level                  2-level
    1st level caches/core           32 KiB I, 32 KiB D       32 KiB I, 32 KiB D
    2nd level caches/core           … KiB                    256 KiB
    3rd level caches (shared)       …                        … MB

ARM Cortex-A8 Pipeline
[Figure: ARM Cortex-A8 pipeline]
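
The table's numbers support a quick back-of-the-envelope comparison. The sketch below (helper names hypothetical) multiplies peak instructions per clock by clock rate and core count, then divides by thermal design power. These are upper bounds: real programs rarely sustain peak IPC, and TDP is a design limit rather than measured power.

```c
#include <assert.h>

/* Peak instruction rate in billions of instructions per second:
   peak instructions per clock x clock rate (GHz) x cores. */
double peak_ginst_per_s(double peak_ipc, double clock_ghz, int cores) {
    assert(cores > 0);
    return peak_ipc * clock_ghz * cores;
}

/* Peak rate divided by thermal design power: a rough efficiency figure. */
double peak_ginst_per_watt(double peak_ipc, double clock_ghz, int cores,
                           double tdp_watts) {
    assert(tdp_watts > 0.0);
    return peak_ginst_per_s(peak_ipc, clock_ghz, cores) / tdp_watts;
}
```

For the ARM A8 this gives 2 × 1 × 1 = 2 billion instructions per second, or 1 per watt; for the Core i7 920 it gives 4 × 2.66 × 4 ≈ 42.6 billion per second, or about 0.33 per watt.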

ARM Cortex-A8 Performance
[Figure: ARM Cortex-A8 performance]

Core i7 Pipeline
[Figure: Core i7 pipeline]

Core i7 Performance
[Figure: Core i7 performance]

4.12 Instruction-Level Parallelism and Matrix Multiply

Matrix Multiply
Unrolled C code:

    #include <x86intrin.h>
    #define UNROLL (4)

    /* C = C + A * B for n x n column-major matrices.
       Assumes n is a multiple of UNROLL*4 = 16 and that A, B, C are
       32-byte aligned, as _mm256_load_pd/_mm256_store_pd require. */
    void dgemm (int n, double* A, double* B, double* C)
    {
      for ( int i = 0; i < n; i+=UNROLL*4 )
        for ( int j = 0; j < n; j++ ) {
          __m256d c[UNROLL];                 /* UNROLL accumulators of 4 doubles */
          for ( int x = 0; x < UNROLL; x++ )
            c[x] = _mm256_load_pd(C+i+x*4+j*n);

          for( int k = 0; k < n; k++ )
          {
            __m256d b = _mm256_broadcast_sd(B+k+j*n);  /* 4 copies of B(k,j) */
            for (int x = 0; x < UNROLL; x++)
              c[x] = _mm256_add_pd(c[x],
                       _mm256_mul_pd(_mm256_load_pd(A+n*k+x*4+i), b));
          }

          for ( int x = 0; x < UNROLL; x++ )
            _mm256_store_pd(C+i+x*4+j*n, c[x]);
        }
    }
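
A plain scalar reference is handy for checking the unrolled AVX version. The sketch below is an assumed straightforward version, not copied from the slides. It uses the column-major layout the AVX code implies (element (i, j) of an n × n matrix M at M[i + j*n], which is why the C tile is addressed as C+i+x*4+j*n), and it computes the same C = C + A * B.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: C = C + A * B, column-major n x n matrices.
   C must not alias A or B, since C is updated while A and B are read. */
void dgemm_scalar(size_t n, const double *A, const double *B, double *C) {
    assert(A != C && B != C);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];            /* C(i,j) */
            for (size_t k = 0; k < n; k++)
                cij += A[i + k * n] * B[k + j * n];  /* A(i,k) * B(k,j) */
            C[i + j * n] = cij;
        }
}
```

Running both versions on the same aligned inputs (with n a multiple of 16) and comparing the results is a simple way to validate the unrolled kernel.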

Matrix Multiply
Assembly code:

    vmovapd (%r11),%ymm4            # Load 4 elements of C into %ymm4
    mov %rbx,%rax                   # register %rax = %rbx
    xor %ecx,%ecx                   # register %ecx = 0
    vmovapd 0x20(%r11),%ymm3        # Load 4 elements of C into %ymm3
    vmovapd 0x40(%r11),%ymm2        # Load 4 elements of C into %ymm2
    vmovapd 0x60(%r11),%ymm1        # Load 4 elements of C into %ymm1
    vbroadcastsd (%rcx,%r9,1),%ymm0 # Make 4 copies of B element
    add $0x8,%rcx                   # register %rcx = %rcx + 8
    vmulpd (%rax),%ymm0,%ymm5       # Parallel mul %ymm0, 4 A elements
    vaddpd %ymm5,%ymm4,%ymm4        # Parallel add %ymm5, %ymm4
    vmulpd 0x20(%rax),%ymm0,%ymm5   # Parallel mul %ymm0, 4 A elements
    vaddpd %ymm5,%ymm3,%ymm3        # Parallel add %ymm5, %ymm3
    vmulpd 0x40(%rax),%ymm0,%ymm5   # Parallel mul %ymm0, 4 A elements
    vmulpd 0x60(%rax),%ymm0,%ymm0   # Parallel mul %ymm0, 4 A elements
    add %r8,%rax                    # register %rax = %rax + %r8
    cmp %r10,%rcx                   # compare %r10 to %rcx
    vaddpd %ymm5,%ymm2,%ymm2        # Parallel add %ymm5, %ymm2
    vaddpd %ymm0,%ymm1,%ymm1        # Parallel add %ymm0, %ymm1
    jne 68 <dgemm+0x68>             # jump if %rcx != %r10
    add $0x1,%esi                   # register %esi = %esi + 1
    vmovapd %ymm4,(%r11)            # Store %ymm4 into 4 C elements
    vmovapd %ymm3,0x20(%r11)        # Store %ymm3 into 4 C elements
    vmovapd %ymm2,0x40(%r11)        # Store %ymm2 into 4 C elements
    vmovapd %ymm1,0x60(%r11)        # Store %ymm1 into 4 C elements

Performance Impact
[Figure: performance impact]
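
The listing keeps four 4-wide accumulators of C in %ymm4, %ymm3, %ymm2, %ymm1, one broadcast B element in %ymm0, and each product in %ymm5. The portable sketch below emulates those 256-bit registers with a hypothetical four-double struct so the data flow can be run and checked without AVX hardware. The helpers are stand-ins for the instructions named in their comments, not the real intrinsics, and the sketch does not reproduce the compiler's instruction scheduling. It assumes column-major matrices and n a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL 4

/* Hypothetical stand-in for one 256-bit AVX register: four doubles. */
typedef struct { double d[4]; } vec4;

static vec4 load4(const double *p) {            /* like vmovapd load   */
    vec4 v;
    for (int l = 0; l < 4; l++) v.d[l] = p[l];
    return v;
}
static void store4(double *p, vec4 v) {         /* like vmovapd store  */
    for (int l = 0; l < 4; l++) p[l] = v.d[l];
}
static vec4 broadcast4(double x) {              /* like vbroadcastsd   */
    vec4 v;
    for (int l = 0; l < 4; l++) v.d[l] = x;
    return v;
}
static vec4 mul4(vec4 a, vec4 b) {              /* like vmulpd         */
    for (int l = 0; l < 4; l++) a.d[l] *= b.d[l];
    return a;
}
static vec4 add4(vec4 a, vec4 b) {              /* like vaddpd         */
    for (int l = 0; l < 4; l++) a.d[l] += b.d[l];
    return a;
}

/* Same loop structure as the unrolled AVX dgemm: C = C + A * B. */
void dgemm_vec4(size_t n, const double *A, const double *B, double *C) {
    assert(n % (UNROLL * 4) == 0);
    for (size_t i = 0; i < n; i += UNROLL * 4)
        for (size_t j = 0; j < n; j++) {
            vec4 c[UNROLL];   /* c[0..3] play the role of %ymm4..%ymm1 */
            for (int x = 0; x < UNROLL; x++)
                c[x] = load4(C + i + x * 4 + j * n);
            for (size_t k = 0; k < n; k++) {
                vec4 b = broadcast4(B[k + j * n]);   /* %ymm0 */
                for (int x = 0; x < UNROLL; x++)     /* product is the %ymm5 temp */
                    c[x] = add4(c[x], mul4(load4(A + n * k + x * 4 + i), b));
            }
            for (int x = 0; x < UNROLL; x++)
                store4(C + i + x * 4 + j * n, c[x]);
        }
}
```

Multiplying any 16 × 16 matrix A by the identity with this function should reproduce A exactly, which makes a quick sanity check.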

4.14 Fallacies and Pitfalls

Fallacies
- Pipelining is easy (!)
  - The basic idea is easy
  - The devil is in the details
    - e.g., detecting data hazards
- Pipelining is independent of technology
  - So why haven't we always done pipelining?
  - More transistors make more advanced techniques feasible
  - Pipeline-related ISA design needs to take account of technology trends
    - e.g., predicated instructions

Pitfalls
- Poor ISA design can make pipelining harder
  - e.g., complex instruction sets (VAX, IA-32)
    - Significant overhead to make pipelining work
    - IA-32 micro-op approach
  - e.g., complex addressing modes
    - Register update side effects, memory indirection
  - e.g., delayed branches
    - Advanced pipelines have long delay slots
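
Predicated instructions replace a branch with an operation whose effect depends on a condition value. The idea, often called if-conversion, can be sketched in C (names hypothetical): the branchy clamp chooses its result with conditional branches, while the if-converted clamp turns each condition into an all-ones or all-zeros mask and selects with bitwise operations, so there is no branch to predict. Whether a compiler actually emits branches, conditional moves, or something else for either function is outside the source code's control; the sketch also assumes two's-complement integers.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy version: conditional branches choose the result. */
int32_t clamp_branch(int32_t x, int32_t lo, int32_t hi) {
    assert(lo <= hi);
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Branch-free select: cond != 0 gives a mask of all ones, else all zeros. */
static int32_t select_i32(int cond, int32_t if_true, int32_t if_false) {
    int32_t mask = -(int32_t)(cond != 0);
    return (if_true & mask) | (if_false & ~mask);
}

/* If-converted version: every operation executes, and the conditions
   only select the result, which is the idea behind predication. */
int32_t clamp_select(int32_t x, int32_t lo, int32_t hi) {
    assert(lo <= hi);
    int32_t y = select_i32(x < lo, lo, x);
    return select_i32(y > hi, hi, y);
}
```

Both functions return the same value for every input; they differ only in whether control flow or data flow makes the choice.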


More information

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont ) Chapter 2 Computer Abstractions and Technology Lesson 4: MIPS (cont ) Logical Operations Instructions for bitwise manipulation Operation C Java MIPS Shift left >>> srl Bitwise

More information

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes Chapter 2 Instructions: Language of the Computer Adapted by Paulo Lopes Instruction Set The repertoire of instructions of a computer Different computers have different instruction sets But with many aspects

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations Chapter 4 The Processor Part I Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online

More information

Chapter 2A Instructions: Language of the Computer

Chapter 2A Instructions: Language of the Computer Chapter 2A Instructions: Language of the Computer Copyright 2009 Elsevier, Inc. All rights reserved. Instruction Set The repertoire of instructions of a computer Different computers have different instruction

More information

Chapter 4 The Processor 1. Chapter 4D. The Processor

Chapter 4 The Processor 1. Chapter 4D. The Processor Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline

More information

Chapter 4 The Datapath

Chapter 4 The Datapath The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

CENG 3420 Lecture 06: Pipeline

CENG 3420 Lecture 06: Pipeline CENG 3420 Lecture 06: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L06.1 Spring 2019 Outline q Pipeline Motivations q Pipeline Hazards q Exceptions q Background: Flip-Flop Control Signals CENG3420 L06.2

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture. zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture. We have already discussed in the previous module that true

More information

Branch Addressing. Jump Addressing. Target Addressing Example. The University of Adelaide, School of Computer Science 28 September 2015

Branch Addressing. Jump Addressing. Target Addressing Example. The University of Adelaide, School of Computer Science 28 September 2015 Branch Addressing Branch instructions specify Opcode, two registers, target address Most branch targets are near branch Forward or backward op rs rt constant or address 6 bits 5 bits 5 bits 16 bits PC-relative

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yajig Li Departmet of Computer Sciece Uiversity of Chicago Admiistrative Stuff Lab1 out toight Due Thursday (10/18) Lab1 review sessio Tomorrow, 10/05,

More information

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions Representing Instructions Instructions are encoded in binary Called machine code MIPS instructions Encoded as 32-bit instruction words Small number of formats encoding operation code (opcode), register

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information