Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts
- Jesse Godfrey McCoy
- 6 years ago
1 CS359: Computer Architecture
Chapter 3 & Appendix C: Pipelining
Part A: Basic and Intermediate Concepts
Yanyan Shen, Department of Computer Science and Engineering, Shanghai Jiao Tong University
2 Outline
Introduction to Pipelining
How Pipeline is Implemented
C.3 Pipeline Hazards
Exceptions
Handling Multicycle Operations
3 Hazards
It would be nice if we could split the datapath into stages and the CPU just worked. But things are not as simple as you might expect: there are hazards!
A hazard is a situation that prevents the next instruction from starting in the next cycle.
Structural hazard: a conflict over the use of a resource at the same time.
Data hazard: data is not ready for the subsequent dependent instruction.
Control hazard: fetching the next instruction depends on the previous branch outcome.
4 Structural Hazards
A structural hazard is a conflict over the use of a resource at the same time.
Suppose the MIPS CPU has a single memory:
Load/store requires data access in the MEM stage.
Instruction fetch requires instruction access from the same memory.
Instruction fetch would have to stall for that cycle, which would cause a pipeline bubble.
Hence, pipelined datapaths require either separate ports to memory or separate memories for instructions and data.
[Figure: a MIPS CPU connected to memory through one address bus and one data bus, next to a MIPS CPU connected to memory through two address buses and two data buses]
5 Structural Hazards (Cont.)
[Pipeline diagram, time running left to right:]
lw    IF  ID  EX  MEM WB
add       IF  ID  EX  MEM WB
sub           IF  ID  EX  MEM WB
add               IF  ID  EX  MEM WB
Either provide separate ports to access memory, or provide separate instruction and data memories.
6 Data Hazards
Data is not ready for the subsequent dependent instruction.
add $s0,$t0,$t1   IF ID EX MEM WB
sub $t2,$s0,$t3   Bubble IF ID Bubble EX MEM WB
To solve the data hazard problem, the pipeline needs to be stalled (a stall is typically referred to as a "bubble").
The stall penalizes performance.
A better solution? Forwarding (or bypassing).
7 Forwarding
add $s0,$t0,$t1   IF ID EX MEM WB
sub $t2,$s0,$t3   IF ID Bubble Bubble EX MEM WB
8 Data Hazard - Load-Use Case
Can't always avoid stalls by forwarding: can't forward backward in time!
A hardware interlock is needed for the pipeline stall.
lw $s0, 8($t1)    IF ID EX MEM WB
sub $t2,$s0,$t3   IF ID Bubble EX MEM WB
This bubble can be hidden by proper instruction scheduling.
9 Code Scheduling to Avoid Stalls
Reorder code to avoid using a load result in the next instruction.
A = B + E; // B is loaded to $t1, E is loaded to $t2
C = B + F; // F is loaded to $t4
Original order (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
(stall)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
(stall)
add $t5, $t1, $t4
sw $t5, 16($t0)
Reordered (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
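The 13-cycle and 11-cycle totals on this slide can be reproduced with a small cycle counter. This is a minimal sketch under stated assumptions, not the course's tool: it assumes a five-stage pipeline with full forwarding, so the only stall is one cycle when an instruction reads the destination of the `lw` directly before it. The helper names `parse` and `count_cycles` are made up for this illustration.

```python
import re

def parse(line):
    """Split one instruction into (opcode, destination, sources)."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    if op == "sw":                 # a store only reads registers
        return op, None, regs
    return op, regs[0], regs[1:]

def count_cycles(program, stages=5):
    """Cycles needed to run `program`.

    Assumed model: full forwarding, plus a one-cycle stall whenever an
    instruction uses the result of the lw immediately before it.
    """
    instrs = [parse(line) for line in program]
    stalls = sum(
        1
        for prev, cur in zip(instrs, instrs[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )
    # n instructions need n + (stages - 1) cycles when nothing stalls.
    return len(instrs) + (stages - 1) + stalls

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
    "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

print(count_cycles(ORIGINAL), count_cycles(REORDERED))  # prints "13 11"
```

Moving the third `lw` up separates each load from its first use, which removes both stalls.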
10 Data Hazard - Forwarding
Don't wait for results to be written to the register file; use the temporary results.
[Figure: pipeline diagram, program execution order (in instructions) versus time (in clock cycles, CC 1 onward), annotated with the value of register $2, the value of EX/MEM, and the value of MEM/WB in each cycle, for the sequence:]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
OK. Then, do we always have to do this forwarding?
1. If the write to the register file occurs in the first half of the clock cycle, and the read occurs in the second half, then?
11 Forwarding
[Figure: datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and a MUX]
12 Forwarding (from EX/MEM)
[Figure: the same datapath with MUXes added at the ALU inputs, forwarding from EX/MEM]
13 Forwarding (from MEM/WB)
[Figure: the same datapath with MUXes at the ALU inputs, forwarding from MEM/WB]
14 Forwarding (operand selection)
[Figure: the same datapath with a forwarding unit controlling the ALU input MUXes]
15 Forwarding (operand propagation)
[Figure: the same datapath; Rs, Rt, and Rd are carried through the pipeline registers, and the forwarding unit receives Rs, Rt, EX/MEM Rd, and MEM/WB Rd]
16 Review: The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats:
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | target address (26 bits)
The different fields are:
op: operation of the instruction
rs, rt, rd: the source and destination register specifiers
shamt: shift amount
funct: selects the variant of the operation in the op field
address / immediate: address offset or immediate value
target address: target address of the jump instruction
EI209 Chapter 4A.16 CSE, SJTU, 2013
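The field layout above maps directly onto shifts and masks. The following is a minimal sketch; the function name `decode` is made up for this illustration, and it extracts every field regardless of format, leaving it to the caller to pick the ones that apply. The two example words are the standard encodings of `add $s0,$t0,$t1` and `lw $s0, 8($t1)`, using the usual MIPS register numbers ($t0 = 8, $t1 = 9, $s0 = 16), which are not given on the slide.

```python
def decode(word):
    """Extract every field of a 32-bit MIPS instruction word."""
    return {
        "op":        (word >> 26) & 0x3F,   # bits 31-26, all formats
        "rs":        (word >> 21) & 0x1F,   # bits 25-21, R- and I-type
        "rt":        (word >> 16) & 0x1F,   # bits 20-16, R- and I-type
        "rd":        (word >> 11) & 0x1F,   # bits 15-11, R-type
        "shamt":     (word >> 6) & 0x1F,    # bits 10-6,  R-type
        "funct":     word & 0x3F,           # bits 5-0,   R-type
        "immediate": word & 0xFFFF,         # bits 15-0,  I-type
        "target":    word & 0x3FFFFFF,      # bits 25-0,  J-type
    }

ADD_S0_T0_T1 = 0x01098020   # add $s0, $t0, $t1 (R-type)
LW_S0_8_T1 = 0x8D300008     # lw $s0, 8($t1)    (I-type)

r = decode(ADD_S0_T0_T1)
print(r["op"], r["rs"], r["rt"], r["rd"], r["shamt"], hex(r["funct"]))
i = decode(LW_S0_8_T1)
print(hex(i["op"]), i["rs"], i["rt"], i["immediate"])
```

Because rs and rt sit at the same bit positions in both R-type and I-type, a pipeline can read the register file before the opcode has been fully decoded.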
17 Forwarding Logic Implementation
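18 Forwarding
[Figure: pipelined datapath with control: PC, instruction memory, registers, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the EX, M, and WB control fields. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are passed along as Rs, Rt, and Rd; the forwarding unit receives Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd.]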
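The forwarding unit in these figures compares the source registers of the instruction in EX against the destination registers held in EX/MEM and MEM/WB. A minimal sketch of that selection for a single ALU operand follows. It uses the conditions of the standard textbook design, which the slides do not spell out: the `reg_write` flag, the exclusion of register 0, and the priority of EX/MEM over MEM/WB are assumptions, and the function name and return labels are made up.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand comes from.

    src_reg: register number (Rs or Rt) read by the instruction in EX.
    ex_mem, mem_wb: dicts with 'reg_write' (bool) and 'rd' (int),
    describing the instructions one and two stages ahead.
    """
    # The most recent result wins, so check EX/MEM first.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"

# sub $2,$1,$3 ; and $12,$2,$5 ; or $13,$6,$2
sub_latch = {"reg_write": True, "rd": 2}
and_latch = {"reg_write": True, "rd": 12}
idle = {"reg_write": False, "rd": 0}

# 'and' is in EX while 'sub' sits in EX/MEM.
print(forward_select(2, sub_latch, idle))        # prints "EX/MEM"
# 'or' is in EX while 'and' sits in EX/MEM and 'sub' in MEM/WB.
print(forward_select(2, and_latch, sub_latch))   # prints "MEM/WB"
```

The same function would be evaluated twice per cycle, once for Rs and once for Rt, to drive the two ALU input MUXes.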
19 Can't always forward
lw (load word) can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.
[Figure: pipeline diagram, program execution order (in instructions) versus time (in clock cycles, CC 1 to CC 9), for the sequence:]
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Thus, we need a hazard detection unit to stall the pipeline after the load instruction.
20 Stalling
We can stall the pipeline by keeping an instruction in the same stage.
[Figure: pipeline diagram, program execution order (in instructions) versus time (in clock cycles, CC 1 to CC 10); the and instruction repeats ID and the or instruction repeats IF, which inserts a bubble after the load:]
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
21 Hazard Detection Unit
Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt).
Stall by letting an instruction that won't write anything go forward.
[Figure: pipelined datapath in which the hazard detection unit reads ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt, and drives PCWrite, IF/IDWrite, and a MUX that selects 0 in place of the control signals; the forwarding unit receives Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]
22 Data Hazard Detection Logic
The logic to detect the need for load interlocks during the ID stage of an instruction
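The load interlock condition from the hazard detection unit slide can be written as a single predicate. This is a minimal sketch: the function name `must_stall` and the argument names are made up, modeled on the signal names in the figure (ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt).

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load in EX."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

# lw $2, 20($1) is in EX (it writes rt = 2);
# and $4, $2, $5 is in ID (it reads rs = 2, rt = 5).
print(must_stall(True, 2, 2, 5))    # prints "True"

# slt $1, $6, $7 in ID does not read $2, so no stall is needed.
print(must_stall(True, 2, 6, 7))    # prints "False"
```

When the predicate holds, the hardware described on the slide holds PC and IF/ID for one cycle and sends zeroed control signals down the pipeline, which is the bubble.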
23 Control Hazard
A branch determines the flow of instructions: fetching the next instruction depends on the branch outcome.
The pipeline can't always fetch the correct instruction, because the branch is still in the ID stage when the next instruction is fetched.
[Pipeline diagram, annotated "Taken target address is known here" and "Branch is resolved here":]
beq $1,$2,L1       IF ID EX MEM WB
add $1,$2,$3       Bubble IF ID EX MEM WB
sw $1, 4($2)       Bubble IF ID EX MEM WB
L1: sub $1,$2,$3   IF ID EX MEM WB
Fetch the next instruction based on the comparison result.
24 Reducing Control Hazard
To reduce 2 bubbles to 1 bubble, add hardware in the ID stage to compare registers (and generate the branch condition).
[Pipeline diagram, annotated "Taken target address is known here" and "Branch is resolved here": beq $1,$2,L1 passes through IF ID EX MEM WB; one bubble follows; then add $1,$2,$3 or L1: sub $1,$2,$3 passes through IF ID EX MEM WB]
Fetch the instruction based on the comparison result.
25 Delayed Branch
Many CPUs adopt a technique called the delayed branch to further reduce the stall.
A delayed branch always executes the next sequential instruction; the branch takes place after that one-instruction delay.
The delay slot is the slot right after a delayed branch instruction.
[Pipeline diagram, annotated "Taken target address is known here" and "Branch is resolved here":]
beq $1,$2,L1                IF ID EX MEM WB
add $1,$2,$3 (delay slot)   IF ID EX MEM WB
L1: sub $1,$2,$3            IF ID EX MEM WB
Fetch the instruction based on the comparison result.
26 Delay Slot (Cont.)
The compiler needs to schedule a useful instruction in the delay slot, or fill it with a nop (no operation).
// $s1 = a, $s2 = b, $s3 = c
// $t0 = d, $t1 = f
a = b + c;
if (d == 0) {f = f + 1;}
f = f + 2;
With a nop in the delay slot:
add $s1,$s2,$s3
bne $t0,$zero,L1
nop // delay slot
addi $t1,$t1,1
L1: addi $t1,$t1,2
Can we do better?
bne $t0,$zero,L1
add $s1,$s2,$s3 // delay slot
addi $t1,$t1,1
L1: addi $t1,$t1,2
Fill the delay slot with a useful and valid instruction.
27 Branch Prediction
Longer pipelines (as implemented in the Core 2 Duo, for example) can't readily determine the branch outcome early.
The stall penalty becomes unacceptable, since branch instructions are used so frequently in programs.
Solution: branch prediction.
Predict the branch outcome in hardware.
Flush the instructions (that shouldn't have been executed) from the pipeline if the prediction turns out to be wrong.
Modern processors use sophisticated branch predictors.
28 MIPS with Predict-Not-Taken
[Figure: two pipeline diagrams, labeled "Prediction correct" and "Prediction incorrect"]
Flush the instruction that shouldn't be executed.
29 Control Hazards - Branch
When the branch condition is resolved, other instructions are in the pipeline.
[Figure: pipeline diagram, program execution order (in instructions) versus time (in clock cycles, CC 1 to CC 9), for the sequence:]
40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)
Note that in this implementation, the branch is resolved in the MEM stage.
We are predicting branch not taken. If we are wrong (the branch is taken), flush the instructions.
30 Alleviate Branch Hazards
Reduce the penalty to 1 cycle:
Move the branch compare to the ID stage of the pipeline.
Add an adder to calculate the branch target in the ID stage.
Add the IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register.
[Pipeline diagram, annotated "Taken target address is known here" and "Branch is resolved here": beq $1,$2,L1 passes through IF ID EX MEM WB; one bubble follows; then add $1,$2,$3 or L1: sub $1,$2,$3 passes through IF ID EX MEM WB]
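To see why shrinking the branch penalty matters, the cost of flushed instructions can be folded into an average cycles-per-instruction figure. This is a back-of-the-envelope sketch only: the formula is the usual stall-cycle accounting, and the branch fraction and misprediction rate below are hypothetical numbers chosen for illustration, not measurements from the slides.

```python
def effective_cpi(base_cpi, branch_fraction, wrong_fraction, penalty_cycles):
    """Average CPI once branch flushes are charged to every instruction.

    branch_fraction: share of instructions that are branches (assumed).
    wrong_fraction:  share of branches whose prediction is wrong (assumed).
    penalty_cycles:  instructions flushed per wrong prediction.
    """
    return base_cpi + branch_fraction * wrong_fraction * penalty_cycles

# Hypothetical workload: 20% branches, half of them taken under a
# predict-not-taken scheme (so half of the predictions are wrong).
late = effective_cpi(1.0, 0.20, 0.5, penalty_cycles=3)   # resolved late
early = effective_cpi(1.0, 0.20, 0.5, penalty_cycles=1)  # resolved in ID

print(round(late, 2), round(early, 2))
```

Under these assumed numbers the flush overhead drops from 0.3 to 0.1 cycles per instruction when the compare moves to ID.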
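31 Flushing Instructions
Chapter 3: Pipelining
[Figure: pipelined datapath with the branch resolved in ID: PC, +4 adder, instruction memory, registers with an equality comparator (=), sign extend, shift left 2, control with a MUX selecting 0, hazard detection unit, forwarding unit, ALU, data memory, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, and the IF.Flush signal into IF/ID]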
32 Flushing Instructions (cycle N)
Code sequence:
beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
L2: lw $4, 40($7)
[Figure: the datapath in cycle N; and $12, $2, $5 is in IF and beq $1, $3, L2 is in ID, with the IF.Flush signal at the IF/ID register]
33 Flushing Instructions (cycle N)
[Figure: the same cycle and code sequence, with the branch target L2 shown at the PC]
34 Flushing Instructions (cycle N+1)
[Figure: the datapath in cycle N+1 for the same code sequence; lw $4, 40($7) is in IF, a nop is in ID, and beq $1, $3, L2 is in EX]
35 Outline
Introduction to Pipelining
How Pipeline is Implemented
Pipeline Hazards
C.4 Exceptions
Handling Multicycle Operations
36 Exceptions
Exceptions describe those situations where the normal execution order of instructions is changed.
They may force the CPU to abort the instructions in the pipeline before they complete.
Other terms used for exceptions: interrupt, fault.
37 Types of Exceptions
I/O device request
Invoking an OS service from a user program
Tracing instruction execution
Breakpoint (programmer-requested interrupt)
Integer arithmetic overflow
FP arithmetic anomaly
Page fault (not in main memory)
Misaligned memory accesses
Memory protection violation
Using an undefined instruction
Hardware malfunctions
Power failure
38 Requirements on Exceptions
Synchronous vs asynchronous
39 Classifications
40 Stopping and Restarting Execution
The most difficult exceptions have two properties:
(1) they occur within instructions (i.e., in the middle of instruction execution, corresponding to the EX or MEM pipe stages);
(2) they must be restartable.
41 Steps to Save Pipeline State
(1) Force a trap instruction into the pipeline on the next IF.
(2) Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow in the pipeline. This can be done by placing zeros into the pipeline latches of all instructions, starting with the instruction that generates the exception, but not those that precede that instruction.
(3) After the exception-handling routine in the OS receives control, it immediately saves the PC of the faulting instruction. This value will be used to return from the exception later.
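The three steps above can be modeled on a toy pipeline to make the ordering concrete. This is a minimal sketch under stated assumptions: each pipeline latch is reduced to a dict holding the instruction's PC and a single `write_enable` bit, the list is ordered oldest instruction first, and `TRAP_VECTOR` is a hypothetical handler address. None of these names come from the slides.

```python
TRAP_VECTOR = 0x80000180  # hypothetical address of the trap handler

def take_exception(latches, faulting_pc):
    """Squash the faulting instruction and everything younger.

    latches: list of {'pc': int, 'write_enable': bool}, oldest first.
    Returns (next_fetch_pc, saved_pc).
    """
    squashing = False
    for latch in latches:
        if latch["pc"] == faulting_pc:
            squashing = True
        if squashing:
            # Step 2: zero the latch so the instruction cannot change state.
            latch["write_enable"] = False
    # Step 1: the next fetch is the trap; step 3: keep the PC for the return.
    return TRAP_VECTOR, faulting_pc

pipeline = [
    {"pc": 0x40, "write_enable": True},   # older, must complete
    {"pc": 0x44, "write_enable": True},   # faulting instruction
    {"pc": 0x48, "write_enable": True},   # younger, must not write
]
next_pc, saved_pc = take_exception(pipeline, faulting_pc=0x44)
print([latch["write_enable"] for latch in pipeline])  # [True, False, False]
```

Leaving the older instruction's writes enabled is what lets it complete before the handler runs.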
42 Precise vs Imprecise Exceptions
If the pipeline can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch, the pipeline is said to have precise exceptions.
Supporting precise exceptions is a requirement in many systems.
Any processor with demand paging or IEEE arithmetic trap handlers must make its exceptions precise.
43 Exceptions in MIPS Pipeline
Exceptions may occur in different stages of a pipeline.
44 Outline
Introduction to Pipelining
How Pipeline is Implemented
Pipeline Hazards
Exceptions
C.5 Handling Multicycle Operations
45 Supporting Multiple FP Operations
[Figure: IF and ID feed several execution units that rejoin at MEM and WB: the integer unit (EX), the FP multiplier (7 cycles), the FP adder (4 cycles, stages A1 to A4), and a non-pipelined FP divider (24 cycles)]
Complicates bypassing and forwarding.
Potential structural hazards:
Multiple (FP) instructions can complete at the same time.
The RF might need to be multi-ported.
Ordering issue: who gets to update the register?
Out-of-order completion/retirement: precise exception issue.
Modified from Prof. Sean Lee's slide
46 Bypassing & Forwarding
[Pipeline diagram over clock cycles; S marks a stall cycle:]
L.D F4,0(R2)     IF ID EX MEM WB
MUL.D F0,F4,F6   IF ID S M1 M2 M3 M4 M5 M6 M7 MEM WB
ADD.D F2,F0,F8   IF S ID S S S S S S A1 A2 A3 A4 MEM WB
S.D F2,0(R2)     IF S S S S S S ID EX S S S MEM WB
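47 Structural Hazards
[Pipeline diagram over clock cycles; each row starts one cycle after the row above:]
MUL.D F0,F4,F6   IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
...              IF ID EX MEM WB
...              IF ID EX MEM WB
ADD.D F2,F4,F6   IF ID A1 A2 A3 A4 MEM WB
...              IF ID EX MEM WB
...              IF ID EX MEM WB
L.D F2,0(R2)     IF ID EX MEM WB
Write to the register file in the same cycle (cc11).
Write to the same register (WAW).
MEM in cc10.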
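The cc11 collision can be checked by computing each instruction's write-back cycle from its fetch cycle and execute latency. This is a minimal sketch under stated assumptions: the latencies are the ones quoted for the FP multiplier (7 cycles) and FP adder (4 cycles), loads are assumed to spend one cycle in EX, the fetch cycles 1, 4, and 7 are read off the diagram rows, and the function names are made up for this illustration.

```python
from collections import defaultdict

# Execute-stage latency per operation, in cycles.
EX_LATENCY = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}

def wb_cycle(fetch_cycle, op):
    """Cycle in which `op` reaches WB: IF, ID, execute stages, MEM, WB."""
    return fetch_cycle + 1 + EX_LATENCY[op] + 2

def write_port_conflicts(schedule):
    """Map each WB cycle shared by two or more instructions to those ops."""
    by_cycle = defaultdict(list)
    for fetch_cycle, op in schedule:
        by_cycle[wb_cycle(fetch_cycle, op)].append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}

# Fetch cycles of the three named instructions in the diagram.
SCHEDULE = [(1, "MUL.D"), (4, "ADD.D"), (7, "L.D")]

print(write_port_conflicts(SCHEDULE))
# prints {11: ['MUL.D', 'ADD.D', 'L.D']}
```

A single write port can serve only one of the three in cycle 11, so the other two would have to be stalled or the register file given more ports.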
48 Precise Exception Issue
DIV.D F0,F2,F4      (exception!)
ADD.D F3,F10,F8     (completed)
SUB.D F12,F12,F14   (completed)
Precise exception: if the pipeline can (or must) be stopped,
all the instructions before the faulting (or intended) instruction must be completed;
all the instructions after it must not be completed;
execution restarts from the faulting (or intended) instruction.
State must be consistent with the original program order.
Not straightforward with out-of-order completion.
49 Scalar Pipeline (Baseline)
[Figure: instruction sequence versus execution cycle; each instruction passes through IF DE EX MEM WB]
Modified from Prof. Sean Lee's slide
50 Superpipeline
Deeper pipelining is called superpipelining.
A deeper pipeline allows higher clock rates.
[Figure: instruction sequence versus execution cycle; the IF, DE, EX, MEM, and WB stages are each split into sub-stages, labeled I, D, E, M, and W]
Modified from Prof. Sean Lee's slide
51 CS359: Computer Architecture
End of Part A
Questions?
TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction
Review Friday the 2st of October Real world eamples of pipelining? How does pipelining pp inflence instrction latency? How does pipelining inflence instrction throghpt? What are the three types of hazard
More informationOverview of Pipelining
EEC 58 Compter Architectre Pipelining Department of Electrical Engineering and Compter Science Cleveland State University Fndamental Principles Overview of Pipelining Pipelined Design otivation: Increase
More informationPS Midterm 2. Pipelining
PS idterm 2 Pipelining Seqential Landry 6 P 7 8 9 idnight Time T a s k O r d e r A B C D 3 4 2 3 4 2 3 4 2 3 4 2 Seqential landry takes 6 hors for 4 loads If they learned pipelining, how long wold landry
More informationWhat do we have so far? Multi-Cycle Datapath
What do we have so far? lti-cycle Datapath CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instrction being processed in datapath How to lower CPI frther? #1 Lec # 8 Spring2 4-11-2 Pipelining pipelining
More information1048: Computer Organization
8: Compter Organization Lectre 6 Pipelining Lectre6 - pipelining (cwli@twins.ee.nct.ed.tw) 6- Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards
More informationReview: Computer Organization
Review: Compter Organization Pipelining Chans Y Landry Eample Landry Eample Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 3 mintes A B C D Dryer takes 3 mintes
More informationComp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13
Comp 33 Compter Architectre A Pipelined path Lectre 3 Pipelined path with Signals PCSrc IF/ ID ID/ EX EX / E E / Add PC 4 Address Instrction emory RegWr ra rb rw Registers bsw [5-] [2-6] [5-] bsa bsb Sign
More informationEnhanced Performance with Pipelining
Chapter 6 Enhanced Performance with Pipelining Note: The slides being presented represent a mi. Some are created by ark Franklin, Washington University in St. Lois, Dept. of CSE. any are taken from the
More informationCS 251, Winter 2019, Assignment % of course mark
CS 25, Winter 29, Assignment.. 3% of corse mark De Wednesday, arch 3th, 5:3P Lates accepted ntil Thrsday arch th, pm with a 5% penalty. (7 points) In the diagram below, the mlticycle compter from the corse
More informationCS 251, Winter 2018, Assignment % of course mark
CS 25, Winter 28, Assignment 4.. 3% of corse mark De Wednesday, arch 7th, 4:3P Lates accepted ntil Thrsday arch 8th, am with a 5% penalty. (6 points) In the diagram below, the mlticycle compter from the
More informationEEC 483 Computer Organization. Branch (Control) Hazards
EEC 483 Compter Organization Section 4.8 Branch Hazards Section 4.9 Exceptions Chans Y Branch (Control) Hazards While execting a previos branch, next instrction address might not yet be known. s n i o
More informationEXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation
EXAINATIONS 2003 COP203 END-YEAR Compter Organisation Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. There are 180 possible marks on the eam. Calclators and foreign langage dictionaries
More informationPipelining. Chapter 4
Pipelining Chapter 4 ake processor rns faster Pipelining is an implementation techniqe in which mltiple instrctions are overlapped in eection Key of making processor fast Pipelining Single cycle path we
More informationExceptions and interrupts
Eceptions and interrpts An eception or interrpt is an nepected event that reqires the CPU to pase or stop the crrent program. Eception handling is the hardware analog of error handling in software. Classes
More informationChapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind
Pipelining hink of sing machines in landry services Chapter 6 nhancing Performance with Pipelining 6 P 7 8 9 A ime ask A B C ot pipelined Assme 3 min. each task wash, dry, fold, store and that separate
More informationEXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION
EXAINATIONS 2010 END OF YEAR COPUTER ORGANIZATION Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. ake sre yor answers are clear and to the point. Calclators and paper foreign langage
More informationEEC 483 Computer Organization
EEC 83 Compter Organization Chapter.6 A Pipelined path Chans Y Pipelined Approach 2 - Cycle time, No. stages - Resorce conflict E E A B C D 3 E E 5 E 2 3 5 2 6 7 8 9 c.y9@csohio.ed Resorces sed in 5 Stages
More informationChapter 6: Pipelining
Chapter 6: Pipelining Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards and stalls Branch hazards Eceptions Sperscalar and dynamic pipelining
More information1048: Computer Organization
48: Compter Organization Lectre 5 Datapath and Control Lectre5A - simple implementation (cwli@twins.ee.nct.ed.tw) 5A- Introdction In this lectre, we will try to implement simplified IPS which contain emory
More informationSolutions for Chapter 6 Exercises
Soltions for Chapter 6 Eercises Soltions for Chapter 6 Eercises 6. 6.2 a. Shortening the ALU operation will not affect the speedp obtained from pipelining. It wold not affect the clock cycle. b. If the
More informationThe extra single-cycle adders
lticycle Datapath As an added bons, we can eliminate some of the etra hardware from the single-cycle path. We will restrict orselves to sing each fnctional nit once per cycle, jst like before. Bt since
More informationEEC 483 Computer Organization
EEC 483 Compter Organization Chapter 4.4 A Simple Implementation Scheme Chans Y The Big Pictre The Five Classic Components of a Compter Processor Control emory Inpt path Otpt path & Control 2 path and
More informationLecture 13: Exceptions and Interrupts
18 447 Lectre 13: Eceptions and Interrpts S 10 L13 1 James C. Hoe Dept of ECE, CU arch 1, 2010 Annoncements: Handots: Spring break is almost here Check grades on Blackboard idterm 1 graded Handot #9: Lab
More informationReview. A single-cycle MIPS processor
Review If three instrctions have opcodes, 7 and 5 are they all of the same type? If we were to add an instrction to IPS of the form OD $t, $t2, $t3, which performs $t = $t2 OD $t3, what wold be its opcode?
More informationChapter 6: Pipelining
CSE 322 COPUTER ARCHITECTURE II Chapter 6: Pipelining Chapter 6: Pipelining Febrary 10, 2000 1 Clothes Washing CSE 322 COPUTER ARCHITECTURE II The Assembly Line Accmlate dirty clothes in hamper Place in
More informationThe single-cycle design from last time
lticycle path Last time we saw a single-cycle path and control nit for or simple IPS-based instrction set. A mlticycle processor fies some shortcomings in the single-cycle CPU. Faster instrctions are not
More informationProf. Kozyrakis. 1. (10 points) Consider the following fragment of Java code:
EE8 Winter 25 Homework #2 Soltions De Thrsday, Feb 2, 5 P. ( points) Consider the following fragment of Java code: for (i=; i
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online
More informationComputer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University
Compter Architectre Chapter 5 Fall 25 Department of Compter Science Kent State University The Processor: Datapath & Control Or implementation of the MIPS is simplified memory-reference instrctions: lw,
More informationInstruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion
. (Chapter 5) Fill in the vales for SrcA, SrcB, IorD, Dst and emto to complete the Finite State achine for the mlti-cycle datapath shown below. emory address comptation 2 SrcA = SrcB = Op = fetch em SrcA
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationReview Multicycle: What is Happening. Controlling The Multicycle Design
Review lticycle: What is Happening Reslt Zero Op SrcA SrcB Registers Reg Address emory em Data Sign etend Shift left Sorce A B Ot [-6] [5-] [-6] [5-] [5-] Instrction emory IR RegDst emtoreg IorD em em
More information1048: Computer Organization
48: Compter Organization Lectre 5 Datapath and Control Lectre5B - mlticycle implementation (cwli@twins.ee.nct.ed.tw) 5B- Recap: A Single-Cycle Processor PCSrc 4 Add Shift left 2 Add ALU reslt PC address
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationQuiz #1 EEC 483, Spring 2019
Qiz # EEC 483, Spring 29 Date: Jan 22 Name: Eercise #: Translate the following instrction in C into IPS code. Eercise #2: Translate the following instrction in C into IPS code. Hint: operand C is stored
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationThe Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
The Processor (3) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More informationThe final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read.
The final path PC 4 Add Reg Shift left 2 Add PCSrc Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtor RegDst ALUSrc em I [5
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationInstruction Pipelining is the use of pipelining to allow more than one instruction to be in some stage of execution at the same time.
Pipelining Pipelining is the se of pipelining to allow more than one instrction to be in some stage of eection at the same time. Ferranti ATLAS (963): Pipelining redced the average time per instrction
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationCSE 141 Computer Architecture Summer Session I, Lectures 10 Advanced Topics, Memory Hierarchy and Cache. Pramod V. Argade
CSE 141 Compter Architectre Smmer Session I, 2004 Lectres 10 Advanced Topics, emory Hierarchy and Cache Pramod V. Argade CSE141: Introdction to Compter Architectre Instrctor: TA: Pramod V. Argade (p2argade@cs.csd.ed)
More informationChapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns
Chapter Si Pipelining Improve perfomance by increasing instruction throughput eecutionı Time lw $, ($) 2 6 8 2 6 8 access lw $2, 2($) 8 ns access lw $3, 3($) eecutionı Time lw $, ($) lw $2, 2($) 2 ns 8
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationProcessor (II) - pipelining. Hwansoo Han
Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number
More informationPIPELINING. Pipelining: Natural Phenomenon. Pipelining. Pipelining Lessons
Pipelining: Natral Phenomenon Landry Eample: nn, rian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 mintes C D Dryer takes 0 mintes PIPELINING Folder takes 20 mintes
More informationThe multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM
Lectre (Wed /5/28) Lab # Hardware De Fri Oct 7 HW #2 IPS programming, de Wed Oct 22 idterm Fri Oct 2 IorD The mlticycle path SrcA Today s objectives: icroprogramming Etending the mlti-cycle path lti-cycle
More informationCS 251, Spring 2018, Assignment 3.0 3% of course mark
CS 25, Spring 28, Assignment 3. 3% of corse mark De onday, Jne 25th, 5:3 P. (5 points) Consider the single-cycle compter shown on page 6 of this assignment. Sppose the circit elements take the following
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri