Today's Menu: Multi-Cycle Exceptions, Pipelining. Exceptions and Interrupts. Handling Exceptions and Interrupts. Why This Is Very Messy.
- Todd Dorsey
- 6 years ago
Transcription
Multi-cycle Exceptions

Today's Menu
- Exceptions: What are they? What do we do about them?
- Introduction to pipelining: Why pipelining? Why is it difficult? How can we do it efficiently?
- Examples

Exceptions and Interrupts
- Exceptions are exceptional events that disrupt the normal flow of a program.
- Terminology varies between different machines.
- Examples of interrupts: user hitting the keyboard, disk drive asking for attention, arrival of a network packet.
- Examples of exceptions: divide by zero, overflow, page fault.

Handling Exceptions and Interrupts
- When do we jump to an exception? Upon detection, invoke the OS to service the event.
- Right when it occurs? What about in the middle of executing a multi-cycle instruction? It is difficult to abort the middle of an instruction.
- The processor checks for an event at the end of every instruction.
- The processor provides EPC and Cause registers to inform the OS of the cause.
- EPC (Exception Program Counter): holds the PC that the OS should jump to when resuming execution.
- Cause register: holds the bit-encoded cause of the exception.

Exception Flow
- When an exception (or interrupt) occurs, control is transferred to the OS.
- When the OS is done, it jumps back to the user program (if it can).
- [Figure: user process, event, exception, exception processing by the exception handler in the operating system, exception return (optional).]

Why This Is Very Messy
- You have many instructions in flight. In one of these instructions, a bad thing happens, e.g., divide-by-zero.
- What do we have to do? We have to deal with this event, since normal program execution is probably now incorrect.
- But we have a bunch of instructions in flight. Many of them, but maybe not all of them, need to get killed. We don't want to kill stuff that is actually correct and waste that work.
- When do we kill them? NOW ("die die die")? Wait till the exception-causing instruction finishes? Wait till the pipeline empties?
- This is a very, very messy part of real machine design.
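The end-of-instruction check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(op, rd, rs, rt)` instruction tuples, the `CAUSE_*` bit values, and the choice to record the next instruction's index in EPC are all hypothetical, since real machines define their own encodings and resume points.

```python
# Hypothetical cause bits; real Cause-register encodings are machine-specific.
CAUSE_DIV_ZERO = 0b01
CAUSE_OVERFLOW = 0b10
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def run(program, regs):
    """Run (op, rd, rs, rt) tuples until the end or the first exception.

    Returns (regs, epc, cause); epc is None and cause is 0 if nothing went wrong.
    """
    pc = 0
    while pc < len(program):
        op, rd, rs, rt = program[pc]
        event, result = 0, None
        if op == "add":
            result = regs[rs] + regs[rt]
            if not INT_MIN <= result <= INT_MAX:
                event = CAUSE_OVERFLOW
        elif op == "div":
            if regs[rt] == 0:
                event = CAUSE_DIV_ZERO
            else:
                result = regs[rs] // regs[rt]
        else:
            raise ValueError(f"unknown op {op!r}")
        # The check happens once, at the end of the instruction.
        if event:
            # Assumption: EPC records where to resume (the next instruction).
            # The faulting instruction does not write its destination register.
            return regs, pc + 1, event
        regs[rd] = result
        pc += 1
    return regs, None, 0
```

In this sketch the "jump to the OS" is modeled by returning early; a handler would read the returned cause bits and could resume by calling `run` on the rest of the program starting at the returned EPC.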
Review of Multicycle vs. Single Cycle
- Single-cycle implementations have to consider the worst-case delay through the datapath to come up with the cycle time.
- Multicycle implementations have the advantage of using a different number of cycles for executing each instruction.
- In general, the multicycle machine is better than the single-cycle machine, but the actual execution time strongly depends on the workload.
- The most widely used machine implementation is neither single cycle nor multicycle; it's the pipelined implementation. (Next lecture)

Complete Single-cycle Datapath
- [Figure: single-cycle datapath with current PC, instruction memory, adder, register file, and data memory.]

Cost of the Single Cycle Architecture
- [Figure: instruction classes 1, 2, and 3 each occupy one full cycle sized for the longest instruction.]
- Most of the time is wasted!

Multi-cycle Solution
- Idea: let the FASTEST instruction determine the clock period.
- [Figure: with the shorter clock, instruction classes 1, 2, and 3 take different numbers of cycles.]
- Less wasted.

Multi-cycle Reality
- We are going to go further than allowing the fastest instruction to determine the clock rate.
- We are going to break EVERY instruction up into phases.
- [Figure: phases for R-class, load, store, and branch instructions.]

Multicycle Control: Add Intermediate Registers
- [Figure: multicycle datapath with intermediate registers (A, B) and control signals such as IorD, MemWrite, and IRWrite.]
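To see why the comparison "strongly depends on the workload", here is a small Python sketch. The per-class cycle counts, the 2 ns phase time, and the two instruction mixes are assumed values chosen for illustration, not figures from the slides, and the model ignores the extra delay of intermediate registers.

```python
def single_cycle_ns(counts, latency_ns):
    """Every instruction takes one cycle sized for the slowest class."""
    cycle = max(latency_ns.values())
    return sum(counts.values()) * cycle


def multi_cycle_ns(counts, cycles, phase_ns):
    """Each class takes its own number of short cycles."""
    return sum(n * cycles[cls] for cls, n in counts.items()) * phase_ns


# Assumed numbers for illustration only.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "branch": 3}
PHASE_NS = 2
LATENCY_NS = {cls: n * PHASE_NS for cls, n in CYCLES.items()}

load_heavy = {"load": 100, "store": 0, "rtype": 0, "branch": 0}
branch_heavy = {"load": 10, "store": 10, "rtype": 30, "branch": 50}

for name, mix in (("load-heavy", load_heavy), ("branch-heavy", branch_heavy)):
    print(name, single_cycle_ns(mix, LATENCY_NS), multi_cycle_ns(mix, CYCLES, PHASE_NS))
```

With these assumed numbers the all-load mix gains nothing from the multicycle design, while the branch-heavy mix finishes sooner, because short instructions no longer pay for the longest one.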
Multicycle

Let's build cars
- Henry Ford, Model T, 1908
- Non-pipelined: 1 car/ hours
Let's build cars
- Henry Ford, Model T, 1908
- Non-pipelined: 1 car/ hours
- Pipelined: 1 car/hour

Analogy: Gasoline Transportation
Trucking gas from depot to gas station:
- Get the barrels
- Load them into the truck
- Drive to the gas station
- Unload the gas
- Return for more oil
Let's do the math:
- Each truck can carry 5 barrels
- Can load a truck with 5 barrels in 1 hour
- It takes each truck 1 day to drive to and from the gas station
- Q: How many barrels per week are delivered?
- Q: What if I had more trucks?

Looks a Lot Like a Multicycle Processor
What are the steps?
- Fetch an instruction (get the barrels)
- Decode the instruction (load them into the truck)
- ALU op (drive to the gas station)
- Memory access (unload the gas)
- Write-back (return for more oil)

Business 201
- Roll the barrels down the road
- Big fire hazard: probably will not meet OSHA (Occupational Safety and Health Administration) standards

Business 201
- Build a pipeline
- Will meet OSHA standards
- Might make the environmentalists angry
Now let's do the math:
- Pipeline can accept 1 barrel every hour
- Q: How many barrels get delivered to the gas station per day?
- Q: How many barrels are in-flight at any moment?

Trucking vs. Pipelines
Trucks:
- Each truck can carry 5 barrels
- Can load a truck with 5 barrels in 1 hour
- Truck takes 1 day to drive to and from the gas station
- LOTS of TIME when loading area, gas station, and pieces of the road are unused
- Unless you have lots of trucks
Pipelines:
- Pipeline can accept 1 barrel every hour
- Resources (loading area, gas station, pipeline) are always in use
- As long as you can keep your pipeline full (e.g., you have enough barrels)
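The "let's do the math" questions can be worked through with a short Python sketch. The truck figures (5 barrels, 1 hour to load, 1 day round trip) come from the slides; counting only completed load-plus-round-trip cycles, ignoring contention at the loading area, and the pipeline's `fill_h` transit time are assumptions, since the slides do not say how long a barrel spends in the pipe.

```python
def truck_barrels(hours, trucks=1, capacity=5, load_h=1, round_trip_h=24):
    """Barrels delivered by trucks, counting only completed load + round-trip cycles."""
    trips = hours // (load_h + round_trip_h)
    return trips * capacity * trucks


def pipeline_barrels(hours, fill_h, rate_per_h=1):
    """Barrels delivered by a pipeline that needs fill_h hours before the first arrives."""
    return max(0, hours - fill_h) * rate_per_h


def pipeline_in_flight(fill_h, rate_per_h=1):
    """Barrels inside a full pipeline: transit time times acceptance rate."""
    return fill_h * rate_per_h


WEEK_H = 7 * 24
print(truck_barrels(WEEK_H))                  # one truck, one week
print(truck_barrels(WEEK_H, trucks=4))        # more trucks
print(pipeline_barrels(WEEK_H, fill_h=12))    # assumed 12-hour transit
print(pipeline_in_flight(fill_h=12))
```

Under these assumptions one truck completes 6 cycles in a week, and adding trucks scales deliveries linearly, whereas the pipeline delivers one barrel per hour once it is full, which is the throughput argument the analogy is building toward.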
Big Idea: Pipeline Concurrency
- This computation is too long: I can launch a new computation only every 100 ns in this structure.
- Pipelined version, 5 pipe stages of ~20 ns each.
- Latches, called pipeline registers, break up the computation into stages.

Big Idea: It's Faster
- Pipelined version, 5 pipe stages: I can launch a new computation every 20 ns in the pipelined structure.
- Latches, called pipeline registers, break up the computation into stages.

Implementation Issues
- What prevents us from just doing a zillion pipe stages?
- Some computations just won't divide into any finer (shorter in time) logical implementations. Ultimately, this often comes down to circuit design issues.
- 5 stages of ~20 ns: OK. 50 stages of ~2 ns: nope, sorry.

Implementation Issues
- What prevents us from just doing a zillion pipe stages?
- Those latches are NOT free: they take up area, and there is a real delay to go THRU the latch itself.
- [Figure: a pipe stage labeled ~2 ns next to a latch labeled ~0.2 ns.]
- In a modern, deep pipeline (10-20 stages), this is a real effect.
- Typically see logic depths in one pipe stage of 10-20 gates.
- At these speeds, and with this few levels of logic, latch delay is important.

Remember the ARM big.LITTLE Idea?
- LITTLE: pipeline depth 8-10; much lower power.
- BIG: pipeline depth 15-24; much higher frequency.

How Many Pipeline Stages?
- E.g., Intel Pentium 4: over 20 stages; more than 120 instructions in flight; high clock frequency (>3 GHz); high IPC (instructions per cycle).
- Too many stages: lots of complications. Should take care of possible dependencies among in-flight instructions. Control logic is huge.
- Too little work per stage plus too high a branch misprediction penalty means bad performance.
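A rough Python model of why latch delay limits pipeline depth. It assumes the logic divides perfectly evenly across stages (which the slides say is not always possible), and the 100 ns of total logic and 0.2 ns latch delay are assumed inputs chosen to echo the slide's ~20 ns and ~2 ns stage figures.

```python
def cycle_time_ns(logic_ns, stages, latch_ns):
    """Clock period: one stage's share of the logic plus one latch delay."""
    return logic_ns / stages + latch_ns


def throughput_speedup(logic_ns, stages, latch_ns):
    """Speedup in launch rate over the unpipelined logic (no latches)."""
    return logic_ns / cycle_time_ns(logic_ns, stages, latch_ns)


LOGIC_NS = 100.0   # assumed total logic delay
LATCH_NS = 0.2     # assumed delay through one pipeline register

for stages in (1, 5, 50, 500):
    t = cycle_time_ns(LOGIC_NS, stages, LATCH_NS)
    s = throughput_speedup(LOGIC_NS, stages, LATCH_NS)
    print(f"{stages:4d} stages: cycle {t:7.2f} ns, speedup {s:7.2f}, "
          f"latch share {LATCH_NS / t:5.1%}")
```

In this model the speedup always falls short of the stage count, and the shortfall grows with depth because the fixed latch delay becomes a larger share of each cycle.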
Performance of Pipelined Systems
- Unpipelined: latency 5 cycles; throughput 1 instruction per 5 cycles.
- Pipelined: latency 5 cycles; throughput 1 instruction per 1 cycle.
- Ideal speedup only if we can keep the pipeline full!
- Ideally, Time_pipeline = Time_sequential / Pipeline Depth.

MIPS Pipeline Stages
- Stage 1: Fetch (IF)
- Stage 2: Decode (ID)
- Stage 3: Execute (EX)
- Stage 4: Memory Access (MEM)
- Stage 5: Write Back to register file (WB)

5-stage Version of MIPS Datapath
- [Figure: datapath split into Stage 1 Instr. Fetch, Stage 2 Decode, Stage 3 Execute, Stage 4 MemAcc, Stage 5 Writeback, separated by registers.]

Complete 5-Stage Pipeline (Drawn Smaller)
- [Figure: current PC, instruction memory, register file, adder, and data memory, with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Flow of Instructions Through Pipeline
- [Figure: LW $1, 100($0); LW $2, 200($0); LW $3, 300($0) over cycles 1 to 6.]
- In cycle 4 we have 3 instructions in flight: Inst 1 is accessing the memory (DM), Inst 2 is using the ALU (EX), Inst 3 is accessing the register file (ID).

Stage 1 - IF (Instruction Fetch)
- [Figure: pipelined datapath with the fetch stage active for LW.]
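A small Python sketch of the ideal (hazard-free) timing of the five-stage pipe. The function names are made up for this example, and it assumes one instruction enters the pipe every cycle with no stalls.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def unpipelined_cycles(n_instr):
    """Each instruction runs all five stages before the next one starts."""
    return n_instr * len(STAGES)


def pipelined_cycles(n_instr):
    """Fill the pipe once, then retire one instruction per cycle."""
    return len(STAGES) + n_instr - 1 if n_instr else 0


def stage_at(instr_index, cycle):
    """Stage occupied in a given cycle (cycles count from 1, instructions from 0)."""
    idx = cycle - 1 - instr_index
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


# Three back-to-back loads, as in the flow diagram.
print([stage_at(i, 4) for i in range(3)])
for n in (3, 100, 10_000):
    print(n, unpipelined_cycles(n) / pipelined_cycles(n))
```

The first print shows the cycle-4 snapshot described on the slide (memory, ALU, register file), and the loop shows the speedup approaching the pipeline depth of 5 only as the number of instructions grows, i.e. only if the pipeline stays full.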
Stage 2 - ID (Instruction Decode)
- [Figure: pipelined datapath with the decode stage active for LW.]

Stage 3 - EX (Execute)
- [Figure: pipelined datapath with the execute stage active for LW.]

Stage 4 - MEM (Memory)
- [Figure: pipelined datapath with the memory stage active for LW.]

Stage 5 - WB (Write Back)
- [Figure: pipelined datapath with the writeback stage active for LW.]

New Complications
The good news:
- Multiple instructions are running at the same time, thru the datapath.
- This works because each stage of the pipeline is isolated by latches.
- So, in the best of all possible worlds, an N-stage pipe has N instructions flowing thru it, and speedup is close to N.
The bad news:
- Instructions interfere with each other. Common name for these: conflicts.
- Why? Different instructions are in flight thru the datapath at the same time. Different instructions might want to use the same piece of hardware in the datapath at the same time (i.e., in the same clock cycle).
- These conflicts, contention for an over-used resource, are the source of endless grief in pipeline design.

Good News: >1 Instruction In Flight in Pipe
- [Figure: ADD $2,$3,$1; SUB $5,$6,$7; ADD $10,$11,$12 overlapping across cycles 1 to 6.]
Bad News: Instructions Interfere
- [Figure: ADD $10,$11,$12; ADD $17,$0,$0; ADD $16,$0,$0; SUB $20,$21,$22; ADD $30,$17,$18 across cycles 1 to 6. One instruction writes to the register file in the same cycle in which a later instruction reads from the register file.]

Interference in a Pipe
- In its most basic form, it's about contention for a resource.
- 2 instructions want to use a piece of hardware in the pipe.
- There's only one of these in the pipe; maybe it can't service the requirements of more than one instruction at a time.
- [Figure: the conflict from the previous slide's instruction sequence.]

Sometimes, You Can Redesign the Resource
- In this particular case, the problem is that one instruction READS the register file and the other WRITES the register file.
- Solution: allow WRITE-then-READ in one clock cycle ("double pump").
- No conflict now: the 1st instruction writes in the 1st half of the clock cycle, the later instruction reads in the 2nd half.

Now, Even this Case Works OK
- [Figure: the same sequence; $17 is written and read in the same cycle.]

But... This Case Still Screws Up
- [Figure: ADD $2,$3,$1; SUB $5,$6,$7; ADD $10,$11,$12; ADD $12,$10,$11 across cycles 1 to 6. The result is written back into the register file after the next instruction has already read the value out of it.]

Another Conflict: Data Hazards
Basic structure:
- An instruction in flight wants to use a data value that's not "done" yet.
- "Done" means it's been computed and it's located where I would normally expect to go look in the pipe hardware to find it.
Basic cause:
- You are used to assuming a purely sequential model of instruction execution: instruction N finishes before instruction N+k, for k >= 1.
- Nope, sorry: not true any more in a pipeline.
- There are dependencies now between "nearby" instructions ("near" in sequential order of fetch from memory).
Consequence:
- Data hazards: instructions want data values that are not done yet, or not in the right place yet.
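The double-pump idea can be modeled in Python as "write in the first half of the cycle, read in the second half". The `RegisterFile` class and its `clock` method are hypothetical names for this sketch; treating register 0 as hardwired to zero follows MIPS convention.

```python
class RegisterFile:
    """Toy 32-entry register file with an optional double-pumped clock."""

    def __init__(self, double_pump=True):
        self.regs = [0] * 32
        self.double_pump = double_pump

    def clock(self, write=None, reads=()):
        """Simulate one clock cycle with an optional (reg, value) write and some reads."""
        if self.double_pump and write is not None:
            self._write(*write)                    # 1st half of the cycle: write
        values = [self.regs[r] for r in reads]     # reads see whatever is stored now
        if not self.double_pump and write is not None:
            self._write(*write)                    # write only lands at the end of the cycle
        return values

    def _write(self, reg, value):
        if reg != 0:                               # $0 always reads as zero
            self.regs[reg] = value


# One instruction writes $17 while a later one reads $17 and $18 in the same cycle.
print(RegisterFile(double_pump=True).clock(write=(17, 5), reads=[17, 18]))
print(RegisterFile(double_pump=False).clock(write=(17, 5), reads=[17, 18]))
```

With double pumping the reader sees the freshly written value; without it the reader gets the stale one, which is the same-cycle conflict the slide describes. Neither version helps when the producer has not reached writeback yet.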
This Data Hazard, Revisited
- In this particular case, the data value is not computed or returned to the register file when the later instruction wants to use it as an input.
- Double pumping the reg file doesn't help here; the later instruction needs the value 2 clock cycles before it's been computed & stored back. Oops.

Coping with Data Hazards
- What do you do? Sometimes the dumb-sounding answer is right.
- Hypothesis: it is BAD when certain instructions overlap in time in certain patterns in our 5-stage MIPS pipeline.
- Proposed solution: don't let them overlap like this? Right, that is one solution.
- Mechanics: don't let the instruction flow thru the pipe. In particular, don't let it WRITE any bits anywhere in the pipe hardware that represents REAL CPU state (e.g., register file, memory).
- Name for this operation: PIPELINE STALL.

Coping with Data Hazards: Example
- [Figure: ADD $10,$11,$12; ADD $12,$10,$11; ADD $11,$10,$12 across cycles 1 to 6.]

Solution 1: Stall
- [Figure: the same sequence with two bubbles inserted ahead of ADD $12,$10,$11.]
- Empty slots in the pipe are called bubbles; they mean no real instruction work is getting saved here.

Mechanically: How Do We Stall?
- Add extra hardware to detect stall situations. It watches the instruction field bits and looks for "read versus write" conflicts in particular pipe stages. Basically, a bunch of careful case logic.
- Add extra hardware to push bubbles thru the pipe. Actually, relatively easy: can just let the instruction you want to stall GO FORWARD thru the pipe, but TURN OFF the bits that allow any results to get written into the machine state.
- So, the instruction "executes" (it does the work), but doesn't "save".
- If an instruction executes in the middle of a forest, but no registers are around to save the results, did it really execute? (No.)

Recall the Registers Between Pipeline Stages
- [Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]
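How many bubbles a stall needs can be sketched in Python. This model assumes the five-stage pipe with a double-pumped register file and no forwarding, so a consumer may decode in the same cycle its producer writes back; the `(name, rd, rs, rt)` tuples and the function name are made up for this example.

```python
def stall_schedule(program):
    """Return [(name, bubbles)] for a list of (name, rd, rs, rt) instructions.

    Cycle numbering: the first instruction is fetched in cycle 1, decoded in
    cycle 2, and writes back in cycle 5 (decode cycle + 3).
    """
    last_wb = {}        # register -> writeback cycle of its most recent writer
    prev_id = 1         # so the first instruction decodes in cycle 2
    schedule = []
    for name, rd, rs, rt in program:
        earliest = prev_id + 1                            # decode slot with no stall
        ready = max(last_wb.get(rs, 0), last_wb.get(rt, 0))
        id_cycle = max(earliest, ready)                   # wait for producers to write back
        schedule.append((name, id_cycle - earliest))
        if rd != 0:                                       # writes to $0 create no dependence
            last_wb[rd] = id_cycle + 3
        prev_id = id_cycle
    return schedule


program = [
    ("ADD $10,$11,$12", 10, 11, 12),
    ("ADD $12,$10,$11", 12, 10, 11),
]
print(stall_schedule(program))
```

For the back-to-back dependent pair this yields two bubbles before the second ADD, matching the two bubbles drawn in the stall figure. A dependent instruction three or more slots behind its producer needs none under these assumptions.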
Recall What an Instruction Looks Like
- add $8, $17, $18 is stored in binary format as 000000 10001 10010 01000 00000 100000.
- MIPS lays out instructions into fields: op, rs, rt, rd, shamt, funct.
- op: operation of the instruction
- rs: first register source operand
- rt: second register source operand
- rd: register destination operand
- shamt: shift amount
- funct: function (selects the type of operation)
- We gotta watch these reg fields.

Data Hazard Logic
- [Figure: pipelined datapath with comparators checking rs =? rd and rt =? rd between the ID/EX, EX/MEM, and MEM/WB stages.]

Example
- sub $2, $1, $3: rd = 2, rs = 1, rt = 3
- and $12, $2, $5: rd = 12, rs = 2, rt = 5
- or $13, $6, $2: rd = 13, rs = 6, rt = 2
- add $14, $2, $2: rd = 14, rs = 2, rt = 2
- sw $15, 100($2): rd = 15, rs = 2, rt = ??
- SUB-AND hazard: EX/MEM.RegisterRd == ID/EX.RegisterRs == $2
- SUB-OR hazard: MEM/WB.RegisterRd == ID/EX.RegisterRt == $2

Interactions (real or not) can be tricky
- Example: do instruction #1 (sub) and #4 (add) interact, conflict? Well, they do BOTH want to use $2.

No Dependence Between #1 and #4
- In this case, the double-pumped reg file makes it ok.
- [Figure: SUB $2,$1,$3; AND $12,$2,$5; OR $13,$6,$2; ADD $14,$2,$2 across cycles 1 to 6.]

How Else Could We Stall the Pipeline?
- Compiler can insert nops.
- On MIPS, $0 = $0 + $0 will do it: it saves no state.
- [Figure: ADD $10,$11,$12; nop; nop; ADD $12,$10,$11 across cycles 1 to 6.]
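A Python sketch of "watching the reg fields": it packs and unpacks the standard MIPS R-type layout (6-bit op, 5-bit rs, rt, rd, shamt, 6-bit funct) and compares a producer's rd against a consumer's rs and rt. The function names are invented for this example, and the comparison is only the field-matching core of hazard logic, without the pipeline-register bookkeeping real hardware needs.

```python
def encode_rtype(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_rtype(word):
    """Unpack a 32-bit word into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def matching_sources(producer, consumer):
    """Source fields of consumer that read the register producer writes."""
    rd = producer["rd"]
    if rd == 0:                      # writes to $0 never create a dependence
        return []
    return [f for f in ("rs", "rt") if consumer[f] == rd]


# Standard MIPS funct codes: add 0x20, sub 0x22, and 0x24, or 0x25.
sub_ = decode_rtype(encode_rtype(rs=1, rt=3, rd=2, shamt=0, funct=0x22))
and_ = decode_rtype(encode_rtype(rs=2, rt=5, rd=12, shamt=0, funct=0x24))
or_ = decode_rtype(encode_rtype(rs=6, rt=2, rd=13, shamt=0, funct=0x25))

print(hex(encode_rtype(rs=17, rt=18, rd=8, shamt=0, funct=0x20)))
print(matching_sources(sub_, and_), matching_sources(sub_, or_))
```

The two comparisons correspond to the SUB-AND match on rs and the SUB-OR match on rt from the example; whether a match is a real hazard still depends on how far apart the two instructions are in the pipe.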
Or, The Hardware Can Simulate NOP
- [Figure: ADD $10,$11,$12, then two stall cycles that each send a bubble down the pipe, then ADD $12,$10,$11, across cycles 1 to 6.]

Next lecture
- How to fix the pipeline to avoid (most) dependency problems
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #22 CPU Design: Pipelining to Improve Performance II 2007-8-1 Scott Beamer, Instructor CS61C L22 CPU Design : Pipelining to Improve Performance
More informationComputer Architectures. DLX ISA: Pipelined Implementation
Computer Architectures L ISA: Pipelined Implementation 1 The Pipelining Principle Pipelining is nowadays the main basic technique deployed to speed-up a CP. The key idea for pipelining is general, and
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationDynamic Control Hazard Avoidance
Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationLecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1
Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationPipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations
Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationCSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements
CSE 4 Computer Architecture Spring 25 Lectures Exceptions and Introduction to Pipelining May 4, 25 Announcements Reading Assignment Sections 5.6, 5.9 The Processor Datapath and Control Section 6., Enhancing
More informationPipelining. Each step does a small fraction of the job All steps ideally operate concurrently
Pipelining Computational assembly line Each step does a small fraction of the job All steps ideally operate concurrently A form of vertical concurrency Stage/segment - responsible for 1 step 1 machine
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationAdvanced Computer Architecture Pipelining
Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,
More informationEECS Digital Design
EECS 150 -- Digital Design Lecture 11-- Processor Pipelining 2010-2-23 John Wawrzynek Today s lecture by John Lazzaro www-inst.eecs.berkeley.edu/~cs150 1 Today: Pipelining How to apply the performance
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationMore advanced CPUs. August 4, Howard Huang 1
More advanced CPUs In the last two weeks we presented the design of a basic processor. The datapath performs operations on register and memory data. A control unit translates program instructions into
More informationCISC 662 Graduate Computer Architecture Lecture 5 - Pipeline. Pipelining. Pipelining the Idea. Similar to assembly line in a factory:
CISC 662 Graduate Computer rchitecture Lecture 5 - Pipeline ichela Taufer http://www.cis.udel.edu/~taufer/courses Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer rchitecture,
More informationTi Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr
Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions
More informationData paths for MIPS instructions
You are familiar with how MIPS programs step from one instruction to the next, and how branches can occur conditionally or unconditionally. We next examine the machine level representation of how MIPS
More informationReview: Abstract Implementation View
Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:
More informationPipelined CPUs. Study Chapter 4 of Text. Where are the registers?
Pipelined CPUs Where are the registers? Study Chapter 4 of Text Second Quiz on Friday. Covers lectures 8-14. Open book, open note, no computers or calculators. L17 Pipelined CPU I 1 Review of CPU Performance
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency
More informationECE260: Fundamentals of Computer Engineering
Pipelining James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy What is Pipelining? Pipelining
More information