Today's Menu. Multi-Cycle, Exceptions, Pipelining. Exceptions and Interrupts. Handling Exceptions and Interrupts. Why This Is Very Messy.


Todd Dorsey
6 years ago


Multi-cycle, Exceptions

Today's Menu
Exceptions
  What are they?
  What do we do about them?
Introduction to pipelining
  Why pipelining?
  Why is it difficult?
  How can we do it efficiently?
Examples

Exceptions and Interrupts
Exceptions are exceptional events that disrupt the normal flow of a program.
Terminology varies between different machines.
Examples of Interrupts
  User hitting the keyboard
  Disk drive asking for attention
  Arrival of a network packet
Examples of Exceptions
  Divide by zero
  Overflow
  Page fault

Handling Exceptions and Interrupts
When do we jump to an exception?
Upon detection, invoke the OS to service the event.
  Right when it occurs?
  What about in the middle of executing a multi-cycle instruction?
  Difficult to abort the middle of an instruction.
Processor checks for event at the end of every instruction.
Processor provides EPC & Cause registers to inform OS of cause.
  EPC (Exception Program Counter): holds the PC that the OS should jump to when resuming execution.
  Cause Register: holds bit-encoded cause of the exception.

Exception Flow
When an exception (or interrupt) occurs, control is transferred to the OS.
When the OS is done, it jumps back to the user program (if it can).
[Figure: User Process, Event, exception, Operating System (exception processing by exception handler), exception return (optional)]

Why This Is Very Messy
You have many instructions in flight.
In one of these instructions, a bad thing happens, e.g., divide-by-zero.
What do we have to do?
  We have to deal with this event, since normal program execution is probably now incorrect.
  But, we have a bunch of instructions in flight.
  Many of them, but maybe not all of them, need to get killed.
  Don't want to kill stuff that is actually correct, and waste that work.
When do we kill them?
  NOW: die, die, die?
  Wait till exception-causing instruction finishes?
  Wait till the pipeline empties?
Very very very messy part of real machine design.
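The end-of-instruction check and the EPC and Cause registers described on the "Handling Exceptions and Interrupts" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cause codes, the handler address, and the names `Cpu`, `finish_instruction`, and `exception_return` are hypothetical, and whether EPC holds the faulting instruction or the following one varies between machines.

```python
from dataclasses import dataclass

# Hypothetical cause codes; real machines define their own bit encodings.
CAUSE_DIVIDE_BY_ZERO = 0x1
CAUSE_OVERFLOW = 0x2

# Assumed entry point of the OS exception handler.
HANDLER_ADDRESS = 0x80000180


@dataclass
class Cpu:
    pc: int = 0
    epc: int = 0    # PC the OS should jump to when resuming execution
    cause: int = 0  # bit-encoded cause of the exception


def finish_instruction(cpu, pending_cause):
    """Run at the end of every instruction, where pending events are checked."""
    if pending_cause:
        # This sketch saves the address of the instruction that raised the event.
        cpu.epc = cpu.pc
        cpu.cause = pending_cause
        cpu.pc = HANDLER_ADDRESS  # control is transferred to the OS
    else:
        cpu.pc += 4               # normal flow: next sequential instruction


def exception_return(cpu):
    """The OS is done: jump back to the user program."""
    cpu.pc = cpu.epc
```

Because the check happens only at instruction boundaries, the sketch never has to abort an instruction halfway through, which is the simplification the slide relies on.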

Review of Multicycle vs. Single Cycle
Single cycle implementations have to consider the worst case delay through the datapath to come up with the cycle time.
Multicycle implementations have the advantage of using a different number of cycles for executing each instruction.
In general, the multicycle machine is better than the single cycle machine, but the actual execution time strongly depends on the workload.
The most widely used machine implementation is neither single cycle nor multicycle; it's the pipelined implementation. (Next lecture)

Complete Single-cycle Datapath
[Figure: single-cycle datapath with PC, instruction memory, adder, register file, and data memory]

Cost of the Single Cycle Architecture
[Figure: time bars for Instr Class 1, Instr Class 2, and Instr Class 3 against our cycle time, which is set by the longest one]
Most of the time is wasted!

Multi-cycle Solution
Idea: Let the FASTEST instruction determine clock period.
[Figure: the same three instruction classes, each taking a different number of the shorter cycles]
Less wasted.

Multi-cycle Reality
We are going to go further than allowing the fastest instruction to determine rate.
We are going to break EVERY instruction up into phases.
[Figure: phases of R-class, Load, Branch, and Store instructions]

Multicycle Control
Add Intermediate Registers
[Figure: multicycle datapath with intermediate registers and control signals such as IorD, MemWrite, and IRWrite]
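The claim that execution time "strongly depends on the workload" can be checked with a small calculation. The sketch below is illustrative only: every latency, step count, and instruction mix is a made-up assumption, not a figure from the slides.

```python
# All numbers below are invented for illustration.
SINGLE_CYCLE_LATENCY_NS = {"r_type": 6, "load": 8, "store": 7, "branch": 5}
MULTI_CYCLE_STEPS = {"r_type": 4, "load": 5, "store": 4, "branch": 3}
STEP_TIME_NS = 2  # assumed clock period of the multicycle machine


def single_cycle_time(mix):
    """Every instruction pays for the slowest instruction class."""
    cycle_ns = max(SINGLE_CYCLE_LATENCY_NS.values())
    return cycle_ns * sum(mix.values())


def multi_cycle_time(mix):
    """Each instruction pays only for the steps it actually needs."""
    return sum(count * MULTI_CYCLE_STEPS[kind] * STEP_TIME_NS
               for kind, count in mix.items())


branch_heavy = {"branch": 100}
load_heavy = {"load": 100}
print(single_cycle_time(branch_heavy), multi_cycle_time(branch_heavy))
print(single_cycle_time(load_heavy), multi_cycle_time(load_heavy))
```

With these assumed numbers the multicycle machine wins on the branch-heavy mix and loses on the load-heavy one, which is the workload dependence the slide points out.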

Multicycle

Let's build cars
Henry Ford, Model T, 1908
Non-pipelined: 1 car/[...] hours


Henry Ford, Model T, 1908
Non-pipelined: 1 car/[...] hours
Pipelined: 1 car/hour

Analogy: Gasoline Transportation
Trucking gas from depot to gas station
  Get the barrels
  Load them into the truck
  Drive to the gas station
  Unload the gas
  Return for more oil
Let's do the math
  Each truck can carry 5 barrels
  Can load a truck with 5 barrels in 1 hour
  It takes each truck 1 day to drive to and from gas station
Q: How many barrels per week are delivered?
Q: What if I had more trucks?

Looks a Lot Like a Multicycle Processor
What are the steps?
  Fetch an instruction (Get the barrels)
  Decode the instruction (Load them into the truck)
  ALU OP (Drive to the gas station)
  Memory Access (Unload the gas)
  Write-back (Return for more oil)
[Figure: multicycle datapath alongside the gas station]

Business 201
Roll the barrels down the road
  Big fire hazard, probably will not meet OSHA (Occupational Safety and Health Administration) standards

Business 201
Build a pipeline
  Will meet OSHA standards
  Might make the environmentalists angry
Now let's do the math
  Pipeline can accept 1 barrel every hour
Q: How many barrels get delivered to the gas station per day?
Q: How many barrels are in-flight at any moment?

Trucking vs. Pipelines
Trucks
  Each truck can carry 5 barrels
  Can load a truck with 5 barrels in 1 hour
  Truck takes 1 day to drive to and from gas station
  LOTS of WASTE when loading area, gas station, and pieces of the road are unused, unless you have lots of trucks
Pipelines
  Pipeline can accept 1 barrel every hour
  Resources (loading area, gas station, pipeline) are always in use, as long as you can keep your pipeline full (e.g., you have enough barrels)
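The "let's do the math" questions can be worked through with a short Python sketch. The truck and pipeline figures come from the slides; everything else is an assumption made here: trucks are treated as independent (no queueing at the loading area), only completed round trips are counted, the pipeline is assumed to be already full, and the pipeline transit time is a hypothetical parameter because the slides do not give one.

```python
HOURS_PER_WEEK = 7 * 24


def truck_barrels(hours, trucks=1, capacity=5, load_hours=1, round_trip_hours=24):
    """Barrels delivered by trucks, counting only completed load-and-drive trips."""
    trips_per_truck = hours // (load_hours + round_trip_hours)
    return trucks * trips_per_truck * capacity


def pipeline_barrels(hours, barrels_per_hour=1):
    """Barrels delivered by a pipeline that is already full (steady state)."""
    return hours * barrels_per_hour


def barrels_in_flight(transit_hours, barrels_per_hour=1):
    """Barrels inside the pipe at any moment; transit_hours is hypothetical."""
    return transit_hours * barrels_per_hour


print(truck_barrels(HOURS_PER_WEEK))            # one truck, one week
print(truck_barrels(HOURS_PER_WEEK, trucks=4))  # more trucks
print(pipeline_barrels(24))                     # pipeline, one day
```

Under these assumptions a single truck completes six trips in a week, while the pipeline delivers a barrel every hour regardless of how long each barrel spends in transit. Throughput depends on the acceptance rate, not on the latency.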


Big Idea: Pipeline Concurrency
This computation is too long.
I can launch a new computation every 100 ns in this structure.
Pipelined version, 5 pipe stages.
Latches, called pipeline registers, break up computation into stages.

Big Idea: It's Faster
Unpipelined: I can launch a new computation every 100 ns.
Pipelined version, 5 pipe stages: I can launch a new computation every 20 ns in the pipelined structure.
[Figure: one 100 ns block versus five ~20 ns stages separated by pipeline registers]

Implementation Issues
What prevents us from just doing a zillion pipe stages?
Some computations just won't divide into any finer (shorter in time) logical implementations.
Ultimately, this often comes down to circuit design issues.
  ~20 ns per stage, 5 stages: OK
  ~2 ns per stage, 50 stages: Nope, sorry

Implementation Issues
What prevents us from just doing a zillion pipe stages?
Those latches are NOT free: they take up area, and there is a real delay to go THRU the latch itself.
[Figure: stage delays of ~2 ns with a latch delay of ~0.2 ns]
In a modern, deep pipeline (10-20 stages), this is a real effect.
Typically see logic depths in one pipe stage of 10-20 gates.
At these speeds, and with this few levels of logic, latch delay is important.

Remember the ARM big.LITTLE Idea?
LITTLE
  Pipeline depth: 8-10
  Much lower power
BIG
  Pipeline depth: 15-24
  Much higher frequency

How Many Pipeline Stages?
E.g., Intel Pentium 4: over 20 stages
  More than 120 instructions in flight
  High clock frequency (>3 GHz)
  High IPC (Instructions per Cycle)
Too many stages:
  Lots of complications
  Should take care of possible dependencies among in-flight instructions
  Control logic is huge
  Too little work per stage and too high a branch misprediction penalty mean bad performance
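The cost of the pipeline latches can be made concrete with a small model. This is a sketch under simplifying assumptions, not the lecture's own formula: the logic is assumed to divide evenly across stages, each stage pays one fixed latch delay, and the 100 ns and 2 ns figures are read off the slide examples.

```python
def cycle_time_ns(logic_ns, stages, latch_ns):
    """Clock period when the logic splits evenly and each stage adds one latch."""
    return logic_ns / stages + latch_ns


def launch_rate_speedup(logic_ns, stages, latch_ns):
    """How much more often a new computation can start, versus unpipelined."""
    return logic_ns / cycle_time_ns(logic_ns, stages, latch_ns)


for stages in (5, 50):
    print(stages,
          cycle_time_ns(100, stages, 2),
          launch_rate_speedup(100, stages, 2))
```

In this model, going from 5 to 50 stages raises the speedup from about 4.5x to 25x rather than the ideal 50x, because half of every 4 ns cycle is latch delay. Diminishing returns from latch overhead are one reason a "zillion" stages does not pay off.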

Performance of Pipelined Systems
Unpipelined instructions
  Latency: 5 cycles
  Throughput: 1 per 5 cycles
Pipelined instructions
  Latency: 5 cycles
  Throughput: 1 per 1 cycle
Ideal speedup only if we can keep the pipeline full!
Ideally: Time_pipelined = Time_sequential / Pipeline Depth

MIPS Pipeline Stages
  Stage 1: Fetch (IF)
  Stage 2: Decode (ID)
  Stage 3: Execute (EX)
  Stage 4: Memory Access (MEM)
  Stage 5: Write Back to register file (WB)

Staged Version of MIPS Datapath
[Figure: datapath split into Stage 1 Instr. Fetch, Stage 2 Decode, Stage 3 Execute, Stage 4 Memory Access, Stage 5 Writeback]

Complete 5 Stage Pipeline (Drawn Smaller)
[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Flow of Instructions Through Pipeline
  LW $1, 100($0)
  LW $2, 200($0)
  LW $3, 300($0)
[Pipeline diagram, cycles 1 to 6]
In cycle 4 we have 3 instructions in flight:
  Inst 1 is accessing the memory (DM)
  Inst 2 is using the ALU (EX)
  Inst 3 is accessing the register file (ID)

Stage 1 - IF (Instruction Fetch)
[Figure: pipelined datapath with the LW in the fetch stage]
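The latency and throughput figures above, and the "cycle 4" snapshot of the three loads, can be reproduced with a small Python sketch. It assumes an ideal 5-stage pipeline with no stalls; the function names are invented for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unpipelined_cycles(n_instructions, depth=5):
    """Each instruction runs start to finish before the next one begins."""
    return n_instructions * depth


def pipelined_cycles(n_instructions, depth=5):
    """Fill the pipe once, then one instruction completes per cycle."""
    return depth + n_instructions - 1 if n_instructions > 0 else 0


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle."""
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


# Snapshot of cycle 4 for three back-to-back instructions.
print([stage_in_cycle(i, 4) for i in range(3)])

# The speedup approaches the pipeline depth only for long instruction streams.
print(unpipelined_cycles(1000) / pipelined_cycles(1000))
```

The snapshot shows the first instruction in MEM, the second in EX, and the third in ID, matching the slide. The speedup for 1000 instructions is just under 5 because of the cycles spent filling the pipe.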

Stage 2 - ID (Instruction Decode)
[Figure: pipelined datapath with the LW in the decode stage]

Stage 3 - EX (Execute)
[Figure: pipelined datapath with the LW in the execute stage]

Stage 4 - MEM (Memory)
[Figure: pipelined datapath with the LW in the memory stage]

Stage 5 - WB (Write Back)
[Figure: pipelined datapath with the LW in the writeback stage]

New Complications
The good news
  Multiple instructions are running at the same time, thru the datapath.
  This works because each stage of the pipeline is isolated by latches.
  So, in the best of all possible worlds, an N stage pipe has N instructions flowing thru it, and speedup is close to N.
The bad news
  Instructions interfere with each other.
  Common name for these: conflicts.
Why?
  Different instructions are in flight thru the datapath at the same time.
  Different instructions might want to use the same piece of hardware in the datapath at the same time (i.e., in the same clock cycle).
  These conflicts, contention for an over-used resource, are the source of endless grief in pipeline design.

Good News: >1 Instruction In Flight in Pipe
  ADD $2, $3, $1
  SUB $5, $6, $7
  ADD $10, $11, $12
[Pipeline diagram, cycles 1 to 6]

Bad News: Instructions Interfere
  ADD $10, $11, $12
  ADD $17, $0, $0
  ADD $16, $0, $0
  SUB $20, $21, $22
  ADD $30, $17, $18
[Pipeline diagram, cycles 1 to 6: one instruction writes to the register file in the same cycle in which a later instruction reads from the register file]

Interference in a Pipe
In its most basic form, it's about contention for a resource.
  2 instructions want to use a piece of hardware in the pipe.
  There's only one of these in the pipe; maybe it can't service the requirements of more than one instruction at a time.
The conflict from the previous slide's instruction sequence:
[Figure: two instructions trying to get inputs from and put outputs into the register file in the same cycle]

Sometimes, You Can Redesign the Resource
In this particular case
  The problem is that one instruction READs the register file and the other WRITEs the register file.
  Solution: allow WRITE-then-READ in one clock cycle ("double pump").
No conflict now: the 1st instruction writes in the 1st half of the clock cycle, the later instruction reads in the 2nd half.

Now, Even this Case Works OK
  ADD $10, $11, $12
  ADD $17, $0, $0
  ADD $16, $0, $0
  SUB $20, $21, $22
  ADD $30, $17, $18
[Pipeline diagram, cycles 1 to 6: $17 is written and then read within the same cycle]

But... This Case Still Screws Up
  ADD $2, $3, $1
  SUB $5, $6, $7
  ADD $10, $11, $12
  ADD $12, $10, $11
[Pipeline diagram, cycles 1 to 6: the result is written back into $10 after the next instruction has already read the value out of $10]

Another Conflict: Data Hazards
Basic structure
  An instruction in flight wants to use a value that's not "done" yet.
  "Done" means it's been computed and it's located where I would normally expect to go look in the pipe hardware to find it.
Basic cause
  You are used to assuming a purely sequential model of instruction execution.
  Instruction N finishes before instruction N+k, for k >= 1.
  Nope, sorry: not true any more in a pipeline.
  There are dependencies now between "nearby" instructions ("near" in sequential order of fetch from memory).
Consequence
  Data hazards: instructions want values that are not done yet, or not in the right place yet.

This Data Hazard, Revisited
In this particular case, the value is not computed or returned to the register file when the later instruction wants to use it as an input.
Double pumping the reg file doesn't help here; the later instruction needs the value 2 clock cycles before it's been computed & stored back. Oops.

Coping with Data Hazards
What do you do? Sometimes the dumb-sounding answer is right.
Hypothesis
  It is BAD when certain instructions overlap in time in certain patterns in our 5 stage MIPS pipeline.
Proposed solution
  Don't let them overlap like this? Right, that is one solution.
Mechanics
  Don't let the instruction flow thru the pipe.
  In particular, don't let it WRITE any bits anywhere in the pipe hardware that represents REAL CPU state (e.g., register file, memory).
Name for this operation: PIPELINE STALL

Coping with Data Hazards: Example
  ADD $10, $11, $12
  ADD $12, $10, $11
  ADD $11, $10, $12
[Pipeline diagram, cycles 1 to 6]

Solution 1: Stall
  ADD $10, $11, $12
  ADD $12, $10, $11   (delayed by two bubbles)
  ADD $11, $10, $12
[Pipeline diagram, cycles 1 to 6]
Empty slots in the pipe are called bubbles; a bubble means no real instruction work is getting saved here.

Mechanically: How Do We Stall?
Add extra hardware to detect stall situations
  Watches the instruction field bits
  Looks for "read versus write" conflicts in particular pipe stages
  Basically, a bunch of careful "case" logic
Add extra hardware to push bubbles thru the pipe
  Actually, relatively easy
  Can just let the instruction you want to stall GO FORWARD thru the pipe, but TURN OFF the bits that allow any results to get written into the machine state
  So, the instruction "executes" (it does the work), but doesn't "save"
  If an instruction executes in the middle of a forest, but no registers are around to save the results, did it really execute? (No.)

Recall the Registers Between Pipeline Stages
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
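The number of bubbles in the stall example follows from where the producer writes and where the consumer reads. The sketch below assumes the 5-stage pipeline from these slides, with register reads in ID and register writes in WB, and no forwarding hardware; `bubbles_needed` is a name invented for this illustration.

```python
def bubbles_needed(distance, double_pumped=True):
    """Bubbles to insert before a consumer that issues `distance` instructions
    after its producer (1 = the very next instruction).

    Assumes reads happen in ID (stage 2) and writes in WB (stage 5).
    """
    write_back_stage = 5
    decode_stage = 2
    # With a double-pumped register file the read may share a cycle with the
    # write; otherwise the read has to wait one more cycle.
    gap = write_back_stage - decode_stage
    if not double_pumped:
        gap += 1
    return max(0, gap - distance)


print(bubbles_needed(1))                       # back-to-back dependent ADDs
print(bubbles_needed(3))                       # three instructions apart
print(bubbles_needed(3, double_pumped=False))  # same, without double pumping
```

For the back-to-back ADD pair in the example this gives the two bubbles shown on the slide. Instructions three apart need none, but only because the register file is double pumped.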

Recall What an Instruction Looks Like
add $8, $17, $18 is stored in binary format as
  000000 10001 10010 01000 00000 100000
MIPS lays out instructions into fields
  op | rs | rt | rd | shamt | funct
  op: operation of the instruction
  rs: first register source operand
  rt: second register source operand
  rd: register destination operand
  shamt: shift amount
  funct: function (selects the type of operation)
We gotta watch these reg fields.

Data Hazard Logic
Check Rs =? Rd and Rt =? Rd between the ID/EX, EX/MEM, and MEM/WB stages.
[Figure: pipelined datapath with comparators on the Rs, Rt, and Rd fields held in the pipeline registers]

Example
  sub $2, $1, $3       Rd = 2    Rs = 1   Rt = 3
  and $12, $2, $5      Rd = 12   Rs = 2   Rt = 5
  or  $13, $6, $2      Rd = 13   Rs = 6   Rt = 2
  add $14, $2, $2      Rd = 14   Rs = 2   Rt = 2
  sw  $15, 100($2)     Rd = 15   Rs = 2   Rt = ??   (strictly, a store encodes $15 in its rt field and writes no register)
SUB-AND hazard: EX/MEM.RegisterRd == ID/EX.RegisterRs == $2
SUB-OR hazard: MEM/WB.RegisterRd == ID/EX.RegisterRt == $2

Interactions (real or not) can be tricky.
Example: do instruction #1 (sub) and #4 (add) interact, conflict?
  Well, they do BOTH want to use $2.

No Dependence Between #1 and #4
In this case, the double pumped reg file makes it ok.
  SUB $2, $1, $3
  AND $12, $2, $5
  OR  $13, $6, $2
  ADD $14, $2, $2
[Pipeline diagram, cycles 1 to 6: $2 is written and read in the same cycle]

How Else Could We Stall the Pipeline?
Compiler can insert nops.
  ADD $10, $11, $12
  nop
  nop
  ADD $12, $10, $11
[Pipeline diagram, cycles 1 to 6]
On MIPS, "$0 = $0 + $0" will do it: it saves no state.
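The Rs/Rt versus Rd comparisons in the data hazard logic can be mimicked in software. The sketch below is a simplified model, not the actual hardware: it compares each instruction's destination with the sources of the next two instructions, which corresponds to the EX/MEM and MEM/WB checks against ID/EX. The register numbers and the store offset follow the textbook-style sequence assumed for the example, and the store is modelled with no destination register.

```python
from collections import namedtuple

# rd is the register written (None if the instruction writes no register).
Instr = namedtuple("Instr", "text rd rs rt")

PROGRAM = [
    Instr("sub $2, $1, $3", 2, 1, 3),
    Instr("and $12, $2, $5", 12, 2, 5),
    Instr("or $13, $6, $2", 13, 6, 2),
    Instr("add $14, $2, $2", 14, 2, 2),
    Instr("sw $15, 100($2)", None, 2, 15),  # a store reads $2 and $15
]


def find_hazards(program, window=2):
    """Return (producer, consumer, distance) for each read-after-write conflict
    that a double-pumped register file cannot hide on its own."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.rd in (None, 0):  # no write, or a write to $0
            continue
        for distance in range(1, window + 1):
            if i + distance >= len(program):
                break
            consumer = program[i + distance]
            if producer.rd in (consumer.rs, consumer.rt):
                hazards.append((producer.text, consumer.text, distance))
    return hazards


for hazard in find_hazards(PROGRAM):
    print(hazard)
```

The sketch reports only the SUB-AND and SUB-OR pairs. The add and the store also read $2, but they are far enough behind the sub that the value is already in the register file when they decode.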

Or, The Hardware Can Simulate NOP
  ADD $10, $11, $12
  stall (bubble)
  stall (bubble)
  ADD $12, $10, $11
[Pipeline diagram, cycles 1 to 6: each stall sends a bubble thru the remaining pipe stages]

Next lecture
How to fix the pipeline to avoid (most) dependency problems
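The compiler-inserted nop approach can be sketched as a pass over the instruction list. This is an illustrative model only: instructions are plain tuples of (text, destination register, source registers), the required gap of two slots assumes the 5-stage pipeline with a double-pumped register file and no forwarding, and `insert_nops` is a hypothetical helper, not part of any real toolchain.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=2):
    """Pad the program so no instruction reads a register written by one of
    the `gap` slots immediately before it."""
    out = []
    for instr in program:
        _, _, sources = instr
        needed = 0
        # Walk back over the most recently emitted slots, nearest first.
        for back, prev in enumerate(reversed(out[-gap:]), start=1):
            prev_dest = prev[1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, gap - back + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    ("add $10, $11, $12", 10, (11, 12)),
    ("add $12, $10, $11", 12, (10, 11)),
]
for text, _, _ in insert_nops(program):
    print(text)
```

For the dependent ADD pair this emits two nops between the instructions, the same spacing the hardware stall produces with bubbles. The difference is that the padding is paid for in code size and fixed at compile time.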
