( ) Academic year 5768 (2007-2008), Semester B. May, 2008. Hugo Guterman. Web site:

1 Central Processing Unit Architecture ( )
Academic year 5768 (2007-2008), Semester B. May, 2008
Hugo Guterman
Web site:
Arch. CPU L5 Pipeline II 1

2 Outline: More pipelining; Control Hazards; Registers and Memory; Branch Prediction; Exceptions and interrupts; Examples

3 Taxonomy of Hazards

4 Control Hazards - Branches

5 Basic Pipeline

6 Branch Hazards

7 Branch Hazards

8 Solution: assume branch not taken

9 What to do if branch taken?

10 What happens when branch is taken?

11 Side effects

12 Side effects

13 Move the branch computation forward

14 Move the branch computation further forward

15 Result: new improved MIPS Datapath

16 Pipeline Idiosyncrasies

17 Rewrite the code for delay slot

18 Problems with delay slot

19 Datapath with branch logic

20 Problems with delay slot

21 Branch prediction is better?

22 Prediction of not-taken

23 Branch misprediction

24 How to improve?

26 Dynamic Branch Prediction

27 1-bit Branch Prediction

28 1-bit Branch Prediction

29 Dynamic Branch Prediction
Solution: a 2-bit scheme in which the prediction changes only after two mispredictions.
[Figure: state diagram with four states (two Predict Taken, two Predict Not Taken); transitions are labeled T (taken) and NT (not taken).]
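The 2-bit scheme can be illustrated with a short simulation. The sketch below assumes the common saturating-counter encoding (states 0 and 1 predict not taken, states 2 and 3 predict taken); the slide's state diagram may use a slightly different transition pattern. The names `TwoBitPredictor` and `count_mispredictions` are hypothetical, chosen for illustration only.

```python
class TwoBitPredictor:
    """2-bit saturating counter (assumed encoding, not taken from the slide).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state=3):
        self.state = state  # start in "strongly taken" by assumption

    def predict(self):
        # True means "predict taken".
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Feed a sequence of branch outcomes through the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken 9 times, then not taken once; the loop runs twice.
loop_outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(loop_outcomes, TwoBitPredictor())
```

With this encoding, a single loop exit moves the counter from state 3 to state 2, so the branch is still predicted taken when the loop is entered again; only the exit itself is mispredicted.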

30 Need Address at Same Time as Prediction
Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch address (if taken).
Note: must check for a branch match now, since a wrong branch address can't be used.
[Figure: BTB lookup producing the predicted PC and the branch prediction (taken or not taken).]
Return instruction addresses are predicted with a stack.
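The BTB lookup can be sketched as a small direct-mapped table. This is a minimal illustration under stated assumptions, not the datapath from the slides: 4-byte instructions, a 16-entry table, a 2-bit counter per entry, and the hypothetical names `BranchTargetBuffer`, `lookup`, and `update`. The return-address stack is not modeled.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch (sizes and field layout are assumptions)."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each entry is None or a tuple (branch_pc, target, counter).
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Assumes 4-byte instructions: drop the two low address bits.
        return (pc >> 2) % self.num_entries

    def lookup(self, pc):
        """Return the predicted next PC for the instruction fetched at pc."""
        entry = self.entries[self._index(pc)]
        if entry is not None:
            branch_pc, target, counter = entry
            # Check for a branch match: an entry left by a different branch
            # that shares this index must not supply the target.
            if branch_pc == pc and counter >= 2:
                return target
        return pc + 4  # fall through: predict not taken

    def update(self, pc, taken, target):
        """Record the resolved outcome of the branch at pc."""
        idx = self._index(pc)
        entry = self.entries[idx]
        if entry is not None and entry[0] == pc:
            counter = min(3, entry[2] + 1) if taken else max(0, entry[2] - 1)
            new_target = target if taken else entry[1]
            self.entries[idx] = (pc, new_target, counter)
        elif taken:
            # Allocate on a taken branch, starting in a "weakly taken" state.
            self.entries[idx] = (pc, target, 2)


btb = BranchTargetBuffer()
btb.update(0x40, taken=True, target=0x100)
predicted = btb.lookup(0x40)   # hit: predicted target
aliased = btb.lookup(0x80)     # same index as 0x40, different branch address
```

The lookup for `0x80` maps to the same table entry as `0x40`, and the address comparison is what prevents it from being redirected to the other branch's target.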

31 What makes pipelines hard to implement

32 Exception & Interrupts

33 Exception Flow

34 Flow of instructions during exception

35 Characterization of exceptions and interrupts

36 Type of exceptions

37 Stopping and Restarting Execution

38 Precise vs. Imprecise Exceptions

39 Precise vs. Imprecise Exceptions

40 Exceptions and CPU Architecture

41 Multiple Exceptions

42 Multiple Exceptions

43 Exceptions

44 Performance of Pipelined Systems

45 Data dependencies

46 Data dependencies

47 Branch delay slot

48 Branch delay slot

49 Bypass Paths

50 Bypass Paths

51 Loop unrolling

52 Loop unrolling

53 Loop unrolling

54 Code Performance

55 Code Performance

56 Machine Performance

57 Machine Performance

58 Machine Performance (2)

59 Machine Performance (2)

60 Pipeline Hazards Again
[Figure: pipeline timing diagrams, one per hazard type.]
Structural Hazard
Control Hazard (jump)
RAW (read after write) Data Hazard
WAW (write after write) Data Hazard
WAR (write after read) Data Hazard

61 Data Hazards
Avoid some by design:
- eliminate WAR by always fetching operands early (DCD) in the pipe
- eliminate WAW by doing all WBs in order (last stage, static)
Detect and resolve the remaining ones:
- stall or forward (if possible)
[Figure: pipeline timing diagrams labeled RAW Data Hazard, WAW Data Hazard, and RAW Data Hazard.]

62 Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
A RAW hazard exists on register $\rho$ if $\rho \in \mathrm{Rregs}(i) \cap \mathrm{Wregs}(j)$.
- Keep a record of pending writes (for instructions in the pipe) and compare with the operand registers of the current instruction.
- When an instruction issues, reserve its result register.
- When an operation completes, remove its write reservation.
A WAW hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Wregs}(j)$.
A WAR hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Rregs}(j)$.
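The three set conditions and the pending-write record map directly onto set operations. The sketch below is one possible illustration; the names `Instr`, `hazards`, and `PendingWrites` are hypothetical, and registers are modeled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # Rregs: registers the instruction reads
    writes: frozenset  # Wregs: registers the instruction writes


def hazards(i, j):
    """Hazards between instruction i (about to issue) and predecessor j."""
    return {
        "RAW": i.reads & j.writes,   # i reads what j has yet to write
        "WAW": i.writes & j.writes,  # both write the same register
        "WAR": i.writes & j.reads,   # i writes what j has yet to read
    }


class PendingWrites:
    """Record of result registers reserved by instructions in the pipe.

    Simplification: assumes at most one in-flight writer per register.
    """

    def __init__(self):
        self.reserved = set()

    def can_issue(self, instr):
        # RAW check: no operand register may have a pending write.
        return not (instr.reads & self.reserved)

    def issue(self, instr):
        self.reserved |= instr.writes  # reserve the result register(s)

    def complete(self, instr):
        self.reserved -= instr.writes  # remove the write reservation


# add r1, r2, r3 followed by sub r4, r1, r5
add = Instr("add", frozenset({"r2", "r3"}), frozenset({"r1"}))
sub = Instr("sub", frozenset({"r1", "r5"}), frozenset({"r4"}))

found = hazards(sub, add)

board = PendingWrites()
board.issue(add)
blocked = not board.can_issue(sub)  # sub must stall (or use forwarding)
board.complete(add)
ready = board.can_issue(sub)
```

Here `sub` reads `r1` while `add` still has a pending write to it, so the RAW set is non-empty and the issue check fails until `add` completes.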

63 Issues in Pipelined design
Pipelining. Limitation: issue rate, FU stalls, FU depth.
Super-pipeline: issue one instruction per (fast) cycle; ALU takes multiple cycles. Limitation: clock skew, FU stalls, FU depth.
Super-scalar: issue multiple scalar instructions per cycle. Limitation: hazard resolution.
VLIW ("EPIC"): each instruction specifies multiple scalar operations; compiler determines parallelism. Limitation: packing.
Vector operations: each instruction specifies a series of identical operations. Limitation: applicability.
[Figure: IF D Ex M W timing diagrams for each technique.]

64 The Big Picture: Where are We Now?
The Five Classic Components of a Computer
[Figure: Processor (Control and Datapath), Memory, Input, Output.]

65 FYI: Clocking discipline
2-phase non-overlapping clocks (phi1, phi2)
Pipeline stage is two (level-sensitive) latches
Edge-triggered
[Figure: latches clocked by phi1, phi2, phi1.]
