administrivia final hour exam next Wednesday covers assembly language like hw and worksheets


1 Administrivia. The final hour exam is next Wednesday; it covers assembly language, like the homework and worksheets. Today: the last worksheet, then a start on more details of the hardware, which are not covered on ANY exam. We probably won't finish these slides today. Any questions on the assignment?

2 More architecture. Remember how the CPU executes instructions? In multiple simple steps...

3-10 CPU (logical). [Figure, built up over eight slides: a logical CPU with Decode, ALU, and Mem Buffer units, the registers eax, ebx, ecx, edx, ..., EBP, SP, PC, and Memory. The instruction add %eax,%ebx moves through the phases FETCH, DECODE, OPFETCH, EXECUTE, WRITEBACK with eax = 3, ebx = 4, ecx = 8, edx = 5: operand fetch reads 3 and 4, the ALU produces 7, and writeback changes ebx from 4 to 7.]
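
The five phases can be sketched in a few lines of Python. This is only a toy model, not how real hardware is organized: the function name `run_add` and the dictionary of registers are made up for illustration, and the register values are the ones shown on the slides (eax = 3, ebx = 4).

```python
# Toy model of the five phases for `add %eax,%ebx` (AT&T syntax: ebx = ebx + eax).
# One phase per clock cycle; names here are illustrative, not from the slides.
def run_add(regs, src, dst):
    """Step a single add instruction through the five phases and return the trace."""
    trace = []
    instr = ("add", src, dst)      # FETCH: read the instruction from memory
    trace.append("FETCH")
    op, s, d = instr               # DECODE: work out the operation and operands
    trace.append("DECODE")
    a, b = regs[s], regs[d]        # OPFETCH: read the operand values from the registers
    trace.append("OPFETCH")
    result = a + b                 # EXECUTE: the ALU does the addition
    trace.append("EXECUTE")
    regs[d] = result               # WRITEBACK: store the result in the destination
    trace.append("WRITEBACK")
    return trace

regs = {"eax": 3, "ebx": 4, "ecx": 8, "edx": 5}
trace = run_add(regs, "eax", "ebx")
print(trace, regs["ebx"])
```

Each appended name stands for one clock cycle, so this single add needs five cycles when nothing is overlapped.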

11 Computer performance. A modern processor runs at multiple GHz, that is, billions of cycles per second. That means the clock cycle is less than 1 ns (less than a billionth of a second). Even silicon cannot do much in that time, so the CPU executes only one step per cycle and needs multiple cycles to execute one instruction.
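
As a rough worked example of that arithmetic, the sketch below converts a clock rate into a cycle time and multiplies by the five phases. The 2 GHz figure is only an assumed example value, and the function names are invented for this illustration.

```python
def cycle_time_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds (1 GHz means 1 ns per cycle)."""
    return 1.0 / clock_ghz

def unpipelined_instr_time_ns(clock_ghz, phases=5):
    """Time for one instruction when each of its phases takes one full cycle."""
    return phases * cycle_time_ns(clock_ghz)

# Assumed example: a 2 GHz clock and the five phases from the earlier slides.
print(cycle_time_ns(2.0))              # cycle length in ns
print(unpipelined_instr_time_ns(2.0))  # ns per instruction with no overlap
```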

12-14 Overall performance. On the other hand, the processor does MORE than one add per cycle. Doesn't that contradict the previous slide? No, because computer designers are clever.

15 Overlapping instructions. One set of transistors can only do one thing in one cycle, but a CPU has LOTS of transistors. It can do lots of things at once, so it works on multiple instructions at once.

16 Washing. Consider doing the wash with 1 washer and 1 dryer. If each takes 45 minutes, it takes 1.5 hours to do 1 load, maybe 2 hours if you count pre-treating/sorting and folding/hanging. It does not take 6 hours to do 3 loads!

17-18 Overlap washing steps. It takes 2 hours for the first load to be done, and each extra load takes only 45 minutes more. If you had 1000 loads, you would think of it as taking 45 minutes per load (and would really hate laundry!).
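
The laundry numbers can be checked with a short calculation. The sketch below uses the figures from the slides (2 hours for a full load, 45 minutes for each extra load once the steps overlap); the function names are made up for illustration.

```python
FIRST_LOAD_MIN = 120   # wash + dry + sorting/folding, from the slides
EXTRA_LOAD_MIN = 45    # each further load once the steps overlap

def sequential_minutes(loads):
    """Every load waits for the previous one to finish completely."""
    return loads * FIRST_LOAD_MIN

def overlapped_minutes(loads):
    """The next load goes into the washer while the previous one dries."""
    return FIRST_LOAD_MIN + (loads - 1) * EXTRA_LOAD_MIN

for loads in (1, 3, 1000):
    total = overlapped_minutes(loads)
    print(loads, sequential_minutes(loads), total, total / loads)
```

For 3 loads this gives 210 minutes instead of 360, and for 1000 loads the average is just over 45 minutes per load, which is the point of the slide.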

19 Code example. Consider the code:
    movl %edx,%ecx
    sarl $4,%eax
    addl %ebx,%ecx
    subl %edx,%eax

20-33 Code example. The first instruction must execute all five phases. The second instruction can start soon after, one cycle behind, because there is never competition for the same transistors. The third instruction follows suit:
    movl %edx,%ecx  FET  DEC  OPF  EXEC WB
    sarl $4,%eax         FET  DEC  OPF  EXEC WB
    addl %ebx,%ecx            FET  DEC  OPF  EXEC WB
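
A staircase diagram like the one on these slides can be generated mechanically. The following sketch assumes an ideal pipeline with no stalls, where each instruction starts one cycle after the one before it; `pipeline_diagram` and `total_cycles` are names invented for this example.

```python
STAGES = ["FET", "DEC", "OPF", "EXEC", "WB"]

def pipeline_diagram(instrs):
    """Return one text row per instruction, each shifted one cycle to the right."""
    width = max(len(i) for i in instrs)
    rows = []
    for n, instr in enumerate(instrs):
        # n empty cells first: instruction n enters FET in cycle n
        cells = ["    "] * n + [s.ljust(4) for s in STAGES]
        rows.append(instr.ljust(width) + "  " + " ".join(cells).rstrip())
    return rows

def total_cycles(n_instrs, n_stages=len(STAGES)):
    """Cycles until the last instruction leaves the pipeline (no stalls assumed)."""
    return n_stages + n_instrs - 1

code = ["movl %edx,%ecx", "sarl $4,%eax", "addl %ebx,%ecx"]
rows = pipeline_diagram(code)
print("\n".join(rows))
print(total_cycles(len(code)), "cycles in total")
```

Three instructions finish in 7 cycles rather than the 15 they would need one after another.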

34-41 Example. [Figure, four consecutive cycles on the logical CPU diagram, with eax, ebx, ecx, edx holding a, b, c, d. Cycle 1: movl in FET. Cycle 2: movl in DEC (operands edx, ecx), sarl in FET. Cycle 3: movl in OPF (reads d), sarl in DEC (operands 4, eax), addl in FET. Cycle 4: movl in EXEC, sarl in OPF (reads 4 and a), addl in DEC (operands ebx, ecx), subl in FET.]

42 Pipelining. This overlapping of instructions is called pipelining. It has been done in all CPUs for the last 15 years or so and is a big part of the speed-up. The clock speed is limited by the SLOWEST phase.

43-45 Hazard. Anyone see a problem here?
    movl %edx,%ecx  FET  DEC  OPF  EXEC WB              <- WB writes %ecx
    sarl $4,%eax         FET  DEC  OPF  EXEC WB
    addl %ebx,%ecx            FET  DEC  OPF  EXEC WB    <- OPF reads %ecx
addl's OPF falls in the same cycle as movl's WB, so addl may read %ecx before the new value is there.

46-48 Example. [Figure, the next cycle on the logical CPU diagram: movl in WB, sarl in EXEC, addl in OPF, subl in DEC (operands edx, eax). addl fetches b and the old value c of ecx from the registers in the same cycle in which movl's result is written back to ecx.]

49 Forwarding. Special hardware in operand fetch reads a result as soon as it is needed, directly from the instruction that produced it. This guarantees the correct result.

50-52 Forwarding. [Figure, the same cycle with forwarding: movl in WB, sarl in EXEC, addl in OPF, subl in DEC. The value movl writes back to ecx is also delivered directly to addl's operand fetch, so addl gets the new ecx rather than the old value c.]

53 Stalls. The pipeline can still stall if the execute phase has not finished: the instruction must wait for the value to be computed. The compiler schedules instructions to avoid these stalls.

54-56 Stalls. The code example had no stalls:
    movl %edx,%ecx
    sarl $4,%eax
    addl %ebx,%ecx
    subl %edx,%eax
What if it were reordered (to a more natural ordering)?
    movl %edx,%ecx
    addl %ebx,%ecx    <- stall on ecx
    sarl $4,%eax
    subl %edx,%eax    <- stall on eax
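
The two orderings can be compared with a small stall counter. This is a sketch under stated assumptions, not a model of any real CPU: one instruction enters the five-phase pipeline per cycle, forwarding lets an instruction read an operand in OPF during the same cycle its producer is in WB, and only simple two-operand register instructions are parsed. The names `parse`, `count_stalls`, `ORIGINAL`, and `REORDERED` are invented here.

```python
ORIGINAL = ["movl %edx,%ecx", "sarl $4,%eax", "addl %ebx,%ecx", "subl %edx,%eax"]
REORDERED = ["movl %edx,%ecx", "addl %ebx,%ecx", "sarl $4,%eax", "subl %edx,%eax"]

def parse(instr):
    """Return (registers read, register written) for a two-operand AT&T instruction."""
    op, operands = instr.split()
    src, dst = operands.split(",")
    reads = set()
    if src.startswith("%"):        # immediates such as $4 are not registers
        reads.add(src)
    if not op.startswith("mov"):   # add/sub/sar also read their destination
        reads.add(dst)
    return reads, dst

def count_stalls(program):
    """Count stall cycles, assuming OPF is 2 cycles and WB 4 cycles after FET."""
    stalls = 0
    fetch = 0      # cycle in which the next instruction would enter FET
    ready = {}     # register -> cycle its producer reaches WB (value can be forwarded)
    for instr in program:
        reads, dst = parse(instr)
        need = max((ready.get(r, 0) for r in reads), default=0)
        delay = max(0, need - (fetch + 2))   # wait until OPF lines up with producer's WB
        stalls += delay
        fetch += delay
        ready[dst] = fetch + 4
        fetch += 1                           # in order: the next instruction follows
    return stalls

print(count_stalls(ORIGINAL), count_stalls(REORDERED))
```

Under these assumptions the original ordering has no stalls, and the reordered one stalls twice, once on ecx and once on eax, matching the slide.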

57 Reducing cycle time. You can almost always reduce it further: break the slowest phase into two pieces, each taking roughly half the time of the original, and double the clock speed.
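
A small calculation shows why splitting the slowest phase helps. The phase latencies below are hypothetical values chosen for illustration, as are the function names; the only thing taken from the slides is that the slowest phase sets the clock.

```python
def clock_ghz(phase_ns):
    """The clock can run no faster than the slowest phase allows."""
    return 1.0 / max(phase_ns)

def split_slowest(phase_ns):
    """Replace the slowest phase with two phases of half its latency."""
    i = phase_ns.index(max(phase_ns))
    half = phase_ns[i] / 2
    return phase_ns[:i] + [half, half] + phase_ns[i + 1:]

# Hypothetical latencies in ns for FET, DEC, OPF, EXEC, WB; DEC is the slow one.
phases = [0.2, 0.4, 0.2, 0.2, 0.2]
deeper = split_slowest(phases)
print(clock_ghz(phases), "GHz with", len(phases), "phases")
print(clock_ghz(deeper), "GHz with", len(deeper), "phases")
```

The clock only doubles here because every other phase already fits in half the old cycle; if another phase were nearly as slow, it would become the new limit.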

58 RISC vs CISC. x86 is the classic CISC (complex instruction set computer), with things like cmpl $4096,8(%edx,%eax,4). PowerPC is a mainstream RISC (reduced instruction set computer): memory is accessed only in load/store instructions, and otherwise all operands must be in registers. It takes 4 instructions to do the single x86 instruction above.

59 CISC problems. CISC introduces many problems. Complex instructions take longer, which makes the pipeline cycle slower. They are harder to decode (more on that in a minute). Compilers are too stupid to use most fancy instructions (array accessing is an exception). The hardware is too hard/expensive/flaky to design.

60 RISC in CISC clothing. x86 designers understand this problem. The x86 core is really RISC: no complex instructions, all operands in registers, no fancy addressing modes. Decode generates micro-instructions that look just like RISC instructions.

61 Micro-instructions. Look at one from an earlier worksheet:
    leal -12(%ebp),%eax
    incl (%eax)
becomes 4 micro-instructions:
    add   $-12,%ebp,%eax
    load  %eax,regx
    add   $1,regx,regx
    store %eax,regx
This needs an extra register (regx) and makes decode even harder.

62 Decoding. There is a problem with long pipelines: they give short cycle times, but there is a cost, and long decodes make it worse. The original Pentium 4 had 9 steps in decode. What happens on a branch?

63-69 Branches. Look at the code:
    cmpl $2,%eax    FET  DEC  OPF  EXEC WB
    je L1                FET  DEC  OPF  EXEC    <- find new PC
    addl %eax,%edx            FET  DEC  OPF
    subl $3,%edx                   FET  DEC
The new PC is only known once je reaches EXEC, by which time addl and subl have already entered the pipeline.

70 Branch penalty. Without help, every branch would cause a multi-cycle delay, almost as bad as a memory access. All processors use branch prediction: guess where the branch will go based on previous execution and/or compiler hints. There is no penalty if the guess is correct and the full penalty if it is wrong. Current technology is right 90+% of the time overall.
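
The effect of prediction accuracy can be estimated with a little arithmetic. In the sketch below only the 90% accuracy comes from the slide; the 10-cycle penalty, the share of instructions that are branches (20%), and the base of one cycle per instruction are assumed example values, and the function names are invented.

```python
def avg_branch_penalty(penalty_cycles, accuracy):
    """Average cycles lost per branch: the full penalty, paid only on a wrong guess."""
    return (1.0 - accuracy) * penalty_cycles

def cycles_per_instruction(branch_fraction, penalty_cycles, accuracy, base_cpi=1.0):
    """Average cycles per instruction once branch mispredictions are included."""
    return base_cpi + branch_fraction * avg_branch_penalty(penalty_cycles, accuracy)

# Assumed example values: 10-cycle penalty, 20% of instructions are branches.
print(avg_branch_penalty(10, 0.90))
print(cycles_per_instruction(0.20, 10, 0.90))
print(cycles_per_instruction(0.20, 10, 0.00))   # no correct guesses at all
```

With these example numbers, 90% accuracy costs about 0.2 extra cycles per instruction, compared with 2 extra cycles if every branch paid the full penalty.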

71 Superscalar. The Pentium I was pipelined. Many more transistors are available now; what to do with them? How about multiple parts: multiple ALUs, multiple decoders. This is called a superscalar processor.

72 Superscalar. A modern processor has 2-4 decode pipelines, all the same (or almost the same). It finishes decoding 2-4 instructions per cycle, and each executes in its own ALU in parallel. That is 2-4 times faster if there are no stalls and branch prediction is perfect. It makes writing good assembler much harder; compilers are becoming much more sophisticated.

73 No superscalar.
    movl  FET  DEC  OPF  EXE  WB
    sarl       FET  DEC  OPF  EXE  WB
    addl            FET  DEC  OPF  EXE  WB
    subl                 FET  DEC  OPF  EXE  WB
Takes 4 cycles, ignoring the time to fill the pipeline.

74-77 Superscalar. Two instructions enter the pipeline per cycle, but the code now has stalls:
    movl  FET   DEC   OPF   EXE   WB
    sarl  FET   DEC   OPF   EXE   WB
    addl        FET   DEC   stall OPF   EXE   WB
    subl        FET   DEC   stall OPF   EXE   WB
With stalls it takes 3 cycles after the initial pipeline fill.
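
The 4-cycle and 3-cycle figures can be reproduced with a simple in-order issue model. This is a sketch under assumptions that are not stated on the slides: instructions issue in groups of `width`, a whole group waits if any member must stall, forwarding lets OPF read a value in the cycle its producer is in WB, and no instruction depends on another in the same group. All names are invented for illustration.

```python
# (registers read, register written) for
# movl %edx,%ecx / sarl $4,%eax / addl %ebx,%ecx / subl %edx,%eax
PROGRAM = [
    ({"edx"}, "ecx"),
    ({"eax"}, "eax"),
    ({"ebx", "ecx"}, "ecx"),
    ({"edx", "eax"}, "eax"),
]

FET_TO_OPF = 2   # OPF is the third phase
FET_TO_WB = 4    # WB is the fifth phase

def cycles_after_fill(program, width):
    """Cycles from the first writeback to the last, issuing `width` per cycle."""
    ready = {}            # register -> cycle its producer reaches WB
    fetch = 0             # cycle in which the next group enters FET
    first_wb = last_wb = None
    for g in range(0, len(program), width):
        group = program[g:g + width]
        delay = 0
        for reads, _ in group:
            need = max((ready.get(r, 0) for r in reads), default=0)
            delay = max(delay, need - (fetch + FET_TO_OPF))
        fetch += delay                      # the whole group stalls together
        for _, dst in group:
            ready[dst] = fetch + FET_TO_WB
        last_wb = fetch + FET_TO_WB
        if first_wb is None:
            first_wb = last_wb
        fetch += 1
    return last_wb - first_wb + 1

print(cycles_after_fill(PROGRAM, width=1))   # plain pipeline
print(cycles_after_fill(PROGRAM, width=2))   # two-wide superscalar
```

Under these assumptions the two-wide version saves only one cycle on this code, because the second pair has to wait for the values produced by the first pair.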

78 Transistors everywhere. Moore's law means smaller transistors, and each one is faster. All else being equal, faster transistors mean a faster CPU and a more power-hungry CPU. Fortunately, smaller transistors use less power. High-end processors have been eating about 100 W for more than a decade; power use had been slowly getting worse.

79 Faster or smaller. The extra transistors can be used either to make faster processors or to make smaller (cheaper) processors. Intel (et al.) want maximum total revenue: either more expensive processors or more sales. x86 is sold mostly for real computers, which is not a high-growth market now, so they need to justify expensive processors.

80 Need for speed. The way to justify the $ is a faster processor. Pipelining (early 90s): work on multiple instructions at once, broken up by phase of execution. 3-4X performance improvement, limited by branching. A longer pipeline is faster but has worse problems with branching.

81 Need for speed (2). Superscalar (mid- to late-90s): work on multiple instructions at once in the same phase. Adds extra decoders, ALUs, ... Roughly 50% performance improvement.

82-94 (Not so) superscalar. Superscalar is limited by stalls: what matters is the number of instructions between the calculation of a value and its use. A 5-stage pipeline needs 2-3 extra instructions in between to not stall. Here sarl produces a value and addl uses it:
    sarl  FET  DEC  OPF  EXE  WB
    ext1  FET  DEC  OPF  EXE  WB
    ext2       FET  DEC  OPF  EXE  WB
    ext3       FET  DEC  OPF  EXE  WB
    addl            FET  DEC  OPF  EXE  WB
With the extra instructions ext1-ext3 in between, forwarding can now work.

95 Out of order. Compilers try to schedule instructions to avoid stalls. Most CPUs now also allow out-of-order execution: if an instruction stalls, one behind it may pass it in line, but only if there are no dependencies. This is somewhat controversial: it takes transistors (= power), and it could be done by the compiler for no power.

96 ILP. Both pipelining and superscalar use Instruction Level Parallelism: executing multiple instructions in parallel from the same program or, more precisely, the same thread of control. There is little additional improvement to be had there because of the structure of typical code.

97 Multi-processing. We are already pushing instructions as fast as possible and executing many instructions at once from a single process. The only thing left is to execute multiple processes at once, called multi-processing.

98 Multi-processing limits. Multi-processing requires software support. Some programs can do multiple things at once: multi-threaded programs such as Apache, Photoshop, ...; next-gen games are starting to be multi-threaded. The OS can multi-process different programs: mp3 player vs mail reader vs Eclipse vs ...

99 On the cheap. We already have multiple decoders and ALUs that can do multiple things at once, but a single process does not have enough work for them. To execute 2 programs at once, we need a second register set, including the PC. This is the basis of Intel HyperThreading and similar technologies from competitors.

100 HyperThreading. Suppose we had 3 decoders, 3 ALUs, ... and 2 register sets (including PCs). On average, a single process uses 1.5 instrs/cycle if it has 2 decoders, ALUs, ...: one stalls for a cycle OR both stall every other cycle. Sharing matches well, although sometimes both processes stall at once, or both want 2 at once, adding extra stalls. Could get roughly 2.5 instructions per cycle.

101 HyperThreading problem. Both processes share the cache, so they compete for that resource as well and may not co-exist well. It tends to work well for many multi-threaded programs, not as well for arbitrary multi-processing. In the worst case, it may be slower than running a single process. Cache misses are VERY expensive, which definitely limits the gain.

102 More HT problems. Stalls are bad for performance but good for power/heat: giving parts cycles off gives them a chance to cool. HyperThreading works each transistor harder and may generate 40% more heat than running without it. Also, a security hole was discovered: one thread can determine, at least partially, what the other thread is doing from cache changes. A clever program can determine a crypto key.

103 Multiple cores. Multiple cores replicate the entire CPU path from decoder through registers, even caches. It is almost like putting multiple CPU chips in the box, but it fits in one socket. The cores usually also share access to the FSB/BSB, which may aggravate memory bus contention for poorly cached programs.

104 Multi-core advantages. Multi-core is easy to design: just stick 2+ cores on one chip, no new work there. It gets good cooling/power usage. It can get a 2X performance gain for 2 cores, assuming two processes are waiting to run.

105 Cache coherence. The problem is with the L1 caches: there are now potentially 2 copies of the same data. Processor A could write address X, then processor B could read address X, but from 2 different caches. B could see the wrong answer if we are not careful. This is called the cache coherence problem.

106 Snoopy. Cache coherence is well studied; it is needed for any multi-processing system. Several approaches are defined; the most common is called snoopy. Each cache snoops on the others, watching reads/writes to the cache. It works well for single-chip multi-processing, as long as it does not interfere.

107 Single writer. The alternative is to allow either many readers of a memory location or a single writer. Once a cache is written, all others invalidate their line. Other caches need to be checked only on a miss, or never with a write-back cache.
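
The stale-read problem and the invalidate-on-write fix can be shown with a toy model. This sketch makes simplifying assumptions that are not on the slides: two caches, one shared memory, and write-through to memory so the example stays short. The class `Cache` and the helper names are invented for illustration and do not describe any real protocol implementation.

```python
class Cache:
    """A tiny private cache in front of a shared memory dictionary."""
    def __init__(self, memory, invalidate_on_write):
        self.memory = memory
        self.lines = {}              # address -> cached value
        self.peers = []              # the other caches in the system
        self.invalidate_on_write = invalidate_on_write

    def read(self, addr):
        if addr not in self.lines:   # miss: fetch the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        if self.invalidate_on_write:
            for peer in self.peers:  # single writer: all other copies are dropped
                peer.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value    # write-through, assumed here for simplicity

def b_reads_after_a_writes(invalidate):
    """Processor B caches X, processor A writes X, then B reads X again."""
    memory = {"X": 1}
    a = Cache(memory, invalidate)
    b = Cache(memory, invalidate)
    a.peers, b.peers = [b], [a]
    b.read("X")                      # B now holds the old value in its own cache
    a.write("X", 2)
    return b.read("X")

print(b_reads_after_a_writes(invalidate=False))   # B still sees its stale copy
print(b_reads_after_a_writes(invalidate=True))    # B misses and reloads the new value
```

Without invalidation B keeps answering from its old copy; with it, A's write removes B's line, so B's next read is a miss that picks up the new value.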

108 Single writer. Single writer is much easier for multi-chip systems, where it is hard to watch the CPU/L1 interface at a distance. It can be implemented so that caches own lines while they are writing, with ownership tracked outside any L1 cache, held with the first level of shared memory: L2 for multi-core, main memory in a fully distributed system. It is called a directory protocol in that case.

109 HT vs multi-core. Multi-core is clearly superior to HT but costs a lot more in transistors and $. A chip can use both: HT is more appropriate for multi-threaded programs, multi-core for multi-processing. The i7 does this. We will look at the i7 and GPUs next week.


More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information
