CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago

1 CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li, University of Chicago

2 Administrative Stuff
  Lab1 due tonight
  Lab2: out later today; due 2 weeks from now
  Review session this Friday
  Turing Award lecture tomorrow

3 Lecture Outline
  Pipelining basics and discussions
  Non-ideal pipeline

4 Single-Cycle uarch: Datapath & Control **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

5 Single-Cycle uarch: Summary
  Inefficient
    All instructions run as slow as the slowest instruction
  Not necessarily the simplest way to implement an ISA
    Single-cycle implementation of REP MOVS (x86)?
  Not easy to optimize/improve performance
    Optimizing the common case (e.g., common instructions) does not work
    Need to optimize the worst case all the time
  Not all resources are fully utilized
    e.g., data memory access can't overlap with ALU operation
  How to do better?

6 Single-Cycle, Multi-Cycle, Pipelining
  Single-cycle: 1 cycle per instruction, long cycle time
    [F D E M W] [F D E M W]
  Multi-cycle: 5 cycles per instruction, short cycle time
    F D E M W  F D E M W  F D E M W
  Pipeline: 1 cycle per instruction (steady state), short cycle time
    F D E M W
      F D E M W
        F D E M W
          F D E M W
    Time ->

7 Instruction Pipelining: Basic Idea
  Pipeline the execution of multiple instructions
  Idea:
    Divide the instruction processing into distinct stages of processing
    Ensure there are enough hardware resources to process one instruction in each stage
    Process a different instruction in each stage
      Instructions consecutive in program order are processed in consecutive stages
  Benefit: increases instruction processing throughput
  Downside: start thinking about this

8 Pipelining Instruction Processing

9 Remember: Instruction Processing Steps
  1. Instruction fetch (IF)
  2. Instruction decode and register operand fetch (ID/RF)
  3. Execute/Evaluate memory address (EX/AG)
  4. Memory operand fetch (MEM)
  5. Store/writeback result (WB)

10 Remember the Single-Cycle Uarch. Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

11 Pipeline Operation Examples
  We'll look at load & store
  Show pipeline usage in a single cycle
  Highlight resources used

12 Review: LEGv8 Single-Cycle Datapath **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

13 Adding Pipeline Registers
  Registers between stages to hold information produced in previous cycle
  [Figure: pipelined datapath with pipeline registers IR_D, PC_D; A_E, B_E, Imm_E, PC_E; Aout_M, B_M, PC_M; MDR_W, Aout_W.] **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

14 IF for Load, Store, Cycle 1 **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

15 ID for Load, Store, Cycle 2 **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

16 EX for Load, Cycle 3 **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

17 MEM for Load, Cycle 4 **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

18 WB for Load, Cycle 5: wrong register number **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

19 Corrected Datapath for Load, Cycle 5 **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

20 EX for Store **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

21 MEM for Store **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

22 WB for Store **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

23 Pipeline Operation Examples
  Consider the following instruction sequence:
    LDUR X10, [X1, 40]
    SUB  X11, X2, X3
    ADD  X12, X3, X4
    LDUR X13, [X1, 48]
    ADD  X14, X5, X6

24 Filling up the Pipeline **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

25 Pipeline: Steady State
  State of the pipeline at the 5th cycle **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

26 Illustrating Pipeline Operation: Operation View
            t0    t1    t2    t3    t4    t5
    Inst0   IF    ID    EX    MEM   WB
    Inst1         IF    ID    EX    MEM   WB
    Inst2               IF    ID    EX    MEM
    Inst3                     IF    ID    EX
    Inst4                           IF    ID
  Steady state (full pipeline) from t4 onward: every cycle holds one instruction in each of IF, ID, EX, MEM, WB

27 Illustrating Pipeline Operation: Resource View
           t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
    IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
    ID          I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
    EX               I0   I1   I2   I3   I4   I5   I6   I7   I8
    MEM                   I0   I1   I2   I3   I4   I5   I6   I7
    WB                         I0   I1   I2   I3   I4   I5   I6
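The two views on slides 26 and 27 follow from one relation: in an ideal pipeline, instruction i occupies stage number (t - i) at cycle t. The following is a minimal Python sketch of that relation, not part of the lecture; the function names (`stage_of`, `operation_view`, `resource_view`, `render`) are made up for illustration, and it assumes one instruction enters IF per cycle with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr, cycle):
    """Stage occupied by instruction `instr` at `cycle`, or None if it is not in the pipeline."""
    s = cycle - instr  # instruction i enters IF at cycle i (assumed: no stalls)
    return STAGES[s] if 0 <= s < len(STAGES) else None

def operation_view(num_instrs, num_cycles):
    """One row per instruction, one column per cycle (slide 26)."""
    return [[stage_of(i, t) for t in range(num_cycles)] for i in range(num_instrs)]

def resource_view(num_cycles):
    """One row per stage, one column per cycle (slide 27)."""
    view = {}
    for s, stage in enumerate(STAGES):
        view[stage] = [f"I{t - s}" if t - s >= 0 else None for t in range(num_cycles)]
    return view

def render(labels, rows, num_cycles):
    """Format rows as a fixed-width text table."""
    lines = ["      " + "".join(f"t{t}".ljust(5) for t in range(num_cycles))]
    for label, row in zip(labels, rows):
        cells = "".join(("" if c is None else str(c)).ljust(5) for c in row)
        lines.append(label.ljust(6) + cells)
    return "\n".join(lines)

if __name__ == "__main__":
    print(render([f"Inst{i}" for i in range(5)], operation_view(5, 6), 6))
    print()
    view = resource_view(11)
    print(render(STAGES, [view[s] for s in STAGES], 11))
```

Reading a column of the operation view gives the pipeline contents at one cycle; reading a row of the resource view gives the sequence of instructions one stage handles over time.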

28 Pipelined Control
  Identical set of control points as the single-cycle uarch! **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

29 Pipelined Control
  Control signals derived from instruction
    Decode once, as in single-cycle implementation
    Buffer signals until consumed
  What other options are there to derive pipeline control signals? **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

30 Pipelined Control + Datapath
  Note:
    1. Reg2Loc==0: instruction[20:16] is selected; Reg2Loc==1: instruction[4:0] is selected
    2. instruction[9:5] is the input to Read register 1
  **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
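The register-selection note on slide 30 can be sketched as a small bit-field extraction. This is an illustrative Python model, not the lecture's hardware description; `bits` and `read_register_numbers` are hypothetical helper names, and the test word below sets only the three register fields (the opcode bits are left at zero).

```python
def bits(word, hi, lo):
    """Extract the inclusive bit field word[hi:lo] from a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def read_register_numbers(instruction, reg2loc):
    """Return (Read register 1, Read register 2) as selected by Reg2Loc."""
    read1 = bits(instruction, 9, 5)            # always instruction[9:5]
    if reg2loc == 0:
        read2 = bits(instruction, 20, 16)      # Reg2Loc==0 selects instruction[20:16]
    else:
        read2 = bits(instruction, 4, 0)        # Reg2Loc==1 selects instruction[4:0]
    return read1, read2

if __name__ == "__main__":
    # Register fields only (illustrative): [20:16]=3, [9:5]=2, [4:0]=11
    word = (3 << 16) | (2 << 5) | 11
    print(read_register_numbers(word, 0))
    print(read_register_numbers(word, 1))
```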

31 Performance Analysis

32 Terminologies and Definitions
  CPI: cycles per instruction
  IPC: instructions per cycle, which is 1/CPI
  Execution time of an instruction
    = {CPI} x {clock cycle time}
  Execution time of a program (Iron Law)
    = Sum over all instructions [ {CPI} x {clock cycle time} ]
    = {# of instructions} x {average CPI} x {clock cycle time}

33 Examples
  Remember: execution time of a program
    = Sum over all instructions [ {CPI} x {clock cycle time} ]
    = {# of instructions} x {average CPI} x {clock cycle time}
  Single-cycle uarch
    CPI = 1, but clock cycle time is long
  Multi-cycle uarch (with 5 stages)
    CPI = 5, but clock cycle time is short
  Pipelined uarch (with 5 stages)
    CPI = 1 (steady state), clock cycle time same as multi-cycle
    This is the ideal case
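To make the three cases on slide 33 concrete, here is a small Python sketch of the Iron Law. The numbers are assumptions chosen for illustration, not values from the lecture: 5 stages of 200 ps each, 1000 instructions, every multi-cycle instruction taking all 5 cycles, and pipeline fill time ignored (steady state).

```python
def execution_time_ps(num_instructions, average_cpi, cycle_time_ps):
    """Iron Law: {# of instructions} x {average CPI} x {clock cycle time}."""
    return num_instructions * average_cpi * cycle_time_ps

NUM_STAGES = 5          # from the slide
STAGE_TIME_PS = 200     # assumed: every stage takes 200 ps
N = 1000                # assumed instruction count

# Single-cycle: CPI = 1, cycle time must cover all 5 stages.
single_cycle = execution_time_ps(N, 1, NUM_STAGES * STAGE_TIME_PS)
# Multi-cycle: CPI = 5 (assumed for every instruction), cycle time is one stage.
multi_cycle = execution_time_ps(N, NUM_STAGES, STAGE_TIME_PS)
# Pipelined: CPI = 1 in steady state, cycle time is one stage.
pipelined = execution_time_ps(N, 1, STAGE_TIME_PS)

if __name__ == "__main__":
    print("single-cycle:", single_cycle, "ps")
    print("multi-cycle: ", multi_cycle, "ps")
    print("pipelined:   ", pipelined, "ps")
```

Under these simplified assumptions the single-cycle and multi-cycle designs tie; a multi-cycle design gains only when some instructions need fewer than 5 cycles or when stage latencies are uneven.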

34 Pipelining: Discussions

35 Pipelined uarch
  Is this a good partitioning?
  Why not 4 or 6 stages? Why not different boundaries?
  **Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]

36 Pipeline Considerations
  How to partition?
  How many stages?

37 Pipeline Partitioning: Resource Requirement
  The goal: no shared resources among different pipeline stages
    i.e., no resource is used by more than 1 stage
    Otherwise, we have resource contention, or a structural hazard
  Example: need to be able to fetch instructions (in IF stage) and load data (in MEM stage) at the same time
    A single memory interface is not sufficient
    Solution 1: provide two separate interfaces via instruction and data caches
    Solution 2: ??
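The "no resource is used by more than 1 stage" rule on slide 37 can be checked mechanically. The sketch below is illustrative only: the resource names are hypothetical labels, and `structural_hazards` is a made-up helper that reports any resource claimed by two or more stages.

```python
from collections import defaultdict

def structural_hazards(stage_resources):
    """Return {resource: [stages]} for every resource used by more than one stage."""
    users = defaultdict(list)
    for stage, resources in stage_resources.items():
        for resource in resources:
            users[resource].append(stage)
    return {r: stages for r, stages in users.items() if len(stages) > 1}

# Hypothetical resource labels: one shared memory interface for IF and MEM.
single_memory = {
    "IF": ["memory"],
    "ID": ["register file read ports"],
    "EX": ["ALU"],
    "MEM": ["memory"],
    "WB": ["register file write port"],
}

# Solution 1 from the slide: separate instruction and data caches.
split_caches = dict(single_memory, IF=["instruction cache"], MEM=["data cache"])

if __name__ == "__main__":
    print(structural_hazards(single_memory))
    print(structural_hazards(split_caches))
```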

38 How Many Pipeline Stages?
  BW (bandwidth), a.k.a. throughput (1 / cycle time)
  Ideally, sequential elements (pipeline registers) do not impose additional delays/cost
    1 stage:   combinational logic (F,D,E,M,W), T ps            BW = ~(1/T)
    2 stages:  T/2 ps (F,D,E) | T/2 ps (M,W)                    BW = ~(2/T)
    3 stages:  T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W)       BW = ~(3/T)

39 Pipeline Stages and Impact on Performance
  Non-pipelined version with delay T
    BW = 1 / (T + S), where S = sequential element delay
  k-stage pipelined version
    BW_k-stage = 1 / (T/k + S)
    BW_max = 1 / (1 gate delay + S)
  Sequential element delay reduces BW (switching overhead between stages)
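A short Python sketch of the formula BW_k-stage = 1 / (T/k + S) shows why doubling the stage count does not double throughput. The values T = 1000 ps and S = 50 ps are assumptions picked for illustration; `bandwidth` and `speedup` are hypothetical helper names.

```python
def bandwidth(T, S, k=1):
    """Throughput of a k-stage pipeline: 1 / (T/k + S), in results per ps."""
    return 1.0 / (T / k + S)

def speedup(T, S, k):
    """Throughput of the k-stage version relative to the non-pipelined version."""
    return bandwidth(T, S, k) / bandwidth(T, S, 1)

if __name__ == "__main__":
    T, S = 1000.0, 50.0   # assumed combinational delay and sequential element delay (ps)
    for k in (1, 2, 5, 10, 20):
        print(f"k={k:2d}  cycle={T / k + S:7.1f} ps  speedup={speedup(T, S, k):.2f}")
```

With these numbers, 5 stages give a speedup of about 4.2 rather than 5, and 10 stages about 7 rather than 10, because S is paid once per stage.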

40 Pipeline Stages and Impact on HW Cost
  Non-pipelined version with combinational cost G
    Cost = G + L, where L = sequential element cost
  k-stage pipelined version
    Cost_k-stage ~= G + Lk
  Sequential elements increase hardware cost
  It is critical to balance the tradeoffs
    i.e., how many stages and what is done in each stage
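The cost model Cost_k-stage ~= G + Lk is linear in k, which a few lines of Python make visible. The gate counts used here (G = 10000, L = 500) are arbitrary assumptions for illustration, and `pipeline_cost` is a hypothetical helper name.

```python
def pipeline_cost(G, L, k=1):
    """Approximate hardware cost of a k-stage pipeline: G + L*k."""
    return G + L * k

if __name__ == "__main__":
    G, L = 10_000, 500   # assumed combinational cost and per-stage sequential element cost
    base = pipeline_cost(G, L, 1)
    for k in (1, 2, 5, 10, 20):
        cost = pipeline_cost(G, L, k)
        print(f"k={k:2d}  cost={cost:6d}  relative={cost / base:.2f}")
```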

41 Ideal vs. Non-Ideal Pipelines

42 Properties of an Ideal Pipeline
  Goal: increase throughput with little increase in cost (hardware cost, in case of instruction processing)
  Repetition of identical operations
    The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
  Uniformly partitionable suboperations
    Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
  Repetition of independent operations
    No dependencies between repeated operations
  Can you implement an ideal pipeline for instruction processing?

43 Instruction Pipeline: Not Ideal
  Identical operations... NOT!
    => different instructions -> not all need the same stages
    Forcing different instructions to go through the same pipe stages
    -> external fragmentation (some pipe stages idle for some instructions)
  Uniform suboperations... NOT!
    => different pipeline stages -> not the same latency
    Need to force each stage to be controlled by the same clock
    -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
  Independent operations... NOT!
    => instructions are not independent of each other
    Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
    -> pipeline stalls (pipeline is not always moving)

44 Instruction Pipeline: Not Ideal
  Identical operations... NOT!
    => different instructions -> not all need the same stages
    Forcing different instructions to go through the same pipe stages
    -> external fragmentation (some pipe stages idle for some instructions)
  Examples
    Add, Branch: no need to go through the MEM stage
    Others?
  Performance impact?

45 Instruction Pipeline: Not Ideal
  Uniform suboperations... NOT!
    => different pipeline stages -> not the same latency
    Need to force each stage to be controlled by the same clock
    -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)

46 Non-Uniform Operations: Laundry Analogy
  [Figure: two laundry timelines starting at 6 PM, each showing task order A, B, C, D against time. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
  The slowest step decides throughput or cycle time

47 Non-Uniform Operations: Example
  Stage latencies (in figure order): 200ps, 100ps, 200ps, 200ps, 100ps
  [Figure: pipelined datapath with pipeline registers IR_D, PC_D; A_E, B_E, Imm_E, PC_E; Aout_M, B_M, PC_M; MDR_W, Aout_W. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

48 Non-Uniform Operations: Example
  [Figure: program execution order (in instructions) vs. time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg. Top: non-pipelined, a new instruction starts every 800ps. Bottom: pipelined, a new instruction starts every 200ps and every stage is allotted 200ps.]
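The 800ps and 200ps figures on slide 48 follow from the stage latencies on slide 47: the non-pipelined cycle is the sum of the stage latencies, and the pipelined cycle is the slowest stage. The Python sketch below works this out; the mapping of the five latencies to IF, ID, EX, MEM, WB is an assumption based on their order in the figure, and the function names are made up for illustration.

```python
# Assumed mapping of the slide's five latencies to stages, in figure order.
STAGE_LATENCY_PS = [("IF", 200), ("ID", 100), ("EX", 200), ("MEM", 200), ("WB", 100)]

NON_PIPELINED_CYCLE_PS = sum(lat for _, lat in STAGE_LATENCY_PS)   # whole instruction in one cycle
PIPELINED_CYCLE_PS = max(lat for _, lat in STAGE_LATENCY_PS)       # slowest stage sets the clock

def non_pipelined_time_ps(num_instructions):
    """Total time when each instruction occupies the whole datapath for one long cycle."""
    return num_instructions * NON_PIPELINED_CYCLE_PS

def pipelined_time_ps(num_instructions):
    """Total time with no stalls: fill the pipeline, then finish one instruction per cycle."""
    cycles = len(STAGE_LATENCY_PS) + num_instructions - 1
    return cycles * PIPELINED_CYCLE_PS

def idle_time_per_stage_ps():
    """Internal fragmentation: time each stage sits idle within one clock cycle."""
    return {name: PIPELINED_CYCLE_PS - lat for name, lat in STAGE_LATENCY_PS}

if __name__ == "__main__":
    print("3 loads, non-pipelined:", non_pipelined_time_ps(3), "ps")
    print("3 loads, pipelined:    ", pipelined_time_ps(3), "ps")
    print("idle per stage:", idle_time_per_stage_ps())
```

For three loads the gain is well below 5x because of pipeline fill time and because the two 100 ps stages are stretched to the 200 ps clock.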

49 Instruction Pipeline: Not Ideal
  Independent operations... NOT!
    => instructions are not independent of each other
    Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
    -> pipeline stalls (pipeline is not always moving)

50 Dependencies and Their Types
  Also called hazards
  Two types
    Data dependency
    Control dependency

51 Data Dependency Handling

52 Data Dependency Types
  Flow dependency: Read-after-Write (RAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
  Anti dependency: Write-after-Read (WAR)
    r5 <- r3 op r4
    r3 <- r6 op r7
  Output dependency: Write-after-Write (WAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
    r3 <- r6 op r7
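The three dependency types on slide 52 can be told apart by comparing destination and source registers of an earlier and a later instruction. The following Python sketch is illustrative; the `Instr` tuple and the `dependencies` function are hypothetical names, and each instruction is modeled as one destination plus a tuple of sources.

```python
from typing import NamedTuple, Set, Tuple

class Instr(NamedTuple):
    dest: str
    srcs: Tuple[str, ...]

def dependencies(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the data dependencies from `earlier` to `later` (program order)."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("RAW")   # flow: later reads what earlier writes
    if later.dest in earlier.srcs:
        kinds.add("WAR")   # anti: later overwrites what earlier reads
    if later.dest == earlier.dest:
        kinds.add("WAW")   # output: both write the same register
    return kinds

# The three instructions from the slide.
i1 = Instr("r3", ("r1", "r2"))   # r3 <- r1 op r2
i2 = Instr("r5", ("r3", "r4"))   # r5 <- r3 op r4
i3 = Instr("r3", ("r6", "r7"))   # r3 <- r6 op r7

if __name__ == "__main__":
    print("i1 -> i2:", dependencies(i1, i2))
    print("i2 -> i3:", dependencies(i2, i3))
    print("i1 -> i3:", dependencies(i1, i3))
```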

53 Data Dependency Types
  Flow dependencies always need to be obeyed because they constitute true dependence on a value
  Anti and output dependencies exist due to the limited number of architectural registers
    They are dependence on a name, not a value
  Anti and output dependences are easier to handle
    Write to the destination in one stage and in program order
  Flow dependences are more interesting

54 Ways of Handling Flow Dependencies
  Detect and wait until value is available in register file
  Detect and forward/bypass data to dependent instruction
  Detect and eliminate the dependence at the software level
    No need for the hardware to detect dependence
  Predict the needed value(s), execute speculatively, and verify
  Do something else (fine-grained multithreading)
    No need to detect

55 Flow Dependency Example
  Consider this sequence:
    SUB  X2, X1, X3
    AND  X12, X2, X5
    OR   X13, X2, X6
    ADD  X14, X2, X2
    STUR X15, [X2, #100]

56 Flow Dependency Example
    Time ->
    SUB  X2, X1, X3        IF   ID   EX   MEM  WB
    AND  X12, X2, X5            IF   ID   EX   MEM  WB
    OR   X13, X2, X6                 IF   ID   EX   MEM
    ADD  X14, X2, X2                      IF   ID   EX
    STUR X15, [X2, #100]                       IF   ID
  SUB writing to X2 and ADD reading from it in the same cycle
    Assume internal forwarding in register file, i.e., ADD gets the new X2 value produced by SUB

57 How to Detect Flow Dependencies in HW?
           R/I-Type    LDUR        STUR       B
    IF
    ID     read RF     read RF     read RF
    EX
    MEM
    WB     write RF    write RF
  Instructions I_A and I_B (where I_A comes before I_B) have RAW dependency iff
    I_B (R/I, LDUR, or STUR) reads a register written by I_A (R/I or LDUR)
    and dist(I_A, I_B) < dist(ID, WB) = 3
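The condition on slide 57 can be applied directly to the sequence from slide 55. The Python sketch below is an illustration, not the lecture's logic design: `Instr` and `raw_hazards` are hypothetical names, the register lists are filled in by hand, and the X31 special case is left out for brevity.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str
    dest: Optional[str]        # None for instructions that do not write the RF (e.g. STUR)
    srcs: Tuple[str, ...]

DIST_ID_WB = 3  # dist(ID, WB) from the slide

def raw_hazards(program: List[Instr]) -> List[Tuple[int, int]]:
    """Pairs (a, b) where b reads a register written by a and dist(a, b) < dist(ID, WB)."""
    hazards = []
    for b, later in enumerate(program):
        for a in range(max(0, b - (DIST_ID_WB - 1)), b):
            earlier = program[a]
            if earlier.dest is not None and earlier.dest in later.srcs:
                hazards.append((a, b))
    return hazards

# Sequence from slide 55.
PROGRAM = [
    Instr("SUB X2, X1, X3", "X2", ("X1", "X3")),
    Instr("AND X12, X2, X5", "X12", ("X2", "X5")),
    Instr("OR X13, X2, X6", "X13", ("X2", "X6")),
    Instr("ADD X14, X2, X2", "X14", ("X2", "X2")),
    Instr("STUR X15, [X2, #100]", None, ("X15", "X2")),
]

if __name__ == "__main__":
    for a, b in raw_hazards(PROGRAM):
        print(f"{PROGRAM[b].text!r} depends on {PROGRAM[a].text!r} (dist {b - a})")
```

AND and OR are flagged (distance 1 and 2), while ADD at distance 3 is not, which matches the internal-forwarding assumption on slide 56.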

58 Flow Dependency Check Logic
  Helper functions
    Op1(I) and Op2(I) return the 1st and 2nd register operand field of I, respectively
    Use_Op1(I) returns true if I requires the 1st register operand and the register is not X31; similarly for Use_Op2(I)
  Flow dependency occurs when
       (Op1(IR_ID) == dest_EX)  && Use_Op1(IR_ID) && RegWrite_EX
    or (Op1(IR_ID) == dest_MEM) && Use_Op1(IR_ID) && RegWrite_MEM
    or (Op2(IR_ID) == dest_EX)  && Use_Op2(IR_ID) && RegWrite_EX
    or (Op2(IR_ID) == dest_MEM) && Use_Op2(IR_ID) && RegWrite_MEM
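The four-term condition on slide 58 translates almost one-to-one into a boolean function. This Python sketch is a software model for illustration only; register operands are passed as plain register numbers, and the parameter names (`needs_op1`, `dest_ex`, `reg_write_ex`, and so on) are assumptions standing in for the pipeline-register fields.

```python
XZR = 31  # X31 is excluded by Use_Op1/Use_Op2 on the slide

def use_op(reg, needed):
    """Use_OpN(I): the instruction requires this operand and it is not X31."""
    return needed and reg != XZR

def flow_dependency(op1, needs_op1, op2, needs_op2,
                    dest_ex, reg_write_ex, dest_mem, reg_write_mem):
    """True if the instruction in ID reads a register still to be written by EX or MEM."""
    use1 = use_op(op1, needs_op1)
    use2 = use_op(op2, needs_op2)
    return bool(
        (op1 == dest_ex and use1 and reg_write_ex)
        or (op1 == dest_mem and use1 and reg_write_mem)
        or (op2 == dest_ex and use2 and reg_write_ex)
        or (op2 == dest_mem and use2 and reg_write_mem)
    )

if __name__ == "__main__":
    # AND X12, X2, X5 in ID while SUB X2, X1, X3 is in EX: dependency on X2.
    print(flow_dependency(2, True, 5, True,
                          dest_ex=2, reg_write_ex=True,
                          dest_mem=0, reg_write_mem=False))
```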

59 Resolving Data Dependence
  Option 1: Stall the pipeline (i.e., inserting "bubbles")
  [Figure: pipeline diagram over t0..t5 for Inst h, i, j, k, l through IF, ID, ALU, MEM, WB, where i: r_x <- _ and j: _ <- r_x, with bubbles inserted ahead of j; shown for dist(i,j) = 1, 2, and 3.]
  Stall = make the dependent instruction wait until its source data value is available
    1. stop all up-stream stages
    2. drain all down-stream stages

60 Resolving Data Dependence
  Option 1: Stall the pipeline (i.e., inserting "bubbles")
  [Figure: the same pipeline diagram with bubbles (nops) occupying the slots between i and the stalled j, while k waits behind j; shown for dist(i,j) = 1 and 2.]
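The stall policy on slides 59 and 60 can be modeled by tracking the cycle in which each instruction finally leaves ID. The sketch below is a simplified Python model under stated assumptions, not the lecture's implementation: there is no forwarding other than the internal register-file forwarding assumed on slide 56, a producer writes back 3 cycles after its ID, and `schedule_with_stalls` is a hypothetical name.

```python
from typing import List, Optional, Tuple

DIST_ID_WB = 3  # a producer's WB happens 3 cycles after its ID

def schedule_with_stalls(program: List[Tuple[Optional[str], Tuple[str, ...]]]):
    """Return (id_cycles, bubbles): the cycle each instruction completes ID,
    and how many bubbles were inserted in front of it."""
    write_cycle = {}   # register -> cycle in which its latest producer is in WB
    id_cycles, bubbles = [], []
    prev_id = 0        # first instruction is in IF at cycle 0, so its ID is cycle 1
    for dest, srcs in program:
        earliest = prev_id + 1                                   # in-order: one ID per cycle
        needed = max((write_cycle.get(r, 0) for r in srcs), default=0)
        id_cycle = max(earliest, needed)                          # wait until sources are written
        id_cycles.append(id_cycle)
        bubbles.append(id_cycle - earliest)
        if dest is not None:
            write_cycle[dest] = id_cycle + DIST_ID_WB
        prev_id = id_cycle
    return id_cycles, bubbles

# Sequence from slide 55 as (destination, sources); STUR writes no register.
PROGRAM = [
    ("X2", ("X1", "X3")),     # SUB  X2, X1, X3
    ("X12", ("X2", "X5")),    # AND  X12, X2, X5
    ("X13", ("X2", "X6")),    # OR   X13, X2, X6
    ("X14", ("X2", "X2")),    # ADD  X14, X2, X2
    (None, ("X15", "X2")),    # STUR X15, [X2, #100]
]

if __name__ == "__main__":
    ids, stalls = schedule_with_stalls(PROGRAM)
    print("ID cycles:", ids)
    print("bubbles:  ", stalls)
```

In this model a consumer at distance 1 from its producer waits two cycles and one at distance 2 waits one cycle; once AND has stalled, the instructions behind it no longer need to wait for X2.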


More information

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 16: Out-of-Order Executio Prof. Our Mutlu ETH Zurich Sprig 2018 26 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures Multi-cycle ad Microprogrammed

More information

CMPT 125 Assignment 2 Solutions

CMPT 125 Assignment 2 Solutions CMPT 25 Assigmet 2 Solutios Questio (20 marks total) a) Let s cosider a iteger array of size 0. (0 marks, each part is 2 marks) it a[0]; I. How would you assig a poiter, called pa, to store the address

More information

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I 2005-9-20 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/ Office Hours

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 201 Heaps 201 Goodrich ad Tamassia xkcd. http://xkcd.com/83/. Tree. Used with permissio uder

More information

Lecture 1: Introduction and Fundamental Concepts 1

Lecture 1: Introduction and Fundamental Concepts 1 Uderstadig Performace Lecture : Fudametal Cocepts ad Performace Aalysis CENG 332 Algorithm Determies umber of operatios executed Programmig laguage, compiler, architecture Determie umber of machie istructios

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

The Simeck Family of Lightweight Block Ciphers

The Simeck Family of Lightweight Block Ciphers The Simeck Family of Lightweight Block Ciphers Gagqiag Yag, Bo Zhu, Valeti Suder, Mark D. Aagaard, ad Guag Gog Electrical ad Computer Egieerig, Uiversity of Waterloo Sept 5, 205 Yag, Zhu, Suder, Aagaard,

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

(Basic) Processor Pipeline

(Basic) Processor Pipeline (Basic) Processor Pipeline Nima Honarmand Generic Instruction Life Cycle Logical steps in processing an instruction: Instruction Fetch (IF_STEP) Instruction Decode (ID_STEP) Operand Fetch (OF_STEP) Might

More information

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017 Computer Architecture Lecture 8: SIMD Processors ad GPUs Prof. Our Mutlu ETH Zürich Fall 2017 18 October 2017 Ageda for Today & Next Few Lectures SIMD Processors GPUs Itroductio to GPU Programmig Digitaltechik

More information

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 10 Defiig Classes Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 10.1 Structures 10.2 Classes 10.3 Abstract Data Types 10.4 Itroductio to Iheritace Copyright 2015 Pearso Educatio,

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea 5-1 Chapter 5 Processor Desig Advaced Topics Chapter 5: Processor Desig Advaced Topics Topics 5.3 Microprogrammig Cotrol store ad microbrachig Horizotal ad vertical microprogrammig 5- Chapter 5 Processor

More information

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram Lecture 2 RTL Desig Methodology Trasitio from Pseudocode & Iterface to a Correspodig Block Diagram Structure of a Typical Digital Data Iputs Datapath (Executio Uit) Data Outputs System Cotrol Sigals Status

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite

More information

Examples and Applications of Binary Search

Examples and Applications of Binary Search Toy Gog ITEE Uiersity of Queeslad I the secod lecture last week we studied the biary search algorithm that soles the problem of determiig if a particular alue appears i a sorted list of iteger or ot. We

More information

1. SWITCHING FUNDAMENTALS

1. SWITCHING FUNDAMENTALS . SWITCING FUNDMENTLS Switchig is the provisio of a o-demad coectio betwee two ed poits. Two distict switchig techiques are employed i commuicatio etwors-- circuit switchig ad pacet switchig. Circuit switchig

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation Instruction Scheduling Last week Register allocation Background: Pipelining Basics Idea Begin executing an instruction before completing the previous one Today Instruction scheduling The problem: Pipelined

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems

More information

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition Lecture Goals Itroductio to Computig Systems: From Bits ad Gates to C ad Beyod 2 d Editio Yale N. Patt Sajay J. Patel Origial slides from Gregory Byrd, North Carolia State Uiversity Modified slides by

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Algorithm Design Techniques. Divide and conquer Problem

Algorithm Design Techniques. Divide and conquer Problem Algorithm Desig Techiques Divide ad coquer Problem Divide ad Coquer Algorithms Divide ad Coquer algorithm desig works o the priciple of dividig the give problem ito smaller sub problems which are similar

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Executio Prof. Our Mutlu ETH Zurich Sprig 2018 27 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

Processor (II) - pipelining. Hwansoo Han

Processor (II) - pipelining. Hwansoo Han Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number

More information

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions? Basic Instruction Timings Pipelining 1 Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings: Instruction class Instruction

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units Desig of Digital Circuits Lecture 21: SIMD Processors II ad Graphics Processig Uits Dr. Jua Gómez Lua Prof. Our Mutlu ETH Zurich Sprig 2018 17 May 2018 New Course: Bachelor s Semiar i Comp Arch Fall 2018

More information

Homework 1 Solutions MA 522 Fall 2017

Homework 1 Solutions MA 522 Fall 2017 Homework 1 Solutios MA 5 Fall 017 1. Cosider the searchig problem: Iput A sequece of umbers A = [a 1,..., a ] ad a value v. Output A idex i such that v = A[i] or the special value NIL if v does ot appear

More information

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Introduction to Pipelining Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L15-1 Performance Measures Two metrics of interest when designing a system: 1. Latency: The delay

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

Lecture 10: Pipelined Implementations: Hazards and Resolutions. Instruction Pipeline Reality

Lecture 10: Pipelined Implementations: Hazards and Resolutions. Instruction Pipeline Reality 18-447 Lecture 10: Pipelined Implementations: Hazards and Resolutions S 09 L10-1 James C. Hoe José F. Martínez Electrical and Computer Engineering Carnegie Mellon University February 15, 2010 Instruction

More information

ΕΠΛ 605 Εργαστήριο 5. Παναγιώτα Νικολάου 11/10/18. Slides from: Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin

ΕΠΛ 605 Εργαστήριο 5. Παναγιώτα Νικολάου 11/10/18. Slides from: Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin ΕΠΛ 605 Εργαστήριο 5 Παναγιώτα Νικολάου 11/10/18 Slides from: Rajagopala Desika, Doug Burger, Stephe Keckler, Todd Austi Simulators Simulatio is the process of desigig a model of a real system ad coductig

More information

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 )

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 ) EE26: Digital Desig, Sprig 28 3/6/8 EE 26: Itroductio to Digital Desig Combiatioal Datapath Yao Zheg Departmet of Electrical Egieerig Uiversity of Hawaiʻi at Māoa Combiatioal Logic Blocks Multiplexer Ecoders/Decoders

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information