CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago
1 CMSC Computer Architecture Lecture 5: Pipelining Prof. Yanjing Li University of Chicago
2 Administrative Stuff
- Lab1: due tonight
- Lab2: out later today; due 2 weeks from now
- Review session this Friday
- Turing Award lecture: tomorrow
3 Lecture Outline
- Pipelining basics and discussions
- Non-ideal pipeline
4 Single-Cycle uarch: Datapath & Control
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
5 Single-Cycle uarch: Summary
- Inefficient
  - All instructions run as slow as the slowest instruction
- Not necessarily the simplest way to implement an ISA
  - Single-cycle implementation of REP MOVS (x86)?
- Not easy to optimize/improve performance
  - Optimizing the common case (e.g., common instructions) does not work
  - Need to optimize the worst case all the time
- Not all resources are fully utilized
  - e.g., data memory access can't overlap with ALU operation
- How to do better?
6 Single-Cycle, Multi-Cycle, Pipelining
- Single-cycle: 1 cycle per instruction, long cycle time
- Multi-cycle: 5 cycles per instruction, short cycle time
- Pipeline: 1 cycle per instruction (steady state), short cycle time
(Timing diagrams: instructions drawn as F D E M W stage sequences along a time axis.)
7 Instruction Pipelining: Basic Idea
- Pipeline the execution of multiple instructions
- Idea:
  - Divide the instruction processing into distinct stages of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages
- Benefit: increases instruction processing throughput
- Downside: start thinking about this
8 Pipelining Instruction Processing
9 Remember: Instruction Processing Steps
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)
10 Remember the Single-Cycle uarch
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
11 Pipeline Operation Examples
- We'll look at load & store
- Show pipeline usage in a single cycle
- Highlight resources used
12 Review: LEGv8 Single-Cycle Datapath
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
13 Adding Pipeline Registers
- Registers between stages to hold information produced in the previous cycle
- Pipeline register labels in the figure: IR_D, PC_D; PC_E, A_E, B_E, Imm_E; PC_M, Aout_M, B_M; Aout_W, MDR_W
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
14 IF for Load, Store, Cycle 1
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
15 ID for Load, Store, Cycle 2
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
16 EX for Load, Cycle 3
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
17 MEM for Load, Cycle 4
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
18 WB for Load, Cycle 5
- Wrong register number
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
19 Corrected Datapath for Load, Cycle 5
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
20 EX for Store
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
21 MEM for Store
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
22 WB for Store
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
23 Pipeline Operation Examples
- Consider the following instruction sequence:
    LDUR X10, [X1, 40]
    SUB  X11, X2, X3
    ADD  X12, X3, X4
    LDUR X13, [X1, 48]
    ADD  X14, X5, X6
24 Filling up the Pipeline
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
25 Pipeline: Steady State
- State of the pipeline at the 5th cycle
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
26 Illustrating Pipeline Operation: Operation View
         t0   t1   t2   t3   t4   t5
Inst 0   IF   ID   EX   MEM  WB
Inst 1        IF   ID   EX   MEM  WB
Inst 2             IF   ID   EX   MEM
Inst 3                  IF   ID   EX
Inst 4                       IF   ID
- Steady state (full pipeline): from t4 on, every stage (WB, MEM, EX, ID, IF) holds an instruction
27 Illustrating Pipeline Operation: Resource View
       t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
ID          I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
EX               I0   I1   I2   I3   I4   I5   I6   I7   I8
MEM                   I0   I1   I2   I3   I4   I5   I6   I7
WB                         I0   I1   I2   I3   I4   I5   I6
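The resource view follows one rule: in an ideal pipeline, instruction i occupies the stage at depth d during cycle i + d. A minimal Python sketch of that rule is below. The function names `resource_view` and `format_view` are hypothetical, and the sketch assumes no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def resource_view(num_instructions, num_cycles, stages=STAGES):
    """Map each stage to the instruction index it holds in every cycle (None = empty)."""
    table = {}
    for depth, stage in enumerate(stages):
        row = []
        for t in range(num_cycles):
            inst = t - depth  # instruction i reaches the stage at `depth` in cycle i + depth
            row.append(inst if 0 <= inst < num_instructions else None)
        table[stage] = row
    return table

def format_view(table):
    """Render the table as plain text, one row per stage, '.' for an empty slot."""
    lines = []
    for stage, row in table.items():
        cells = ["I%d" % i if i is not None else "." for i in row]
        lines.append("%-4s %s" % (stage, " ".join("%-3s" % c for c in cells)))
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_view(resource_view(num_instructions=11, num_cycles=11)))
```

With 11 instructions and 11 cycles, the IF row holds I0 through I10 and the WB row first becomes busy at t4, matching the slide.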
28 Pipelined Control
- Identical set of control points as the single-cycle uarch!
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
29 Pipelined Control
- Control signals are derived from the instruction
  - Decode once, as in the single-cycle implementation
  - Buffer signals until consumed
- What other options are there to derive pipeline control signals?
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
30 Pipelined Control + Datapath
Note:
1. Reg2Loc==0: instruction[20:16] is selected; Reg2Loc==1: instruction[4:0] is selected
2. instruction[9:5] is the input to Read register 1
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
31 Performance Analysis
32 Terminologies and Definitions
- CPI: cycles per instruction
- IPC: instructions per cycle, which is 1/CPI
- Execution time of an instruction
  - {CPI} x {clock cycle time}
- Execution time of a program (Iron Law)
  - Sum over all instructions [ {CPI} x {clock cycle time} ]
  - = {# of instructions} x {average CPI} x {clock cycle time}
33 Examples
- Remember: execution time of a program
  - Sum over all instructions [ {CPI} x {clock cycle time} ]
  - = {# of instructions} x {average CPI} x {clock cycle time}
- Single-cycle uarch
  - CPI = 1, but clock cycle time is long
- Multi-cycle uarch (with 5 stages)
  - CPI = 5, but clock cycle time is short
- Pipelined uarch (with 5 stages)
  - CPI = 1 (steady state), clock cycle time same as multi-cycle
  - This is the ideal case
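The Iron Law comparison can be tried with concrete numbers. The Python sketch below uses assumed values that are not from the lecture: 1000 instructions, an 800 ps single-cycle clock, and a 200 ps clock for the multi-cycle and pipelined designs. The function names are hypothetical.

```python
def execution_time_ps(num_instructions, average_cpi, cycle_time_ps):
    """Iron Law: {# of instructions} x {average CPI} x {clock cycle time}."""
    return num_instructions * average_cpi * cycle_time_ps

def pipelined_time_ps(num_instructions, num_stages, cycle_time_ps):
    """Ideal pipeline: the first instruction needs num_stages cycles to fill the
    pipeline, then one instruction completes per cycle."""
    cycles = num_stages + (num_instructions - 1)
    return cycles * cycle_time_ps

if __name__ == "__main__":
    n = 1000  # assumed instruction count
    print("single-cycle:", execution_time_ps(n, 1, 800), "ps")
    print("multi-cycle: ", execution_time_ps(n, 5, 200), "ps")
    print("pipelined:   ", pipelined_time_ps(n, 5, 200), "ps")
```

Under these assumptions the pipelined design needs 1004 cycles for 1000 instructions, so its effective CPI approaches 1 as the program gets longer. The multi-cycle figure assumes every instruction uses all 5 cycles, as the slide states.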
34 Pipelining: Discussions
35 Pipelined uarch
- Is this a good partitioning?
- Why not 4 or 6 stages?
- Why not different boundaries?
Based on original figure from [P&H CO&D, COPYRIGHT 2017 Elsevier. ALL RIGHTS RESERVED.]
36 Pipeline Considerations
- How to partition?
- How many stages?
37 Pipeline Partitioning: Resource Requirement
- The goal: no shared resources among different pipeline stages
  - i.e., no resource is used by more than 1 stage
  - Otherwise, we have resource contention, or a structural hazard
- Example: need to be able to fetch instructions (in IF stage) and load data (in MEM stage) at the same time
  - A single memory interface is not sufficient
  - Solution 1: provide two separate interfaces via instruction and data caches
  - Solution 2: ??
38 How Many Pipeline Stages?
- BW (bandwidth), a.k.a. throughput (1 / cycle time)
- Ideally, sequential elements (pipeline registers) do not impose additional delays/cost
  - 1 stage: combinational logic (F,D,E,M,W), T ps; BW = ~(1/T)
  - 2 stages: T/2 ps (F,D,E), T/2 ps (M,W); BW = ~(2/T)
  - 3 stages: T/3 ps (F,D), T/3 ps (E,M), T/3 ps (M,W); BW = ~(3/T)
39 Pipeline Stages and Impact on Performance
- Nonpipelined version with delay T
  - BW = 1 / (T + S), where S = sequential element delay
- k-stage pipelined version (each stage: T/k ps)
  - BW_k-stage = 1 / (T/k + S)
  - BW_max = 1 / (1 gate delay + S)
- Sequential element delay reduces BW (switching overhead between stages)
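The bandwidth formula shows why a k-stage pipeline gives less than a k-fold speedup. In the Python sketch below, T = 1000 ps and S = 50 ps are assumed values chosen for illustration, and the function names are hypothetical.

```python
def bandwidth(total_delay_ps, seq_delay_ps, k=1):
    """BW = 1 / (T/k + S): results per picosecond for a k-stage pipeline."""
    return 1.0 / (total_delay_ps / k + seq_delay_ps)

def speedup(total_delay_ps, seq_delay_ps, k):
    """Throughput of the k-stage version relative to the nonpipelined version."""
    return bandwidth(total_delay_ps, seq_delay_ps, k) / bandwidth(total_delay_ps, seq_delay_ps, 1)

if __name__ == "__main__":
    T, S = 1000.0, 50.0  # assumed combinational and sequential delays
    for k in (1, 2, 5, 10, 20):
        print(k, round(speedup(T, S, k), 2))
```

With these numbers, 5 stages give a speedup of 4.2 rather than 5, because every stage still pays the sequential element delay S.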
40 Pipeline Stages and Impact on HW Cost
- Nonpipelined version with combinational cost G
  - Cost = G + L, where L = sequential element cost
- k-stage pipelined version (each stage: G/k gates)
  - Cost_k-stage ~= G + Lk
- Sequential elements increase hardware cost
- It is critical to balance the tradeoffs, i.e., how many stages and what is done in each stage
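One way to explore the tradeoff is to combine the cost model with the bandwidth model and search over k. The Python sketch below does this. "Throughput per unit cost" is an assumed figure of merit that the slides do not define, and every numeric value is illustrative.

```python
def bandwidth(T, S, k):
    """BW_k-stage = 1 / (T/k + S)."""
    return 1.0 / (T / k + S)

def cost(G, L, k):
    """Cost_k-stage ~= G + L*k."""
    return G + L * k

def best_stage_count(T, S, G, L, max_k):
    """Stage count in 1..max_k with the highest bandwidth per unit cost
    (an assumed metric, used here only to show that an optimum exists)."""
    return max(range(1, max_k + 1), key=lambda k: bandwidth(T, S, k) / cost(G, L, k))

if __name__ == "__main__":
    # Assumed values: T = 1000 ps, S = 50 ps, G = 10000 gates, L = 500 gates per stage.
    print(best_stage_count(1000.0, 50.0, 10000.0, 500.0, 64))
```

Under these assumptions the metric peaks at a finite stage count: adding stages beyond that point raises cost faster than it raises throughput.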
41 Ideal vs. Non-Ideal Pipelines
42 Properties of an Ideal Pipeline
- Goal: increase throughput with little increase in cost (hardware cost, in case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Repetition of independent operations
  - No dependencies between repeated operations
- Can you implement an ideal pipeline for instruction processing?
43 Instruction Pipeline: Not Ideal
- Identical operations ... NOT!
  => different instructions -> not all need the same stages
  - Forcing different instructions to go through the same pipe stages -> external fragmentation (some pipe stages idle for some instructions)
- Uniform suboperations ... NOT!
  => different pipeline stages -> not the same latency
  - Need to force each stage to be controlled by the same clock -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
- Independent operations ... NOT!
  => instructions are not independent of each other
  - Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results -> pipeline stalls (pipeline is not always moving)
44 Instruction Pipeline: Not Ideal
- Identical operations ... NOT!
  => different instructions -> not all need the same stages
  - Forcing different instructions to go through the same pipe stages -> external fragmentation (some pipe stages idle for some instructions)
- Examples
  - Add, Branch: no need to go through the MEM stage
  - Others?
- Performance impact?
45 Instruction Pipeline: Not Ideal
- Uniform suboperations ... NOT!
  => different pipeline stages -> not the same latency
  - Need to force each stage to be controlled by the same clock -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
46 Non-Uniform Operations: Laundry Analogy
(Figure: two panels, each plotting task order A, B, C, D against time, starting at 6 PM.)
- The slowest step decides throughput or cycle time
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
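47 Non-Uniform Operations: Example
- Stage latencies shown in the figure, in stage order: 200ps, 100ps, 200ps, 200ps, 100ps
- Pipeline register labels in the figure: IR_D, PC_D; PC_E, A_E, B_E, Imm_E; PC_M, Aout_M, B_M; Aout_W, MDR_W
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]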
48 Non-Uniform Operations: Example
(Figure: program execution order against time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg.)
- Upper panel: successive instructions start 800ps apart
- Lower panel: successive instructions start 200ps apart, and every stage takes 200ps
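The 800ps and 200ps figures follow from the stage latencies: without pipelining an instruction takes the sum of all stage latencies, while a pipeline must clock every stage at the slowest stage's latency. The Python sketch below assumes the five latencies map to the stages in the order Instruction fetch, Reg read, ALU, Data access, Reg write. That mapping and the function names are assumptions.

```python
# Assumed mapping of the five latencies on the slide to stage names.
STAGE_LATENCIES_PS = {
    "Instruction fetch": 200,
    "Reg read": 100,
    "ALU": 200,
    "Data access": 200,
    "Reg write": 100,
}

def single_cycle_time(latencies):
    """Nonpipelined: one long cycle covering every stage."""
    return sum(latencies.values())

def pipelined_cycle_time(latencies):
    """Pipelined: the slowest stage sets the clock."""
    return max(latencies.values())

def idle_time_per_stage(latencies):
    """Internal fragmentation: time each stage sits idle within a clock cycle."""
    cycle = pipelined_cycle_time(latencies)
    return {stage: cycle - lat for stage, lat in latencies.items()}

def total_time(num_instructions, latencies, pipelined):
    """Total time for a run of instructions, assuming no stalls."""
    if not pipelined:
        return num_instructions * single_cycle_time(latencies)
    cycles = len(latencies) + num_instructions - 1
    return cycles * pipelined_cycle_time(latencies)

if __name__ == "__main__":
    print(total_time(3, STAGE_LATENCIES_PS, pipelined=False))
    print(total_time(3, STAGE_LATENCIES_PS, pipelined=True))
    print(idle_time_per_stage(STAGE_LATENCIES_PS))
```

For the three loads, this gives 2400ps without pipelining and 1400ps with it. The two register stages each idle for 100ps per cycle, which is the internal fragmentation described earlier.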
49 Instruction Pipeline: Not Ideal
- Independent operations ... NOT!
  => instructions are not independent of each other
  - Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results -> pipeline stalls (pipeline is not always moving)
50 Dependencies and Their Types
- Also called "hazards"
- Two types
  - Data dependency
  - Control dependency
51 Data Dependency Handling
52 Data Dependency Types
- Flow dependency: Read-after-Write (RAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
- Anti dependency: Write-after-Read (WAR)
    r5 <- r3 op r4
    r3 <- r6 op r7
- Output dependency: Write-after-Write (WAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
    r3 <- r6 op r7
53 Data Dependency Types
- Flow dependencies always need to be obeyed because they constitute true dependence on a value
- Anti and output dependencies exist due to the limited number of architectural registers
  - They are dependence on a name, not a value
- Anti and output dependences are easier to handle
  - Write to the destination in one stage and in program order
- Flow dependences are more interesting
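The three dependency types can be checked mechanically from the destination and source registers of two instructions. The Python sketch below uses a hypothetical representation, a `(dest, [sources])` tuple per instruction, and classifies an ordered pair using the register examples from the previous slide.

```python
def classify(first, second):
    """Return the set of dependency types from `first` to `second`,
    where `first` precedes `second` in program order.
    Each instruction is (dest, [sources]); dest may be None."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")  # flow: second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")  # anti: second overwrites what first reads
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")  # output: both write the same register
    return kinds

if __name__ == "__main__":
    i1 = ("r3", ["r1", "r2"])  # r3 <- r1 op r2
    i2 = ("r5", ["r3", "r4"])  # r5 <- r3 op r4
    i3 = ("r3", ["r6", "r7"])  # r3 <- r6 op r7
    print(classify(i1, i2), classify(i2, i3), classify(i1, i3))
```

Only the RAW case depends on a value. The WAR and WAW cases arise because `r3` is reused as a name, which is the point the slide makes about the limited number of architectural registers.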
54 Ways of Handling Flow Dependencies
- Detect and wait until value is available in register file
- Detect and forward/bypass data to dependent instruction
- Detect and eliminate the dependence at the software level
  - No need for the hardware to detect dependence
- Predict the needed value(s), execute "speculatively", and verify
- Do something else (fine-grained multithreading)
  - No need to detect
55 Flow Dependency Example
- Consider this sequence:
    SUB  X2, X1, X3
    AND  X12, X2, X5
    OR   X13, X2, X6
    ADD  X14, X2, X2
    STUR X15, [X2, #100]
56 Flow Dependency Example
                        Time ->
SUB  X2, X1, X3         IF   ID   EX   MEM  WB
AND  X12, X2, X5             IF   ID   EX   MEM  WB
OR   X13, X2, X6                  IF   ID   EX   MEM
ADD  X14, X2, X2                       IF   ID   EX
STUR X15, [X2, #100]                        IF   ID
- ?: SUB writing to X2 and ADD reading from it in the same cycle
- Assume internal forwarding in the register file, i.e., ADD gets the new X2 value produced by SUB
57 How to Detect Flow Dependencies in HW?
       R/I-Type   LDUR       STUR       B
IF
ID     read RF    read RF    read RF
EX
MEM
WB     write RF   write RF
- Instructions I_A and I_B (where I_A comes before I_B) have a RAW dependency iff
  - I_B (R/I, LDUR, or STUR) reads a register written by I_A (R/I or LDUR), and
  - dist(I_A, I_B) < dist(ID, WB) = 3
58 Flow Dependency Check Logic
- Helper functions
  - Op1(I) and Op2(I) return the 1st and 2nd register operand fields of I, respectively
  - Use_Op1(I) returns true if I requires the 1st register operand and the register is not X31; similarly for Use_Op2(I)
- Flow dependency occurs when
     (Op1(IR_ID) == dest_EX)  && Use_Op1(IR_ID) && RegWrite_EX
  or (Op1(IR_ID) == dest_MEM) && Use_Op1(IR_ID) && RegWrite_MEM
  or (Op2(IR_ID) == dest_EX)  && Use_Op2(IR_ID) && RegWrite_EX
  or (Op2(IR_ID) == dest_MEM) && Use_Op2(IR_ID) && RegWrite_MEM
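The four-term condition can be written as a small predicate. The Python sketch below models the instruction in ID with a hypothetical `Decoded` record; its field names are assumptions, and real hardware would derive them from the instruction bits. Register numbers are plain integers, with 31 standing for X31.

```python
from collections import namedtuple

X31 = 31  # the register excluded by Use_Op1 / Use_Op2

# Hypothetical decoded view of the instruction in ID.
Decoded = namedtuple("Decoded", "op1 op2 needs_op1 needs_op2")

def use_op1(inst):
    """True if the instruction requires operand 1 and it is not X31."""
    return inst.needs_op1 and inst.op1 != X31

def use_op2(inst):
    """True if the instruction requires operand 2 and it is not X31."""
    return inst.needs_op2 and inst.op2 != X31

def flow_dependency(ir_id, dest_ex, reg_write_ex, dest_mem, reg_write_mem):
    """The four-term check: compare both operands of the instruction in ID
    against the destinations of the instructions in EX and MEM."""
    return bool(
        (ir_id.op1 == dest_ex and use_op1(ir_id) and reg_write_ex)
        or (ir_id.op1 == dest_mem and use_op1(ir_id) and reg_write_mem)
        or (ir_id.op2 == dest_ex and use_op2(ir_id) and reg_write_ex)
        or (ir_id.op2 == dest_mem and use_op2(ir_id) and reg_write_mem)
    )

if __name__ == "__main__":
    # AND X12, X2, X5 in ID while SUB X2, X1, X3 is in EX.
    and_inst = Decoded(op1=2, op2=5, needs_op1=True, needs_op2=True)
    print(flow_dependency(and_inst, dest_ex=2, reg_write_ex=True,
                          dest_mem=None, reg_write_mem=False))
```

Only the EX and MEM stages are compared because the slides assume internal forwarding in the register file, so a producer already in WB does not create a hazard.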
59 Resolving Data Dependence
- Option 1: Stall the pipeline (i.e., inserting "bubbles")
(Figure: pipeline diagram over t0 to t5 for Inst h, i, j, k, l through IF, ID, ALU, MEM, WB, with the dependent instruction j held in ID. Side panels show i: r_x <- _ followed by j: _ <- r_x for dist(i,j) = 1, 2, and 3, with bubbles between i and j at the shorter distances.)
- Stall = make the dependent instruction wait until its source data value is available
  1. Stop all up-stream stages
  2. Drain all down-stream stages
60 Resolving Data Dependence
- Option 1: Stall the pipeline (i.e., inserting "bubbles")
(Figure: the same pipeline diagram over t0 to t5, with bubbles (nops) flowing down the pipeline while j waits in ID and k waits in IF. Side panels show i: r_x <- _ followed by j: _ <- r_x for dist(i,j) = 1 and 2.)
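Stalling can be sketched as a scheduling rule: an instruction may not be in ID until every producer of its source registers has reached WB, which is dist(ID, WB) = 3 cycles after the producer's own ID. The Python sketch below applies that rule to the SUB/AND/OR/ADD/STUR sequence. The `(dest, [sources])` tuples and function names are hypothetical, cycles are numbered from 1, and the X31 special case is ignored.

```python
ID_TO_WB = 3  # dist(ID, WB), with internal forwarding in the register file

def schedule_id_cycles(program):
    """Return the cycle in which each instruction occupies ID, in order,
    stalling until all source registers have been written back."""
    id_cycle = []
    for j, (_dest, srcs) in enumerate(program):
        # In-order pipeline: at best one cycle behind the previous instruction.
        earliest = id_cycle[-1] + 1 if id_cycle else 2  # first instruction: IF in 1, ID in 2
        for i in range(j):
            producer_dest = program[i][0]
            if producer_dest is not None and producer_dest in srcs:
                earliest = max(earliest, id_cycle[i] + ID_TO_WB)
        id_cycle.append(earliest)
    return id_cycle

def bubble_count(id_cycle):
    """Number of stall cycles inserted across the whole sequence."""
    return (id_cycle[-1] - id_cycle[0]) - (len(id_cycle) - 1)

if __name__ == "__main__":
    program = [
        ("X2", ["X1", "X3"]),    # SUB  X2, X1, X3
        ("X12", ["X2", "X5"]),   # AND  X12, X2, X5
        ("X13", ["X2", "X6"]),   # OR   X13, X2, X6
        ("X14", ["X2", "X2"]),   # ADD  X14, X2, X2
        (None, ["X15", "X2"]),   # STUR X15, [X2, #100]
    ]
    cycles = schedule_id_cycles(program)
    print(cycles, bubble_count(cycles))
```

In this model AND waits two extra cycles for SUB to reach WB, and the later instructions need no further stalls because X2 has already been written by then.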
Design of Digital Circuits Lecture 14: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2018
Desig of Digital Circuits Lecture 4: Pipeliig Prof. Our Mutlu ETH Zurich Sprig 28 9 April 28 Ageda for Today & Next Few Lectures Previous lectures Sigle-cycle Microarchitectures Multi-cycle ad Microprogrammed
More informationCMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago
CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yajig Li Departmet of Computer Sciece Uiversity of Chicago Admiistrative Stuff Lab1 out toight Due Thursday (10/18) Lab1 review sessio Tomorrow, 10/05,
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad
More informationCMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining Prof. Yanjing Li University of Chicago Administrative Stuff! Lab1 due at 11:59pm today! Lab2 out " Pipeline ARM simulator "
More informationCMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 3: ISA ad Itroductio to Microarchitecture Prof. Yajig Li Uiversity of Chicago Lecture Outlie ISA uarch (hardware implemetatio of a ISA) Logic desig basics Sigle-cycle
More informationCMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago
CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device
More informationCMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.
More informationCSC 220: Computer Organization Unit 11 Basic Computer Organization and Design
College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:
More informationThis Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines
This Uit: Damic Schedulig CSE 560 Computer Sstems Architecture Damic Schedulig Slides origiall developed b Drew Hilto (IBM) ad Milo Marti (Uiversit of Peslvaia) App App App Sstem software Mem CPU I/O Code
More informationCMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,
More informationAppendix D. Controller Implementation
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);
More informationMulti-Threading. Hyper-, Multi-, and Simultaneous Thread Execution
Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors
CS252 Sprig 2017 Graduate Computer Architecture Lecture 6: Out-of-Order Processors Lisa Wu, Krste Asaovic http://ist.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 2 WU UCB CS252 SP17 Last Time i Lecture
More informationArquitectura de Computadores
Arquitectura de Computadores Capítulo 2. Procesadores segmetados Based o the origial material of the book: D.A. Patterso y J.L. Heessy Computer Orgaizatio ad Desig: The Hardware/Software Iterface 4 th
More informationEnd Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization
Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed
More informationElementary Educational Computer
Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified
More informationChapter 4 The Datapath
The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that
More informationCMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 4 The Processor Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler CPI ad Cycle time Determied
More informationCMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Multi-Core Prof. Yajig Li Uiversity of Chicago Course Evaluatio Very importat Please fill out! 2 Lab3 Brach Predictio Competitio 8 teams etered the competitio,
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationMultiprocessors. HPC Prof. Robert van Engelen
Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies
More informationLecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 8: Data Hazard and Resolution James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L08 S1, James C. Hoe, CU/ECE/CALC, 2018 Your goal today Housekeeping detect and resolve
More informationMorgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5
Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:
More informationAnnouncements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components
Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Review Istructio Set Architecture Istructio Set The repertoire of istructios of a computer Differet computers have differet istructio
More informationData diverse software fault tolerance techniques
Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the
More informationHash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative
More informationLecture 3. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram
Lecture 3 RTL Desig Methodology Trasitio from Pseudocode & Iterface to a Correspodig Block Diagram Structure of a Typical Digital Data Iputs Datapath (Executio Uit) Data Outputs System Cotrol Sigals Status
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies
More informationSwitching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1
Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationInstruction and Data Streams
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationBasic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.
5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator
More informationRecursion. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Review: Method Frames
Uit 4, Part 3 Recursio Computer Sciece S-111 Harvard Uiversity David G. Sulliva, Ph.D. Review: Method Frames Whe you make a method call, the Java rutime sets aside a block of memory kow as the frame of
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationMaster Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts
More informationDESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO
DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO Sagwo Seo, Trevor Mudge Advaced Computer Architecture Laboratory Uiversity of Michiga at A Arbor {swseo, tm}@umich.edu Yumig Zhu, Chaitali
More informationUniprocessors. HPC Prof. Robert van Engelen
Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures
More informationLecture 19 Introduction to Pipelining
CSE 30321 Lecture 19 Pipelining (Part 1) 1 Lecture 19 Introduction to Pipelining CSE 30321 Lecture 19 Pipelining (Part 1) Basic pipelining basic := single, in-order issue single issue one instruction at
More information. Written in factored form it is easy to see that the roots are 2, 2, i,
CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or
More informationComputer Graphics Hardware An Overview
Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)
More informationLecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein
068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig
More informationThreads and Concurrency in Java: Part 1
Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationThreads and Concurrency in Java: Part 1
Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationSPIRAL DSP Transform Compiler:
SPIRAL DSP Trasform Compiler: Applicatio Specific Hardware Sythesis Peter A. Milder (peter.milder@stoybroo.edu) Fraz Frachetti, James C. Hoe, ad Marus Pueschel Departmet of ECE Caregie Mello Uiversity
More informationCIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13
CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis
More informationDesign of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018
Desig of Digital Circuits Lecture 16: Out-of-Order Executio Prof. Our Mutlu ETH Zurich Sprig 2018 26 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures Multi-cycle ad Microprogrammed
More informationCMPT 125 Assignment 2 Solutions
CMPT 25 Assigmet 2 Solutios Questio (20 marks total) a) Let s cosider a iteger array of size 0. (0 marks, each part is 2 marks) it a[0]; I. How would you assig a poiter, called pa, to store the address