Review: Moore's Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy's Law in ManyCore world. Bell's Law: new class per decade


EECS 252 Graduate Computer Architecture
Lecture 2: ℵ0 Review of Instruction Sets, Pipelines, and Caches
January 26th, 2009
John Kubiatowicz
Electrical Engineering and Computer Sciences
University of California, Berkeley
http//
1/26/2009 CS252-S09, Lecture 02

Review: Moore's Law
  "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
  Number of transistors on a cost-effective integrated circuit doubles every 18 months

Review: Joy's Law in ManyCore world
  [Figure: performance (vs. VAX-11/780) by year. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October]
  VAX: 25%/year, 1978 to 1986
  RISC + x86: 52%/year, 1986 to 2002
  RISC + x86: ??%/year, 2002 to present

Bell's Law: new class per decade
  [Figure: log(people per computer) versus year]
  Enabled by technological opportunities
  Smaller, more numerous and more intimately connected
  Brings in a new kind of application
    Number crunching, data storage, productivity, interactive, streaming information to/from physical world
  Used in many ways not previously imagined

Today: Quick review of everything you should have learned ℵ0
  (A countably-infinite set of computer architecture concepts)

Metrics used to Compare Designs
  Cost
    Die cost and system cost
  Execution Time
    Average and worst-case
    Latency vs. Throughput
  Energy and Power
    Also peak power and peak switching current
  Reliability
    Resiliency to electrical noise, part failure
    Robustness to bad software, operator error
  Maintainability
    System administration costs
  Compatibility
    Software costs dominate

Cost of Processor
  Design cost (Non-recurring Engineering Costs, NRE)
    dominated by engineer-years (~$200K per engineer year)
    also mask costs (exceeding $1M per spin)
  Cost of die
    die area
    die yield (maturity of manufacturing process, redundancy features)
    cost/size of wafers
    $\text{die cost} \approx f(\text{die area}^4)$ with no redundancy
  Cost of packaging
    number of pins (signal + power/ground pins)
    power dissipation
  Cost of testing
    built-in test features?
    logical complexity of design
    choice of circuits (minimum clock rates, leakage currents, I/O drivers)
  Architect affects all of these

What is Performance?
  Latency (or response time or execution time)
    time to complete one task
  Bandwidth (or throughput)
    tasks completed per unit time

Definition: Performance
  Performance is in units of things per second
    bigger is better
  If we are primarily concerned with response time:
    $$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
  "X is n times faster than Y" means:
    $$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$

Performance: What to measure
  Usually rely on benchmarks vs. real workloads
  To increase predictability, collections of benchmark applications ("benchmark suites") are popular
  SPECCPU: popular desktop benchmark suite
    CPU only, split between integer and floating point programs
    SPECint2000 has 12 integer programs, SPECfp2000 has 14 floating-point programs
    SPECCPU2006 to be announced Spring 2006
    SPECSFS (NFS file server) and SPECWeb (WebServer) added as server benchmarks
  Transaction Processing Performance Council (TPC) measures server performance and cost-performance for databases
    TPC-C: complex query for Online Transaction Processing
    TPC-H: models ad hoc decision support
    TPC-W: a transactional web benchmark
    TPC-App: application server and web services benchmark

Summarizing Performance
  Which system is faster? It depends who's selling.
  [Tables: Rate (Task 1) and Rate (Task 2) for systems A and B, presented three ways: average throughput, throughput relative to B, and throughput relative to A. The numeric entries are missing.]

Summarizing Performance over Set of Benchmark Programs
  Arithmetic mean of execution times $t_i$ (in seconds):
    $$\frac{1}{n} \sum_i t_i$$
  Harmonic mean of execution rates $r_i$ (MIPS/MFLOPS):
    $$\frac{n}{\sum_i (1/r_i)}$$
  Both are equivalent to a workload where each program is run the same number of times
  Can add weighting factors to model other workload distributions

Normalized Execution Time and Geometric Mean
  Measure speedup relative to a reference machine:
    $$\text{ratio} = \frac{t_{Ref}}{t_A}$$
  Average the time ratios using the geometric mean:
    $$\sqrt[n]{\prod_i \text{ratio}_i}$$
  Insensitive to machine chosen as reference
  Insensitive to run time of individual benchmarks
  Used by SPEC89, SPEC92, SPEC95, ..., SPEC
  But beware that choice of reference machine can suggest what is a "normal" performance profile

Vector/Superscalar Speedup
  100 MHz Cray J90 vector machine versus 300MHz Alpha
  [LANL Computational Physics Codes, Wasserman, ICS 96]
  Vector machine peaks on a few codes????

Superscalar/Vector Speedup
  100 MHz Cray J90 vector machine versus 300MHz Alpha
  [LANL Computational Physics Codes, Wasserman, ICS 96]
  Scalar machine peaks on one code???
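The three means above, and the claim that the geometric mean of normalized ratios does not depend on the reference machine, can be checked with a short sketch. The execution times below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
from math import prod

def arithmetic_mean(times):
    """Mean of execution times t_i (seconds)."""
    return sum(times) / len(times)

def harmonic_mean(rates):
    """Mean of execution rates r_i (e.g. MIPS or MFLOPS)."""
    return len(rates) / sum(1.0 / r for r in rates)

def geometric_mean(ratios):
    """n-th root of the product of the time ratios."""
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical execution times (seconds) on two benchmark programs.
times = {"A": [10.0, 100.0], "B": [20.0, 50.0], "Ref": [40.0, 200.0]}

def normalized_score(machine, reference):
    """Geometric mean of t_Ref / t_machine over all programs."""
    ratios = [t_ref / t for t_ref, t in zip(times[reference], times[machine])]
    return geometric_mean(ratios)

# A's score relative to B's is the same whichever reference is chosen.
a_vs_b_using_ref = normalized_score("A", "Ref") / normalized_score("B", "Ref")
a_vs_b_using_b = normalized_score("A", "B") / normalized_score("B", "B")
print(a_vs_b_using_ref, a_vs_b_using_b)
```

With these made-up times the two machines tie under the geometric mean even though their arithmetic mean times differ, which is one way the choice of summary statistic can change the apparent winner.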

How to Mislead with Performance Reports
  Select pieces of workload that work well on your design, ignore others
  Use unrealistic data set sizes for application (too big or too small)
  Report throughput numbers for a latency benchmark
  Report latency numbers for a throughput benchmark
  Report performance on a kernel and claim it represents an entire application
  Use 16-bit fixed-point arithmetic (because it's fastest on your system) even though application requires 64-bit floating-point arithmetic
  Use a less efficient algorithm on the competing machine
  Report speedup for an inefficient algorithm (bubblesort)
  Compare hand-optimized assembly code with unoptimized C code
  Compare your design using next year's technology against competitor's year old design (1% performance improvement per week)
  Ignore the relative cost of the systems being compared
  Report averages and not individual results
  Report speedup over unspecified base system, not absolute times
  Report efficiency not absolute times
  Report MFLOPS not absolute times (use inefficient algorithm)
  [David Bailey, "Twelve ways to fool the masses when giving performance results for parallel supercomputers"]

Amdahl's Law
  $$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
  $$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
  Best you could ever hope to do:
  $$\text{Speedup}_{maximum} = \frac{1}{1 - \text{Fraction}_{enhanced}}$$

Amdahl's Law example
  New CPU 10X faster
  I/O bound server, so 60% time waiting for I/O
  $$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
  Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective it's just 1.6X faster

Computer Performance
  $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                  Inst Count   CPI    Clock Rate
  Program             X
  Compiler            X        (X)
  Inst. Set           X         X
  Organization                  X         X
  Technology                              X
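As a quick check of the Amdahl's Law example and the CPU time equation above, here is a minimal sketch. The function names are illustrative, and the instruction count and clock rate passed to `cpu_time` are made-up inputs, not figures from the lecture.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

def amdahl_limit(fraction_enhanced):
    """Best possible speedup as the enhancement becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_enhanced)

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz

# Lecture example: CPU is 10X faster, but 60% of the time is I/O wait,
# so only 40% of the execution time benefits.
overall = amdahl_speedup(0.4, 10.0)
print(f"overall speedup: {overall:.2f}")            # about 1.56
print(f"upper bound:     {amdahl_limit(0.4):.2f}")  # about 1.67

# Hypothetical program: 1e9 instructions, CPI 1.5, 1 GHz clock.
print(f"CPU time: {cpu_time(1e9, 1.5, 1e9):.2f} s")
```

Even an infinitely fast CPU would give at most about 1.67X here, because the I/O wait is untouched.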

Cycles Per Instruction (Throughput)
  "Average Cycles per Instruction"
    CPI = (CPU Time × Clock Rate) / Instruction Count = Cycles / Instruction Count
    $$\text{CPU time} = \text{Cycle Time} \times \sum_{j=1}^{n} \text{CPI}_j \times I_j$$
  "Instruction Frequency"
    $$\text{CPI} = \sum_{j=1}^{n} \text{CPI}_j \times F_j \quad \text{where} \quad F_j = \frac{I_j}{\text{Instruction Count}}$$

Example: Calculating CPI bottom up
  Run benchmark and collect workload characterization (simulate, machine counters, or sampling)
  Base Machine (Reg / Reg), typical mix of instruction types in program:

  Op        Freq   Cycles   CPI(i)   (% Time)
  ALU       50%    1        .5       (33%)
  Load      20%    2        .4       (27%)
  Store     10%    2        .2       (13%)
  Branch    20%    2        .4       (27%)
  Total                     1.5

  Design guideline: Make the common case fast
  MIPS 1% rule: only consider adding an instruction if it is shown to add 1% performance improvement on reasonable benchmarks.

Power and Energy
  Energy to complete operation (Joules)
    Corresponds approximately to battery life
    (Battery energy capacity actually depends on rate of discharge)
  Peak power dissipation (Watts = Joules/second)
    Affects packaging (power and ground pins, thermal design)
  di/dt, peak change in supply current (Amps/second)
    Affects power supply noise (power and ground pins, decoupling capacitors)

Peak Power versus Lower Energy
  [Figure: power versus time for two systems, with Peak A and Peak B marked]
  Integrate power curve to get energy
  System A has higher peak power, but lower total energy
  System B has lower peak power, but higher total energy
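The bottom-up CPI calculation above can be reproduced in a few lines. This is a sketch using the frequencies and cycle counts from the base-machine table; the name "ALU" for the first instruction class is an assumption, since the row label is missing from the transcription.

```python
# (frequency, cycles) per instruction class, from the base-machine table.
# The "ALU" label for the 50% class is assumed.
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

def cpi_breakdown(mix):
    """Return total CPI, per-class CPI contribution, and share of time."""
    contributions = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    total = sum(contributions.values())
    share = {op: value / total for op, value in contributions.items()}
    return total, contributions, share

total, contributions, share = cpi_breakdown(mix)
print(f"CPI = {total:.1f}")
for op in mix:
    print(f"{op:7s} CPI(i) = {contributions[op]:.1f}  ({share[op]:.0%} of time)")
```

The share-of-time column is what "make the common case fast" acts on: ALU operations are half of the instructions but only a third of the cycles.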

CS 252 Administrivia
  Sign up! Web site is (doesn't quite work!): http//
  Review: Chapter 1, Appendix A, B, C
    CS 152 home page, maybe "Computer Organization and Design (COD) 2/e"
    If you did take a class, be sure COD Chapters 2, 5, 6, 7 are familiar
    Copies in Bechtel Library on 2-hour reserve
  First two readings are up (look on Lecture page)
    Read the assignment carefully, since the requirements vary about what you need to turn in
    Submit results to website before class
      (will be a link up on handouts page)
    You can have 5 total late days on assignments
      10% per day afterwards
      Save late days!

CS 252 Administrivia
  Resources for course on web site:
    Check out the ISCA (International Symposium on Computer Architecture) 25th year retrospective on web site.
      Look for "Additional reading" below text-book description
    Pointers to previous CS152 exams and resources
    Lots of old CS252 material
    Interesting links. Check out the WWW Computer Architecture Home Page
  Size of class seems ok:
    I asked Michael David to put everyone on waitlist into class
    Check to make sure

ISA Implementation Review

A "Typical" RISC ISA
  32-bit fixed format instruction (3 formats)
  32 32-bit GPRs (R0 contains zero, DP take pair)
  3-address, reg-reg arithmetic instruction
  Single address mode for load/store: base + displacement
    no indirection
  Simple branch conditions
  Delayed branch
  see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Example: MIPS
  Instruction formats (fields listed from the most significant bits):
    Register-Register:   Op | Rs1 | Rs2 | Rd | Opx
    Register-Immediate:  Op | Rs1 | Rd | immediate
    Branch:              Op | Rs1 | Rs2/Opx | immediate
    Jump / Call:         Op | target

Datapath vs Control
  [Figure: Controller and Datapath blocks; control points go from controller to datapath, signals go from datapath to controller]
  Datapath: storage, FU, interconnect sufficient to perform the desired functions
    Inputs are control points
    Outputs are signals
  Controller: state machine to orchestrate operation on the data path
    Based on desired function and signals

5 Steps of MIPS Datapath
  Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
  [Figure: datapath with Next PC, adder (+4), Next SEQ PC, instruction memory, Reg File, sign extend, Zero? test, ALU, data memory, and MUXes, separated by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers]
  Register transfers per stage:
    IR <= mem[PC]; PC <= PC + 4
    A <= Reg[IR_rs]; B <= Reg[IR_rt]
    rslt <= A op_IRop B
    WB <= rslt
    Reg[IR_rd] <= WB
  Data stationary control
    local decode for each instruction phase / pipeline stage

Pipelining Review

Visualizing Pipelining
  Figure A.2, Page A-8
  [Figure: instructions in program order versus time (clock cycles 1 to 7), each instruction offset by one cycle]

Pipelining is not quite that easy!
  Limits to pipelining: hazards prevent next instruction from executing during its designated clock cycle
    Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
    Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock)
    Control hazards: caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port/Structural Hazards
  Figure A.4, Page A-14
  [Figure: Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order versus time (clock cycles 1 to 7)]

One Memory Port/Structural Hazards
  (Similar to Figure A.5, Page A-15)
  [Figure: Load, Instr 1, Instr 2, Stall, Instr 3 in program order versus time (clock cycles 1 to 7); the stall appears as a row of bubbles]
  How do you "bubble" the pipe?

Speed Up Equation for Pipelining
  $$\text{CPI}_{pipelined} = \text{Ideal CPI} + \text{Average stall cycles per Inst}$$
  $$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$
  For simple RISC pipeline, CPI = 1:
  $$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$

Example: Dual-port vs. Single-port
  Machine A: dual ported memory ("Harvard Architecture")
  Machine B: single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
  Ideal CPI = 1 for both
  Loads are 40% of instructions executed
    SpeedUp_A = Pipeline Depth/(1 + 0) × (clock_unpipe/clock_pipe)
              = Pipeline Depth
    SpeedUp_B = Pipeline Depth/(1 + 0.4 × 1) × (clock_unpipe/(clock_unpipe/1.05))
              = (Pipeline Depth/1.4) × 1.05
              = 0.75 × Pipeline Depth
    SpeedUp_A / SpeedUp_B = Pipeline Depth/(0.75 × Pipeline Depth) = 1.33
  Machine A is 1.33 times faster

Data Hazard on R1
  [Figure: pipeline stages IF, ID/RF, EX, MEM, WB versus time (clock cycles) for the sequence below]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Three Generic Data Hazards
  Read After Write (RAW): Instr J tries to read operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
  Caused by a "Dependence" (in compiler nomenclature). This hazard results from an actual need for communication.
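The dual-port versus single-port comparison above follows directly from the speedup equation. The sketch below re-computes it; the pipeline depth of 5 is an assumption for illustration (it cancels in the final ratio), and `clock_ratio` stands for the unpipelined-to-pipelined cycle time ratio expressed relative to machine A.

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio is Cycle Time (unpipelined) / Cycle Time (pipelined),
    here expressed relative to machine A as in the lecture example.
    """
    return ideal_cpi * depth / (ideal_cpi + stall_cpi) * clock_ratio

DEPTH = 5  # assumed pipeline depth; any value gives the same A/B ratio

# Machine A: dual-ported memory, no structural stalls.
speedup_a = pipeline_speedup(DEPTH, stall_cpi=0.0)

# Machine B: single port, so each load (40% of instructions) stalls one
# cycle, but the clock is 1.05 times faster.
speedup_b = pipeline_speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

print(f"A: {speedup_a:.2f}  B: {speedup_b:.2f}  A/B: {speedup_a / speedup_b:.2f}")
```

The 5% clock advantage of machine B does not make up for stalling on 40% of its instructions.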

Three Generic Data Hazards
  Write After Read (WAR): Instr J writes operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
  Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
  Can't happen in MIPS 5 stage pipeline because:
    All instructions take 5 stages, and
    Reads are always in stage 2, and
    Writes are always in stage 5

Three Generic Data Hazards
  Write After Write (WAW): Instr J writes operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
  Called an "output dependence" by compiler writers. This also results from the reuse of name "r1".
  Can't happen in MIPS 5 stage pipeline because:
    All instructions take 5 stages, and
    Writes are always in stage 5
  Will see WAR and WAW in more complicated pipes

Forwarding to Avoid Data Hazard
  [Figure: pipeline diagram versus time (clock cycles) for the sequence below, with results forwarded to later instructions]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

HW Change for Forwarding
  [Figure: NextPC, Registers and Immediate feed ID/EX; muxes in front of the ALU select among the register values and results fed back from EX/MEM and MEM/WR; Data Memory sits between EX/MEM and MEM/WR]
  What circuit detects and resolves this hazard?

Forwarding to Avoid LW-SW Data Hazard
  [Figure: pipeline diagram versus time (clock cycles) for the sequence below]
    add r1,r2,r3
    lw  r4, 0(r1)
    sw  r4,12(r1)
    or  r8,r6,r9
    xor r10,r9,r11

Data Hazard Even with Forwarding
  [Figure: pipeline diagram versus time (clock cycles) for the sequence below]
    lw  r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Data Hazard Even with Forwarding
  [Figure: the same sequence with a bubble inserted after the lw, delaying sub, and, and or by one cycle]

Software Scheduling to Avoid Load Hazards
  Try producing fast code for
    a = b + c;
    d = e - f;
  assuming a, b, c, d, e, and f in memory.

  Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd

  Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
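To see why the reordered code above is faster, the sketch below counts load-use stalls in each schedule. It assumes a 5-stage pipeline with full forwarding, where the only remaining data stall is one bubble when an instruction reads a register loaded by the instruction immediately before it. The tuple encoding of instructions is made up for this illustration.

```python
def load_use_stalls(program):
    """Count one-cycle bubbles: an instruction reads the destination
    register of the load immediately before it."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "LW" and prev_dest in curr_sources:
            stalls += 1
    return stalls

# Each instruction is (opcode, destination register, source registers).
slow = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("ADD", "Ra", ["Rb", "Rc"]),
    ("SW", None, ["Ra"]), ("LW", "Re", []), ("LW", "Rf", []),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]
fast = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []), ("SW", None, ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]

print("slow code stalls:", load_use_stalls(slow))
print("fast code stalls:", load_use_stalls(fast))
```

Both schedules execute the same eight instructions; moving an independent load between each load and its first use removes the bubbles.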

Control Hazard on Branches: Three Stage Stall
    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11
  What do you do with the 3 instructions in between?
  How do you do it?
  Where is the "commit"?

Branch Stall Impact
  If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!
  Two part solution:
    Determine branch taken or not sooner, AND
    Compute taken branch address earlier
  MIPS branch tests if register = 0 or ≠ 0
  MIPS Solution:
    Move Zero test to ID/RF stage
    Adder to calculate new PC in ID/RF stage
    1 clock cycle penalty for branch versus 3

Pipelined MIPS Datapath
  Figure A.24, page A-38
  Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
  [Figure: pipelined datapath with the branch adder and Zero? test moved into the decode stage; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
  Interplay of instruction set design and cycle time.

Four Branch Hazard Alternatives
  #1: Stall until branch direction is clear
  #2: Predict Branch Not Taken
    Execute successor instructions in sequence
    "Squash" instructions in pipeline if branch actually taken
    Advantage of late pipeline state update
    47% MIPS branches not taken on average
    PC+4 already calculated, so use it to get next instruction
  #3: Predict Branch Taken
    53% MIPS branches taken on average
    But haven't calculated branch target address in MIPS
      MIPS still incurs 1 cycle branch penalty
      Other machines: branch target known before outcome

Four Branch Hazard Alternatives
  #4: Delayed Branch
    Define branch to take place AFTER a following instruction
      branch instruction
      sequential successor 1
      sequential successor 2
      ...
      sequential successor n       (branch delay of length n)
      branch target if taken
    1 slot delay allows proper decision and branch target address in 5 stage pipeline
    MIPS uses this

Scheduling Branch Delay Slots
  A. From before branch
      add $1,$2,$3
      if $2=0 then
        [delay slot]
    becomes
      if $2=0 then
        add $1,$2,$3
  B. From branch target
      sub $4,$5,$6        (at the branch target)
      add $1,$2,$3
      if $1=0 then
        [delay slot]
    becomes
      add $1,$2,$3
      if $1=0 then
        sub $4,$5,$6
  C. From fall through
      add $1,$2,$3
      if $1=0 then
        [delay slot]
      sub $4,$5,$6
    becomes
      add $1,$2,$3
      if $1=0 then
        sub $4,$5,$6
  A is the best choice: fills delay slot and reduces instruction count (IC)
  In B, the sub instruction may need to be copied, increasing IC
  In B and C, must be okay to execute sub when branch fails

Delayed Branch
  Compiler effectiveness for single branch delay slot:
    Fills about 60% of branch delay slots
    About 80% of instructions executed in branch delay slots useful in computation
    About 50% (60% x 80%) of slots usefully filled
  Delayed Branch downside: as processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed
    Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
    Growth in available transistors has made dynamic approaches relatively cheaper

Evaluating Branch Alternatives
  $$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$
  Assume 4% unconditional branch, 6% conditional branch untaken, 10% conditional branch taken
  [Table: for each scheduling scheme (Stall pipeline, Predict taken, Predict not taken, Delayed branch) the columns are branch penalty, CPI, speedup v. unpipelined, and speedup v. stall. The numeric entries are missing.]
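The comparison of branch schemes above can be worked through with the pipeline speedup formula and the stated branch mix. The table's numeric entries are missing, so the per-scheme penalties and the pipeline depth of 5 below are assumptions (commonly used textbook values), not figures recovered from the slide.

```python
PIPELINE_DEPTH = 5  # assumed; the slide's value is not preserved

# Branch mix stated in the lecture (fractions of all instructions).
freq = {"unconditional": 0.04, "cond_untaken": 0.06, "cond_taken": 0.10}

# Assumed penalty in cycles for each kind of branch under each scheme.
schemes = {
    "Stall pipeline":    {"unconditional": 3, "cond_untaken": 3, "cond_taken": 3},
    "Predict taken":     {"unconditional": 1, "cond_untaken": 1, "cond_taken": 1},
    "Predict not taken": {"unconditional": 1, "cond_untaken": 0, "cond_taken": 1},
    "Delayed branch":    {"unconditional": 0.5, "cond_untaken": 0.5, "cond_taken": 0.5},
}

def branch_cpi(penalties):
    """CPI = 1 + sum over branch kinds of (frequency x penalty)."""
    return 1.0 + sum(freq[kind] * penalties[kind] for kind in freq)

def speedup_vs_unpipelined(cpi):
    return PIPELINE_DEPTH / cpi

cpi = {name: branch_cpi(p) for name, p in schemes.items()}
for name, value in cpi.items():
    print(f"{name:18s} CPI = {value:.2f}  "
          f"speedup v. unpipelined = {speedup_vs_unpipelined(value):.2f}  "
          f"speedup v. stall = {cpi['Stall pipeline'] / value:.2f}")
```

Under these assumptions, predict-not-taken beats predict-taken only because untaken conditional branches then cost nothing.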

Problems with Pipelining
  Exception: an unusual event happens to an instruction during its execution
    Examples: divide by zero, undefined opcode
  Interrupt: hardware signal to switch the processor to a new instruction stream
    Example: a sound card interrupts when it needs more audio output samples (an audio "click" happens if it is left waiting)
  Problem: it must appear that the exception or interrupt occurs between 2 instructions (I_i and I_{i+1})
    The effect of all instructions up to and including I_i is totally complete
    No effect of any instruction after I_i can take place
  The interrupt (exception) handler either aborts program or restarts at instruction I_{i+1}

Precise Exceptions in Static Pipelines
  Key observation: architected state only changes in memory and register write stages.

Memory Hierarchy Review

Since 1980, CPU has outpaced DRAM ...
  [Figure: performance (1/latency) versus year, 1980 to 2000]
    CPU: 60% per year (2X in 1.5 years)
    DRAM: 9% per year (2X in 10 years)
    Gap grew 50% per year
  Q. How do architects address this gap?
  A. Put small, fast "cache" memories between CPU and DRAM. Create a "memory hierarchy".

1977: DRAM faster than microprocessors
  Apple ][ (1977)
    CPU: 1000 ns
    DRAM: 400 ns
  [Photo: Steve Jobs, Steve Wozniak]

Memory Hierarchy of a Modern Computer
  Take advantage of the principle of locality to:
    Present as much memory as in the cheapest technology
    Provide access at speed offered by the fastest technology

  Level                                            Speed (ns)                    Size (bytes)
  Registers (processor datapath and control)       1s                            100s
  On-Chip Cache, Second Level Cache (SRAM)         10s-100s                      Ks-Ms
  Main Memory (DRAM)                               100s                          Ms
  Secondary Storage (Disk)                         10,000,000s (10s ms)          Gs
  Tertiary Storage (Tape)                          10,000,000,000s (10s sec)     Ts

The Principle of Locality
  The Principle of Locality:
    Programs access a relatively small portion of the address space at any instant of time.
  Two Different Types of Locality:
    Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
    Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straightline code, array access)
  Last 15 years, HW relied on locality for speed

Programs with locality cache well ...
  [Figure: memory address (one dot per access) versus time, with regions labeled Bad locality behavior, Spatial Locality, and Temporal Locality]
  Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3) (1971)

Memory Hierarchy: Apple iMac G5
  iMac G5, 1.6 GHz

                      Reg      L1 Inst   L1 Data   L2        DRAM     Disk
  Size                1K       64K       32K       512K      256M     80G
  Latency (cycles)    1        3         3         11        88       10^7
  Latency (time)      0.6 ns   1.9 ns    1.9 ns    6.9 ns    55 ns    12 ms

  Registers: managed by compiler
  L1 and L2 caches: managed by hardware
  DRAM and disk: managed by OS, hardware, application
  Goal: illusion of large, fast, cheap memory
    Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access

iMac's PowerPC 970: All caches on-chip
  [Die photo labels: Registers (1K), L1 (64K Instruction), L1 (32K Data), 512K L2]

Memory Hierarchy: Terminology
  Hit: data appears in some block in the upper level (example: Block X)
    Hit Rate: the fraction of memory accesses found in the upper level
    Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
  Miss: data needs to be retrieved from a block in the lower level (Block Y)
    Miss Rate = 1 - (Hit Rate)
    Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
  Hit Time << Miss Penalty (500 instructions on 21264!)
  [Figure: processor exchanging Blk X with the upper level memory, which exchanges Blk Y with the lower level memory]

4 Questions for Memory Hierarchy
  Q1: Where can a block be placed in the upper level? (Block placement)
  Q2: How is a block found if it is in the upper level? (Block identification)
  Q3: Which block should be replaced on a miss? (Block replacement)
  Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?
  Block 12 placed in 8 block cache:
    Fully associative, direct mapped, 2-way set associative
    S.A. Mapping = Block Number Modulo Number of Sets
  [Figure: cache and memory block frames for the three organizations]
    Fully associative: block 12 can go anywhere
    Direct mapped: (12 mod 8) = 4
    2-way set associative: (12 mod 4) = 0

A Summary on Sources of Cache Misses
  Compulsory (cold start or process migration, first reference): first access to a block
    "Cold" fact of life: not a whole lot you can do about it
    Note: if you are going to run "billions" of instructions, compulsory misses are insignificant
  Capacity:
    Cache cannot contain all blocks accessed by the program
    Solution: increase cache size
  Conflict (collision):
    Multiple memory locations mapped to the same cache location
    Solution 1: increase cache size
    Solution 2: increase associativity
  Coherence (Invalidation): other process (e.g., I/O) updates memory

Q2: How is a block found if it is in the upper level?
  Address fields: Block Address (Tag | Index) | Block offset
    Index: set select
    Block offset: data select
  Index used to look up candidates in cache
    Index identifies the set
  Tag used to identify actual copy
    If no candidates match, then declare cache miss
  Block is minimum quantum of caching
    Data select field used to select data within block
    Many caching applications don't have data select field

Direct Mapped Cache
  Direct Mapped 2^N byte cache:
    The uppermost (32 - N) bits are always the Cache Tag
    The lowest M bits are the Byte Select (Block Size = 2^M)
  Example: 1 KB Direct Mapped Cache with 32 B Blocks
    Index chooses potential block
    Tag checked to verify block
    Byte select chooses byte within block
  Address layout: Cache Tag in bits 31..10 (Ex: 0x50), Cache Index in bits 9..5 (Ex: 0x01), Byte Select in bits 4..0 (Ex: 0x00)
  [Figure: cache array with a Valid Bit, Cache Tag (0x50 stored at index 1) and Cache Data per line; line 0 holds Byte 0 to Byte 31, line 1 holds Byte 32 to Byte 63, and so on]
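The placement rule and the direct-mapped address breakdown above can be sketched as follows. The function names are illustrative, and the sketch assumes cache and block sizes are powers of two.

```python
def split_address(addr, cache_bytes=1024, block_bytes=32):
    """Split an address into (tag, index, byte_select) for a
    direct-mapped cache. Sizes are assumed to be powers of two."""
    offset_bits = block_bytes.bit_length() - 1                  # M
    index_bits = (cache_bytes // block_bytes).bit_length() - 1  # N - M
    byte_select = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, byte_select

def set_for_block(block_number, num_blocks, ways):
    """Set index = block number modulo number of sets."""
    num_sets = num_blocks // ways
    return block_number % num_sets

# Lecture example: 1 KB cache, 32 B blocks, tag 0x50, index 0x01, byte 0x00.
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print(hex(addr), [hex(field) for field in split_address(addr)])

# Block 12 in an 8-block cache.
print("direct mapped:", set_for_block(12, 8, ways=1))   # 12 mod 8
print("2-way:        ", set_for_block(12, 8, ways=2))   # 12 mod 4
print("fully assoc.: ", set_for_block(12, 8, ways=8))   # single set
```

A fully associative cache is the degenerate case with one set, so every block maps to set 0 and the tag alone identifies it.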

Set Associative Cache
  N-way set associative: N entries per Cache Index
    N direct mapped caches operate in parallel
  Example: two-way set associative cache
    Cache Index selects a "set" from the cache
    Two tags in the set are compared to input in parallel
    Data is selected based on the tag result
  [Figure: address split into Cache Tag (bits 31..9), Cache Index (bits 8..5) and Byte Select (bits 4..0); two banks of Valid, Cache Tag and Cache Data; two comparators drive Sel0 and Sel1 of a mux that picks the Cache Block; the OR of the compare results gives Hit]

Fully Associative Cache
  Fully Associative: every block can hold any line
    Address does not include a cache index
    Compare Cache Tags of all Cache Entries in Parallel
  Example: Block Size = 32 B blocks
    We need N 27-bit comparators
    Still have byte select to choose from within block
  [Figure: Cache Tag (27 bits long, bits 31..5) compared in parallel against every entry's tag; Byte Select in bits 4..0 (Ex: 0x01); each entry has a Valid Bit and Cache Data (Byte 0 to Byte 31, Byte 32 to Byte 63, ...)]

Q3: Which block should be replaced on a miss?
  Easy for Direct Mapped
  Set Associative or Fully Associative:
    LRU (Least Recently Used): appealing, but hard to implement for high associativity
    Random: easy, but how well does it work?

  Miss rates:
  Assoc:    2-way            4-way            8-way
  Size      LRU     Ran      LRU     Ran      LRU     Ran
  16K       5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
  64K       1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
  256K      1.15%   1.17%    1.13%   1.13%    1.12%   1.12%

Q4: What happens on a write?

  Policy                                        Write-Through                       Write-Back
  Description                                   Data written to cache block is      Write data only to the cache;
                                                also written to lower-level         update lower level when a block
                                                memory                              falls out of the cache
  Debug                                         Easy                                Hard
  Do read misses produce writes?                No                                  Yes
  Do repeated writes make it to lower level?    Yes                                 No

  Additional option: let writes to an un-cached address allocate a new cache line ("write-allocate").
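The interaction between associativity and LRU replacement described above can be illustrated with a small simulator. This is a sketch only: the `LRUCache` class and the access trace are invented for this example, and the model tracks block numbers rather than bytes.

```python
from collections import OrderedDict

class LRUCache:
    """Set-associative cache model with LRU replacement."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered map per set; the oldest entry is least recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_number):
        """Access one block; return True on a hit."""
        lines = self.sets[block_number % self.num_sets]
        tag = block_number // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False

trace = [0, 8, 0, 8]  # blocks 0 and 8 collide in an 8-block cache

direct = LRUCache(num_sets=8, ways=1)
two_way = LRUCache(num_sets=4, ways=2)
for block in trace:
    direct.access(block)
    two_way.access(block)

print("direct mapped misses:", direct.misses)
print("2-way misses:        ", two_way.misses)
```

In the direct-mapped case the two blocks keep evicting each other (conflict misses); with two ways per set both fit and only the first reference to each block misses.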

Write Buffers for Write-Through Caches
  [Figure: Processor, Cache, Write Buffer, Lower Level Memory; the write buffer holds data awaiting write-through to lower level memory]
  Q. Why a write buffer?
  A. So CPU doesn't stall
  Q. Why a buffer, why not just one register?
  A. Bursts of writes are common.
  Q. Are Read After Write (RAW) hazards an issue for write buffer?
  A. Yes! Drain buffer before next read, or check write buffers for match on reads

5 Basic Cache Optimizations
  Reducing Miss Rate
    1. Larger Block size (compulsory misses)
    2. Larger Cache size (capacity misses)
    3. Higher Associativity (conflict misses)
  Reducing Miss Penalty
    4. Multilevel Caches
  Reducing hit time
    5. Giving Reads Priority over Writes
       E.g., read completes before earlier writes in write buffer

Virtual Memory

What is virtual memory?
  [Figure: a virtual address (virtual page no. plus 10-bit offset) indexes the page table via the Page Table Base register; each entry holds a valid bit, access rights, and the physical address (PA); the table is located in physical memory; the result is a physical address (physical page no. plus 10-bit offset); the virtual address space maps onto the physical address space]
  Virtual memory => treat memory as a cache for the disk
  Terminology: blocks in this cache are called "Pages"
    Typical size of a page: 1K to 8K
  Page table maps virtual page numbers to physical frames
    "PTE" = Page Table Entry

Three Advantages of Virtual Memory
  Translation:
    Program can be given consistent view of memory, even though physical memory is scrambled
    Makes multithreading reasonable (now used a lot!)
    Only the most important part of program ("Working Set") must be in physical memory.
    Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later.
  Protection:
    Different threads (or processes) protected from each other.
    Different pages can be given special behavior
      (Read Only, Invisible to user programs, etc).
    Kernel data protected from User programs
    Very important for protection from malicious programs
  Sharing:
    Can map same physical page to multiple users ("Shared memory")

Large Address Space Support
  [Figure: virtual address = 10-bit virtual P1 index | 10-bit virtual P2 index | 12-bit offset; PageTablePtr points to the first-level table; entries are 4 bytes; physical address = physical page # | offset; pages are 4KB]
  Single-Level Page Table
    Large
    4KB pages for a 32-bit address: 1M entries
    Each process needs own page table!
  Multi-Level Page Table
    Can allow sparseness of page table
    Portions of table can be swapped to disk

VM and Disk: Page replacement policy
  [Figure: set of all pages in memory arranged as a circle, with a head pointer and a tail pointer; page table entries carry dirty and used bits; freed pages go on the free list]
  Dirty bit: page written
  Used bit: set to 1 on any reference
  Tail pointer: clear the used bit in the page table
  Head pointer: place pages on free list if used bit is still clear. Schedule pages with dirty bit set to be written to disk.
  Architect's role: support setting dirty and used bits

Translation Look-Aside Buffers
  Translation Look-Aside Buffers (TLB)
    Cache on translations
    Fully Associative, Set Associative, or Direct Mapped
  TLBs are:
    Small: typically not more than ... entries
    Fully Associative
  [Figure: Translation with a TLB. The CPU sends the VA to the TLB; on a hit the PA goes to the cache, and on a miss the translation is performed first; a cache hit returns data to the CPU, a cache miss goes to main memory]
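The two-level translation described above (10-bit P1 index, 10-bit P2 index, 12-bit offset) can be sketched with nested dictionaries standing in for the page tables. Using dictionaries, the sample mapping, and `None` as the page-fault marker are assumptions made for illustration; a real table holds valid bits and access rights in fixed-size entries.

```python
PAGE_FAULT = None  # illustrative marker for "no valid translation"

def split_virtual_address(va):
    """Split a 32-bit virtual address into (P1 index, P2 index, offset)."""
    p1 = (va >> 22) & 0x3FF   # top 10 bits
    p2 = (va >> 12) & 0x3FF   # next 10 bits
    offset = va & 0xFFF       # low 12 bits (4KB page)
    return p1, p2, offset

def translate(va, root_table):
    """Walk a two-level page table held as nested dicts."""
    p1, p2, offset = split_virtual_address(va)
    second_level = root_table.get(p1)
    if second_level is None:
        return PAGE_FAULT     # whole 4MB region unmapped: no table needed
    frame = second_level.get(p2)
    if frame is None:
        return PAGE_FAULT
    return (frame << 12) | offset

# Hypothetical mapping: virtual page (P1=1, P2=2) -> physical page 0x1234.
root = {1: {2: 0x1234}}
va = (1 << 22) | (2 << 12) | 0xABC

print(hex(va), "->", hex(translate(va, root)))
print(hex(0), "->", translate(0, root))
```

Only one second-level table exists here, which is the sparseness benefit: unmapped regions cost a single missing first-level entry instead of 1M single-level entries.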

What Actually Happens on a TLB Miss?
  Hardware traversed page tables:
    On TLB miss, hardware in MMU looks at current page table to fill TLB (may walk multiple levels)
      If PTE valid, hardware fills TLB and processor never knows
      If PTE marked as invalid, causes Page Fault, after which kernel decides what to do
  Software traversed page tables (like MIPS):
    On TLB miss, processor receives TLB fault
    Kernel traverses page table to find PTE
      If PTE valid, fills TLB and returns from fault
      If PTE marked as invalid, internally calls Page Fault handler
  Most chip sets provide hardware traversal
    Modern operating systems tend to have more TLB faults since they use translation for many things
    Examples:
      shared segments
      user-level portions of an operating system

Example: R3000 pipeline
  MIPS R3000 Pipeline:
    Inst Fetch      | Dcd/Reg | ALU/E.A               | Memory  | Write Reg
    TLB, I-Cache    | RF      | Operation, E.A. TLB   | D-Cache | WB
  TLB: 64 entry, on-chip, fully associative, software TLB fault handler
  Virtual Address Space: ASID | V. Page Number | Offset
    0xx  User segment (caching based on PT/TLB entry)
    100  Kernel physical space, cached
    101  Kernel physical space, uncached
    11x  Kernel virtual space
  Allows context switching among 64 user processes without TLB flush

Reducing translation time further
  As described, TLB lookup is in serial with cache lookup:
  [Figure: virtual address (virtual page no. plus 10-bit offset) goes through the TLB lookup, which yields a valid bit, access rights and the PA, giving the physical address (physical page no. plus 10-bit offset)]
  Machines with TLBs go one step further: they overlap TLB lookup with cache access.
    Works because offset available early

Overlapping TLB & Cache Access
  Here is how this might work with a 4K cache:
  [Figure: the 32-bit address is split into a 20-bit page #, a 10-bit index and a 2-bit displacement (00); the page # goes to the TLB associative lookup while the index selects one of 1K four-byte entries in the 4K cache; the frame number (FN) from the TLB is compared with the FN stored in the cache to produce Hit/Miss and the data]
  What if cache size is increased to 8KB?
    Overlap not complete
    Need to do something else. See CS152/252
  Another option: Virtual Caches
    Tags in cache are virtual addresses
    Translation only happens on cache misses
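The question above about growing the cache to 8KB comes down to whether every cache index bit lies inside the page offset, which translation leaves unchanged. A minimal sketch of that check follows; the function name is illustrative and the 4KB page size matches the 12-bit offset in the figure.

```python
def can_overlap_tlb_lookup(cache_bytes, ways, page_bytes):
    """True when the cache index and block offset fit in the page offset,
    so the cache can be indexed before translation finishes."""
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes

PAGE = 4 * 1024  # 4KB pages: 12 untranslated offset bits

print("4KB direct mapped:", can_overlap_tlb_lookup(4 * 1024, 1, PAGE))
print("8KB direct mapped:", can_overlap_tlb_lookup(8 * 1024, 1, PAGE))
print("8KB 2-way:        ", can_overlap_tlb_lookup(8 * 1024, 2, PAGE))
print("8KB, 8KB pages:   ", can_overlap_tlb_lookup(8 * 1024, 1, 8 * 1024))
```

Raising associativity or the page size each restore the overlap because they either shrink the index or widen the untranslated offset.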

23 Problems With Overlapped TLB Access
Overlapped access requires that the address bits used to index into the cache do not change as a result of translation
This usually limits things to small caches, large page sizes, or high n-way set associative caches if you want a large cache
Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K
[Figure: the cache index is now 11 bits plus the 2-bit displacement, so it extends past disp into the virt page #. The top index bit is changed by VA translation, but is needed for cache lookup. Also shown: a 1K 2 way set assoc cache]
Solutions: go to 8K byte page sizes; go to 2 way set associative cache; or SW guarantee VA[13]=PA[13]

Summary: Control and Pipelining
Next time: Read Appendix A
Control VIA State Machines and Microprogramming
Just overlap tasks; easy if tasks are independent
Speed Up ≤ Pipeline Depth; if ideal CPI is 1, then:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
Hazards limit performance on computers:
Structural: need more HW resources
Data (RAW, WAR, WAW): need forwarding, compiler scheduling
Control: delayed branch, prediction
Exceptions, Interrupts add complexity
Next time: Read Appendix C, record bugs online!

Summary #1/3: The Cache Design Space
Several interacting dimensions: cache size; block size; associativity; replacement policy; write-through vs write-back; write allocation
The optimal choice is a compromise
depends on access characteristics
» workload
» use (I-cache, D-cache, TLB)
depends on technology / cost
Simplicity often wins
[Figure: design-space sketch with axes Cache Size, Associativity, Block Size; Good/Bad plotted against Less/More for Factor A and Factor B]

Summary #2/3: Caches
The Principle of Locality:
A program accesses a relatively small portion of the address space at any instant of time.
» Temporal Locality: Locality in Time
» Spatial Locality: Locality in Space
Three Major Categories of Cache Misses:
Compulsory Misses: sad facts of life. Example: cold start misses.
Capacity Misses: increase cache size
Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
Write Policy: Write Through vs. Write Back
Today CPU time is a function of (ops, cache misses) vs. just f(ops): affects Compilers, Data structures, and Algorithms
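The constraint on overlapped TLB access can be checked with a little arithmetic: the bits that select a byte within one cache way (index plus block offset) must all fall inside the untranslated page offset. The sketch below assumes power-of-two sizes; the function names are made up for the example, and the four cases mirror the 4K/8K discussion and the listed solutions.

```python
def offset_bits(size_bytes):
    """Bits needed to address one byte within size_bytes (log2 of a power of two)."""
    return size_bytes.bit_length() - 1


def can_overlap(cache_bytes, ways, page_bytes):
    """True when index + block-offset bits all lie within the page offset."""
    way_bytes = cache_bytes // ways
    return offset_bits(way_bytes) <= offset_bits(page_bytes)


cases = [
    ("4K direct-mapped cache, 4K pages", 4096, 1, 4096),
    ("8K direct-mapped cache, 4K pages", 8192, 1, 4096),
    ("8K 2-way set assoc cache, 4K pages", 8192, 2, 4096),
    ("8K direct-mapped cache, 8K pages", 8192, 1, 8192),
]
for label, cache_bytes, ways, page_bytes in cases:
    verdict = "overlap OK" if can_overlap(cache_bytes, ways, page_bytes) else "index uses a translated bit"
    print(f"{label}: {verdict}")
```

The third software option on the slide (guaranteeing that the extra bit is identical in the virtual and physical address) is a policy choice in the OS rather than a size relationship, so it is not modeled here.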

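As a worked instance of the speedup formula in the pipelining summary, the following sketch evaluates it for a 5-stage pipeline. The depth and stall values are hypothetical numbers chosen for the example, and the cycle times default to equal, which is an idealization.

```python
def pipeline_speedup(depth, stall_cpi, cycle_unpipelined=1.0, cycle_pipelined=1.0):
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1."""
    return depth / (1.0 + stall_cpi) * (cycle_unpipelined / cycle_pipelined)


# Hypothetical 5-stage pipeline with equal cycle times.
print(pipeline_speedup(5, 0.0))   # no stalls: speedup equals the pipeline depth
print(pipeline_speedup(5, 0.25))  # 0.25 stall cycles per instruction
```

With no stalls the result equals the pipeline depth, which is the upper bound stated on the slide; every stall cycle per instruction pulls the speedup below that bound.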
24 Summary #3/3: TLB, Virtual Memory
Page tables map virtual address to physical address
TLBs are important for fast translation
TLB misses are significant in processor performance
funny times, as most systems can't access all of 2nd level cache without TLB misses!
Caches, TLBs, Virtual Memory all understood by examining how they deal with 4 questions:
1) Where can block be placed?
2) How is block found?
3) What block is replaced on miss?
4) How are writes handled?
Today VM allows many processes to share single memory without having to swap all processes to disk; today VM protection is more important than memory hierarchy benefits, but computers insecure
Prepare for debate + quiz on Wednesday
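The remark that most systems cannot access all of the 2nd level cache without TLB misses comes down to "TLB reach": the amount of memory the TLB can map at once. The sketch below uses a 64-entry TLB (as on the R3000 slide) with 4 KB pages and a 1 MB second-level cache; the page size and cache size are assumptions picked for illustration, not figures from the lecture.

```python
def tlb_reach(entries, page_bytes):
    """Bytes of memory that can be mapped by the TLB at one time."""
    return entries * page_bytes


def fraction_covered(entries, page_bytes, cache_bytes):
    """Fraction of a cache that can be touched without taking a TLB miss."""
    return min(1.0, tlb_reach(entries, page_bytes) / cache_bytes)


KB = 1024
MB = 1024 * KB

reach = tlb_reach(64, 4 * KB)
print(f"TLB reach: {reach // KB} KB")
print(f"Share of a 1 MB L2 covered: {fraction_covered(64, 4 * KB, 1 * MB):.0%}")
```

Under these assumed sizes the TLB maps only a quarter of the cache, so a program sweeping through the whole L2 takes TLB misses even when every access hits in the cache.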

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,

More information

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines 1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep

More information

COSC 6385 Computer Architecture. - Pipelining

COSC 6385 Computer Architecture. - Pipelining COSC 6385 Compute Achitectue - Pipelining Sping 2012 Some of the slides ae based on a lectue by David Culle, Pipelining Pipelining is an implementation technique wheeby multiple instuctions ae ovelapped

More information

Review from last lecture

Review from last lecture CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect

More information

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20 Administivia CMSC 411 Compute Systems Achitectue Lectue 5 Basic Pipelining (cont.) Alan Sussman als@cs.umd.edu as@csu dedu Homewok poblems fo Unit 1 due today Homewok poblems fo Unit 3 posted soon CMSC

More information

CSE4201. Computer Architecture

CSE4201. Computer Architecture CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading

More information

Introduction To Pipelining. Chapter Pipelining1 1

Introduction To Pipelining. Chapter Pipelining1 1 Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?

More information

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.

More information

Computer Science 141 Computing Hardware

Computer Science 141 Computing Hardware Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called

More information

The Processor: Improving Performance Data Hazards

The Processor: Improving Performance Data Hazards The Pocesso: Impoving Pefomance Data Hazads Monday 12 Octobe 15 Many slides adapted fom: and Design, Patteson & Hennessy 5th Edition, 2014, MK and fom Pof. May Jane Iwin, PSU Summay Pevious Class Pipeline

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

CS 2461: Computer Architecture 1 Program performance and High Performance Processors

CS 2461: Computer Architecture 1 Program performance and High Performance Processors Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks

More information

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture Compute Achitectue Pipelining and nstuction Level Paallelism An ntoduction Adapted fom COD2e by Hennessy & Patteson Slide 1 Outline of This Lectue ntoduction to the Concept of Pipelined Pocesso Pipelined

More information

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1 CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious

More information

Memory Hierarchy Review

Memory Hierarchy Review EECS 252 Graduate Computer Architecture Lecture 3 0 (continued) Review of Caches and Virtual January 27 th, 20 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

More information

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Lecture #22 Pipelining II, Cache I

Lecture #22 Pipelining II, Cache I inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html

More information

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards) Chapte 4 (Pat III) The Pocesso: Datapath and Contol (Pipeline Hazads) 陳瑞奇 (J.C. Chen) 亞洲大學資訊工程學系 Adapted fom class notes by Pof. M.J. Iwin, PSU and Pof. D. Patteson, UCB 1 吃感冒藥副作用怎麼辦? http://big5.sznews.com/health/images/attachement/jpg/site3/20120319/001558d90b3310d0c1683e.jpg

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory 1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly 332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu CENG 3420 Compute Oganization and Design Lectue 07: MIPS Pocesso - II Bei Yu CEG3420 L07.1 Sping 2016 Review: Instuction Citical Paths q Calculate cycle time assuming negligible delays (fo muxes, contol

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2010 Lecture 2: Review of Metrics and Pipelining 563 L02.1 Fall 2010 Review from Last Time Computer Architecture >> instruction sets Computer Architecture skill

More information

You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011

You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011 CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz

More information

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4) PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design

More information

CENG 3420 Lecture 07: Pipeline

CENG 3420 Lecture 07: Pipeline CENG 3420 Lectue 07: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L07.1 Sping 2017 Outline q Review: Flip-Flop Contol Signals q Pipeline Motivations q Pipeline Hazads q Exceptions CENG3420 L07.2 Sping

More information

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Handout 4 Memory Hierarchy

Handout 4 Memory Hierarchy Handout 4 Memory Hierarchy Outline Memory hierarchy Locality Cache design Virtual address spaces Page table layout TLB design options (MMU Sub-system) Conclusion 2012/11/7 2 Since 1980, CPU has outpaced

More information

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0.

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0. Since 1980, CPU has outpaced DRAM... EEL 5764: Graduate Computer Architecture Appendix C Hierarchy Review Ann Gordon-Ross Electrical and Computer Engineering University of Florida http://www.ann.ece.ufl.edu/

More information

Any modern computer system will incorporate (at least) two levels of storage:

Any modern computer system will incorporate (at least) two levels of storage: 1 Any moden compute system will incopoate (at least) two levels of stoage: pimay stoage: andom access memoy (RAM) typical capacity 32MB to 1GB cost pe MB $3. typical access time 5ns to 6ns bust tansfe

More information

COSC 6385 Computer Architecture. - Memory Hierarchies (I)

COSC 6385 Computer Architecture. - Memory Hierarchies (I) COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap

More information

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned

More information

Lecture 2: Processor and Pipelining 1

Lecture 2: Processor and Pipelining 1 The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient

More information

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Bell s Law new class per decade

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Bell s Law new class per decade EECS 252 Graduate Computer Architecture Lecture 2 0 Review of Instruction Sets and Pipelines January 23 th, 202 Review: Moore s Law John Kubiatowicz Electrical Engineering and Computer Sciences University

More information

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Inteconnection Netwoks fo HPC Systems Fall 2016 Avinash Kaanth Kodi School of Electical Engineeing and Compute Science Ohio Univesity Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement: Inteconnection

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics

More information

A Memory Efficient Array Architecture for Real-Time Motion Estimation

A Memory Efficient Array Architecture for Real-Time Motion Estimation A Memoy Efficient Aay Achitectue fo Real-Time Motion Estimation Vasily G. Moshnyaga and Keikichi Tamau Depatment of Electonics & Communication, Kyoto Univesity Sakyo-ku, Yoshida-Honmachi, Kyoto 66-1, JAPAN

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

CSE 502 Graduate Computer Architecture. Lec 6-7 Memory Hierarchy Review

CSE 502 Graduate Computer Architecture. Lec 6-7 Memory Hierarchy Review CSE 502 Graduate Computer Architecture Lec 6-7 Memory Hierarchy Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging

CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging CS162 Operating Systems and Systems Programming Lecture 14 Caching and Demand Paging October 17, 2007 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs162 Review: Hierarchy of a Modern Computer

More information

CS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1

CS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1 CS162 Operating Systems and Systems Programming Lecture 13 Caches and TLBs March 12, 2008 Prof. Anthony D. Joseph http//inst.eecs.berkeley.edu/~cs162 Review Multi-level Translation What about a tree of

More information

THE THETA BLOCKCHAIN

THE THETA BLOCKCHAIN THE THETA BLOCKCHAIN Theta is a decentalized video steaming netwok, poweed by a new blockchain and token. By Theta Labs, Inc. Last Updated: Nov 21, 2017 esion 1.0 1 OUTLINE Motivation Reputation Dependent

More information

CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs"

CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" October 1, 2012! Prashanth Mohan!! Slides from Anthony Joseph and Ion Stoica! http://inst.eecs.berkeley.edu/~cs162! Caching!

More information

CSE 502 Graduate Computer Architecture. Lec 5-6 Memory Hierarchy Review

CSE 502 Graduate Computer Architecture. Lec 5-6 Memory Hierarchy Review CSE 502 Graduate Computer Architecture Lec 5-6 Memory Hierarchy Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,

More information

Module 6 STILL IMAGE COMPRESSION STANDARDS

Module 6 STILL IMAGE COMPRESSION STANDARDS Module 6 STILL IMAE COMPRESSION STANDARDS Lesson 17 JPE-2000 Achitectue and Featues Instuctional Objectives At the end of this lesson, the students should be able to: 1. State the shotcomings of JPE standad.

More information

ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS

ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS Daniel A Menascé Mohamed N Bennani Dept of Compute Science Oacle, Inc Geoge Mason Univesity 1211 SW Fifth

More information

Multidimensional Testing

Multidimensional Testing Multidimensional Testing QA appoach fo Stoage netwoking Yohay Lasi Visuality Systems 1 Intoduction Who I am Yohay Lasi, QA Manage at Visuality Systems Visuality Systems the leading commecial povide of

More information

Page 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "

Page 1. Review: Address Segmentation  Review: Address Segmentation  Review: Address Segmentation Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"

More information

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC

More information

IP Network Design by Modified Branch Exchange Method

IP Network Design by Modified Branch Exchange Method Received: June 7, 207 98 IP Netwok Design by Modified Banch Method Kaiat Jaoenat Natchamol Sichumoenattana 2* Faculty of Engineeing at Kamphaeng Saen, Kasetsat Univesity, Thailand 2 Faculty of Management

More information

Persistent Memory what developers need to know Mark Carlson Co-chair SNIA Technical Council Toshiba

Persistent Memory what developers need to know Mark Carlson Co-chair SNIA Technical Council Toshiba Pesistent Memoy what developes need to know Mak Calson Co-chai SNIA Technical Council Toshiba 2018 Stoage Develope Confeence EMEA. All Rights Reseved. 1 Contents Welcome Pesistent Memoy Oveview Non-Volatile

More information

GCC-AVR Inline Assembler Cookbook Version 1.2

GCC-AVR Inline Assembler Cookbook Version 1.2 GCC-AVR Inline Assemble Cookbook Vesion 1.2 About this Document The GNU C compile fo Atmel AVR isk pocessos offes, to embed assembly language code into C pogams. This cool featue may be used fo manually

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

Paging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012! " (0xE0)" " " " (0x70)" " (0x50)"

Paging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012!  (0xE0)    (0x70)  (0x50) CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 22, 2011! Anthony D. Joseph and Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! Segmentation! Paging! Recap Segmentation

More information

XFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers

XFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers XFVHDL: A Tool fo the Synthesis of Fuzzy Logic Contolles E. Lago, C. J. Jiménez, D. R. López, S. Sánchez-Solano and A. Baiga Instituto de Micoelectónica de Sevilla. Cento Nacional de Micoelectónica, Edificio

More information

High performance CUDA based CNN image processor

High performance CUDA based CNN image processor High pefomance UDA based NN image pocesso GEORGE VALENTIN STOIA, RADU DOGARU, ELENA RISTINA STOIA Depatment of Applied Electonics and Infomation Engineeing Univesity Politehnica of Buchaest -3, Iuliu Maniu

More information

Modeling a shared medium access node with QoS distinction

Modeling a shared medium access node with QoS distinction Modeling a shaed medium access node with QoS distinction Matthias Gies, Jonas Geutet Compute Engineeing and Netwoks Laboatoy (TIK) Swiss Fedeal Institute of Technology Züich CH-8092 Züich, Switzeland email:

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

More information

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives SPARK: Soot Reseach Kit Ondřej Lhoták Objectives Spak is a modula toolkit fo flow-insensitive may points-to analyses fo Java, which enables expeimentation with: vaious paametes of pointe analyses which

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. IR Basics. User Task. Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. IR Basics. User Task. Basic IR Processes CS630 Repesenting and Accessing Digital Infomation Infomation Retieval: Basics Thosten Joachims Conell Univesity Infomation Retieval Basics Retieval Models Indexing and Pepocessing Data Stuctues ~ 4 lectues

More information

CPE 631 Lecture 04: CPU Caches

CPE 631 Lecture 04: CPU Caches Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR

More information

1.3 Multiplexing, Time-Switching, Point-to-Point versus Buses

1.3 Multiplexing, Time-Switching, Point-to-Point versus Buses http://achvlsi.ics.foth.g/~kateveni/534 1.3 Multiplexing, Time-Switching, Point-to-Point vesus Buses n R m Aggegation (multiplexing) Distibution (demultiplexing) Simplest Netwoking, like simplest pogamming:

More information

Physical simulation for animation

Physical simulation for animation Physical simulation fo animation Case study: The jello cube The Jello Cube Mass-Sping System Collision Detection Integatos Septembe 17 2002 1 Announcements Pogamming assignment 3 is out. It is due Tuesday,

More information

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017! Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma apreduce Optimizations and Algoithms 2015 Pofesso Sasu Takoma www.cs.helsinki.fi Optimizations Reduce tasks cannot stat befoe the whole map phase is complete Thus single slow machine can slow down the

More information

Review from last lecture. EECS 252 Graduate Computer Architecture. Lec 4 Memory Hierarchy Review. Outline. Example Standard Deviation: Last time

Review from last lecture. EECS 252 Graduate Computer Architecture. Lec 4 Memory Hierarchy Review. Outline. Example Standard Deviation: Last time Review from last lecture EECS 252 Graduate Computer Architecture Lec 4 Hierarchy Review David Patterson Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~pattrsn

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

Using SPEC SFS with the SNIA Emerald Program for EPA Energy Star Data Center Storage Program Vernon Miller IBM Nick Principe Dell EMC

Using SPEC SFS with the SNIA Emerald Program for EPA Energy Star Data Center Storage Program Vernon Miller IBM Nick Principe Dell EMC Using SPEC SFS with the SNIA Emeald Pogam fo EPA Enegy Sta Data Cente Stoage Pogam Venon Mille IBM Nick Pincipe Dell EMC v6 Agenda Backgound on SNIA Emeald/Enegy Sta fo block Intoduce NAS/File test addition;

More information

Lecture 1: Introduction

Lecture 1: Introduction Lecture 1: Introduction Dr. Eng. Amr T. Abdel-Hamid Winter 2014 Computer Architecture Text book slides: Computer Architec ture: A Quantitative Approach 5 th E dition, John L. Hennessy & David A. Patterso

More information

Prioritized Traffic Recovery over GMPLS Networks

Prioritized Traffic Recovery over GMPLS Networks Pioitized Taffic Recovey ove GMPLS Netwoks 2005 IEEE. Pesonal use of this mateial is pemitted. Pemission fom IEEE mu be obtained fo all othe uses in any cuent o futue media including epinting/epublishing

Journal of World's Electrical Engineering and Technology J. World. Elect. Eng. Tech. 1(1): 12-16, 2012

2011, Scienceline Publication www.science-line.com Journal of World's Electrical Engineering and Technology J. World. Elect. Eng. Tech. 1(1): 12-16, 2012 JWEET An Efficient Algorithm for Lip Segmentation in Color

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5 Large and Fast: Exploiting Memory Hierarchy [Figure: Processor-Memory Performance Gap, performance vs. year; µProc 55%/year (2X/1.5yr), Moore's Law]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

Computer Architecture and System

Computer Architecture and System 2012 Chung-Ho Chen Computer Architecture and System Laboratory Department of Electrical Engineering National Cheng-Kung University Course Focus Understanding the design

CS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging

CS162 Operating Systems and Systems Programming Lecture 14 Caching and Demand Paging March 4, 2010 Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Review: Hierarchy of a Modern Computer System Take advantage

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc

dc - CentOS 5.2 - Linux Users Guide - Linux Command SYNOPSIS dc [-V] [--version] [-h] [--help] [-e scriptexpression] [--expression=scriptexpression] [-f scriptfile] [--file=scriptfile] [file...] DESCRIPTION dc is a reverse-polish

A modal estimation based multitype sensor placement method

A modal estimation based multitype sensor placement method *Xue-Yang Pei 1), Ting-Hua Yi 2) and Hong-Nan Li 3) 1), 2), 3) School of Civil Engineering, Dalian University of Technology, Dalian 116023, China;

Computer Architecture and System

Computer Architecture and System Chung-Ho Chen Computer Architecture and System Laboratory Department of Electrical Engineering National Cheng-Kung University Course Focus Understanding the design techniques,

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

Administrative EEC 170 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter

IP Multicast Simulation in OPNET

IP Multicast Simulation in OPNET Xin Wang, Chien-Ming Yu, Henning Schulzrinne Paul A. Stirpe Columbia University Reuters Department of Computer Science 88 Parkway Drive South New York, New York Hauppauge, New York

Configuring RSVP-ATM QoS Interworking

Configuring RSVP-ATM QoS Interworking Last Updated: January 15, 2013 This chapter describes the tasks for configuring the RSVP-ATM QoS Interworking feature, which provides support for Controlled Load Service using RSVP
