Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Size: px
Start display at page:

Download "Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1"

Transcription

1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive per bit tha slower memory Solutio: orgaize memory system ito a hierarchy Etire addressable memory space available i largest, slowest memory Icremetally smaller ad faster memories, each cotaiig a subset of the memory below it, proceed i steps up toward the processor Temporal ad spatial locality isures that early all refereces ca be foud i smaller memories Gives the illusio of a large, fast memory beig preseted to the processor Itroductio Copyright 2012, Elsevier Ic. All rights reserved. 2

2 Memory Performace Gap Itroductio Copyright 2012, Elsevier Ic. All rights reserved. 3 Memory Hierarchy Desig Memory hierarchy desig becomes more crucial with recet multi-core processors: Aggregate peak badwidth grows with # cores: Itel Core i7 ca geerate two refereces per core per clock Four cores ad 3.2 GHz clock 25.6 billio* 64-bit data refereces/secod billio* 128-bit istructio refereces = GB/s! DRAM badwidth is oly 6% of this (25 GB/s) Requires: Multi-port, pipelied caches Two levels of cache per core Shared third-level cache o chip Itroductio * US billio = 10 9 Copyright 2012, Elsevier Ic. All rights reserved. 4

3 The Memory Hierarchy The BIG Picture Commo priciples apply at all levels of the memory hierarchy Based o otios of cachig At each level i the hierarchy Block placemet Fidig a block Replacemet o a miss Write policy 5.5 A Commo Framework for Memory Hierarchies Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 5 Direct Mapped Cache Locatio determied by address Direct mapped: oly oe choice (Block address) modulo (#Blocks i cache) #Blocks is a power of 2 Use low-order address bits Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 6

4 Associative Caches Fully associative Allow a give block to go i ay cache etry Requires all etries to be searched at oce Comparator per etry (expesive) -way set associative Each set cotais etries Block umber determies which set (Block umber) modulo (#Sets i cache) Search all etries i a give set at oce comparators (less expesive) Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 7 How Much Associativity Icreased associativity decreases miss rate But with dimiishig returs Simulatio of a system with 64KB D-cache, 16-word blocks, SPEC way: 10.3% 2-way: 8.6% 4-way: 8.3% 8-way: 8.1% Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 8

5 Block Placemet Determied by associativity Direct mapped (1-way associative) Oe choice for placemet -way set associative choices withi a set Fully associative Ay locatio Higher associativity reduces miss rate Icreases complexity, cost, ad access time Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 9 Replacemet Policy Direct mapped: o choice Set associative Prefer o-valid etry, if there is oe Otherwise, choose amog etries i the set Least-recetly used (LRU) Choose the oe uused for the logest time Simple for 2-way, maageable for 4-way, too hard beyod that Radom Gives approximately the same performace as LRU for high associativity Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 10

6 Write Policy Write-through Update both upper ad lower levels Simplifies replacemet, but may require write buffer Write-back Update upper level oly Update lower level whe block is replaced Need to keep more state Virtual memory Oly write-back is feasible, give disk write latecy Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 11 Memory Hierarchy Basics sets => -way set associative Direct-mapped cache => oe block per set Fully associative => oe set Itroductio Writig to cache: two strategies Write-through Immediately update lower levels of hierarchy Write-back Oly update lower levels of hierarchy whe a updated block is replaced Both strategies use write buffer to make writes asychroous Copyright 2012, Elsevier Ic. All rights reserved. 12

7 Memory Hierarchy Basics CPU exec-time = (CPU clock-cycles + Mem stall-cycles ) Clock cycle time Itroductio CPU exec-time = (IC CPI CPU + Mem stall-cycles ) Clock cycle time Mem stall-cycles = IC... Miss rate... Mem accesses... Miss pealty... Copyright 2012, Elsevier Ic. All rights reserved. 13 Memory Hierarchy Basics CPU exec-time = (CPU clock-cycles + Mem stall-cycles ) Clock cycle time Itroductio Mem stall-cycles = IC Misses Istructio Miss Pealty Note1: miss rate/pealty are ofte differet for reads ad writes Note2: speculative ad multithreaded processors may execute other istructios durig a miss Reduces performace impact of misses Copyright 2012, Elsevier Ic. All rights reserved. 14

8 Cache Performace Example Give I-cache miss rate = 2% D-cache miss rate = 4% Miss pealty = 100 cycles Base CPI (ideal cache) = 2 Load & stores are 36% of istructios Miss cycles per istructio I-cache: D-cache: Actual CPI = 2 +?? +?? =?? Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 15 Cache Performace Example Give I-cache miss rate = 2% D-cache miss rate = 4% Miss pealty = 100 cycles Base CPI (ideal cache) = 2 Load & stores are 36% of istructios Miss cycles per istructio I-cache: = 2 D-cache: = 1.44 Actual CPI = = 5.44 Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 16

9 Memory Hierarchy Basics Miss rate Fractio of cache access that result i a miss Itroductio Causes of misses (3C s +1) Compulsory First referece to a block Capacity Blocks discarded ad later retrieved Coflict Program makes repeated refereces to multiple addresses from differet blocks that map to the same locatio i the cache Coherecy Differet processors should see same value i same locatio Copyright 2012, Elsevier Ic. All rights reserved. 17 The 3C s i diff cache sizes Itroductio Coflict Copyright 2012, Elsevier Ic. All rights reserved. 18

10 The cache coherece pb Processors may see differet values through their caches: Cetralized Shared-Memory Architectures Copyright 2012, Elsevier Ic. All rights reserved. 19 Cache Coherece Coherece All reads by ay processor must retur the most recetly writte value Writes to the same locatio by ay two processors are see i the same order by all processors (Coherece defies the behaviour of reads & writes to the same memory locatio) Cosistecy Whe a writte value will be retured by a read If a processor writes locatio A followed by locatio B, ay processor that sees the ew value of B must also see the ew value of A (Cosistecy defies the behaviour of reads & writes with respect to accesses to other memory locatios) Cetralized Shared-Memory Architectures Copyright 2012, Elsevier Ic. All rights reserved. 20

11 Eforcig Coherece Coheret caches provide: Migratio: movemet of data Replicatio: multiple copies of data Cache coherece protocols Directory based Sharig status of each block kept i oe locatio Soopig Each core tracks sharig status of each block Cetralized Shared-Memory Architectures Copyright 2012, Elsevier Ic. All rights reserved. 21 Memory Hierarchy Basics Six basic cache optimizatios: Larger block size Reduces compulsory misses Icreases capacity ad coflict misses, icreases miss pealty Larger total cache capacity to reduce miss rate Icreases hit time, icreases power cosumptio Higher associativity Reduces coflict misses Icreases hit time, icreases power cosumptio Multilevel caches to reduce miss pealty Reduces overall memory access time Givig priority to read misses over writes Reduces miss pealty Avoidig address traslatio i cache idexig Reduces hit time Itroductio Copyright 2012, Elsevier Ic. All rights reserved. 22

12 Multilevel Caches Primary cache attached to CPU Small, but fast Level-2 cache services misses from primary cache Larger, slower, but still faster tha mai memory Mai memory services L-2 cache misses Some high-ed systems iclude L-3 cache Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 23 Multilevel Cache Example Give CPU base CPI = 1, clock rate = 4GHz Miss rate/istructio = 2% Mai memory access time = 100s With just primary cache Miss pealty =??? = 400 cycles Effective CPI = 1 +??? = 9 Now add L-2 cache Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 24

13 Multilevel Cache Example Give CPU base CPI = 1, clock rate = 4GHz Miss rate/istructio = 2% Mai memory access time = 100s With just primary cache Miss pealty = 100s/0.25s = 400 cycles Effective CPI = = 9 Now add L-2 cache Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 25 Example (cot.) Now add L-2 cache Access time = 5s Global miss rate to mai memory = 0.5% Primary miss with L-2 hit Pealty = 5s/0.25s = 20 cycles Primary miss with L-2 miss Extra pealty = 400 cycles CPI = = 3.4 Performace ratio = 9/3.4 = 2.6 Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 26

14 Multilevel O-Chip Caches Itel Nehalem 4-core processor Per core: 32KB L1 I-cache, 32KB L1 D-cache, 512KB L2 cache 3-Level Cache Orgaizatio L1 caches (per core) L2 uified cache (per core) L3 uified cache (shared) Itel Nehalem /a: data ot available L1 I-cache: 32KB, 64-byte blocks, 4-way, approx LRU replacemet, hit time /a L1 D-cache: 32KB, 64-byte blocks, 8-way, approx LRU replacemet, write-back/ allocate, hit time /a 256KB, 64-byte blocks, 8-way, approx LRU replacemet, writeback/allocate, hit time /a 8MB, 64-byte blocks, 16-way, replacemet /a, write-back/ allocate, hit time /a AMD Optero X4 L1 I-cache: 32KB, 64-byte blocks, 2-way, approx LRU replacemet, hit time 3 cycles L1 D-cache: 32KB, 64-byte blocks, 2-way, approx LRU replacemet, write-back/ allocate, hit time 9 cycles 512KB, 64-byte blocks, 16-way, approx LRU replacemet, writeback/allocate, hit time /a 2MB, 64-byte blocks, 32-way, replace block shared by fewest cores, write-back/allocate, hit time 32 cycles Chapter 5 Large ad Fast: Exploitig Memory Hierarchy 28

15 29 Itel ew cache approach with Skylake AJProeça, Advaced Architectures, MiEI, UMiho, 2017/ AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 Itel ew cache approach with Skylake

16 Te Advaced Optimizatios Reducig the hit time 1. Small & simple first-level caches 2. Way-predictio Icrease cache badwidth 3. Pipelied cache access 4. Noblockig caches 5. Multibaked caches Reducig the miss pealty 6. Critical word first 7. Mergig write buffers Reducig the miss rate 8. Compiler optimizatios Reducig the miss pealty or miss rate via parallelism 9. Hardware prefetchig of istructios ad data 10. Compiler-cotrolled prefetchig AJProeça, Advaced Architectures, MiEI, UMiho, 2017/ Small ad simple 1 st level caches Small ad simple first level caches Critical timig path: addressig tag memory, the comparig tags, the selectig correct set Direct-mapped caches ca overlap tag compare ad trasmissio of data Lower associativity reduces power because fewer cache lies are accessed Advaced Optimizatios Copyright 2012, Elsevier Ic. All rights reserved. 32

17 L1 Size ad Associativity Advaced Optimizatios Access time vs. size ad associativity Copyright 2012, Elsevier Ic. All rights reserved. 33 L1 Size ad Associativity Advaced Optimizatios Eergy per read vs. size ad associativity Copyright 2012, Elsevier Ic. All rights reserved. 34

18 2. Way Predictio To improve hit time, predict the way to pre-set mux Mis-predictio gives loger hit time Predictio accuracy > 90% for two-way > 80% for four-way I-cache has better accuracy tha D-cache First used o MIPS R10000 i mid-90s Used o ARM Cortex-A8 Exted to predict block as well Way selectio Icreases mis-predictio pealty Advaced Optimizatios Copyright 2012, Elsevier Ic. All rights reserved Pipeliig Cache Pipelie cache access to improve badwidth Examples: Petium: 1 cycle Petium Pro Petium III: 2 cycles Petium 4 Core i7: 4 cycles Advaced Optimizatios Icreases brach mis-predictio pealty Makes it easier to icrease associativity Copyright 2012, Elsevier Ic. All rights reserved. 36

19 4. Noblockig Caches Allow hits before previous misses complete Hit uder miss Hit uder multiple miss L2 must support this I geeral, processors ca hide L1 miss pealty but ot L2 miss pealty Advaced Optimizatios Copyright 2012, Elsevier Ic. All rights reserved Multibaked Caches Orgaize cache as idepedet baks to support simultaeous access ARM Cortex-A8 supports 1-4 baks for L2 Itel i7 supports 4 baks for L1 ad 8 baks for L2 Advaced Optimizatios Iterleave baks accordig to block address Copyright 2012, Elsevier Ic. All rights reserved. 38

20 6. Critical Word First, Early Restart Critical word first Request missed word from memory first Sed it to the processor as soo as it arrives Early restart Request words i ormal order Sed missed work to the processor as soo as it arrives Advaced Optimizatios Effectiveess of these strategies depeds o block size ad likelihood of aother access to the portio of the block that has ot yet bee fetched Copyright 2012, Elsevier Ic. All rights reserved Mergig Write Buffer Whe storig to a block that is already pedig i the write buffer, update write buffer Reduces stalls due to full write buffer Do ot apply to I/O addresses Advaced Optimizatios No write bufferig Write bufferig Copyright 2012, Elsevier Ic. All rights reserved. 40

21 8. Compiler Optimizatios Loop Iterchage Swap ested loops to access memory i sequetial order Advaced Optimizatios Blockig Istead of accessig etire rows or colums, subdivide matrices ito blocks Requires more memory accesses but improves locality of accesses Copyright 2012, Elsevier Ic. All rights reserved Hardware Prefetchig Fetch two blocks o miss (iclude ext sequetial block) Advaced Optimizatios Petium 4 Pre-fetchig Copyright 2012, Elsevier Ic. All rights reserved. 42

22 10. Compiler Prefetchig Isert prefetch istructios before data is eeded No-faultig: prefetch does t cause exceptios Advaced Optimizatios Register prefetch Loads data ito register Cache prefetch Loads data ito cache Combie with loop urollig ad software pipeliig Copyright 2012, Elsevier Ic. All rights reserved. 43 Summary Advaced Optimizatios Copyright 2012, Elsevier Ic. All rights reserved. 44

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Memory Hierarchy Basics

Memory Hierarchy Basics Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 5 Large ad Fast: Exploitig Memory Hierarchy Priciple of Locality Programs access a small proportio of their address space

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.

More information

Instruction and Data Streams

Instruction and Data Streams Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast

More information

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Uniprocessors. HPC Prof. Robert van Engelen

Uniprocessors. HPC Prof. Robert van Engelen Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu UH-MEM: Utility-Based Hybrid Memory Maagemet Yag Li, Saugata Ghose, Jogmoo Choi, Ji Su, Hui Wag, Our Mutlu 1 Executive Summary DRAM faces sigificat techology scalig difficulties Emergig memory techologies

More information

Cache Optimization. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Cache Optimization. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Cache Optimization Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Cache Misses On cache hit CPU proceeds normally On cache miss Stall the CPU pipeline

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S.

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S. Memory Hierarchy Lecture notes from MKP, H. H. Lee and S. Yalamanchili Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 Reading (2) 1 SRAM: Value is stored on a pair of inerting gates Very fast but

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L24 VM II (1) ist.eecs.berkele.edu/~cs61c/su5 CS61C : Machie Structures Lecture #24: VM II Address Mappig: Virtual Address: VPN offset 25-8-2 Ad Carle idex ito page table located i phsical memor

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 22 Database Recovery Techiques Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Recovery algorithms Recovery cocepts Write-ahead

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

CS/ECE 3330 Computer Architecture. Chapter 5 Memory

CS/ECE 3330 Computer Architecture. Chapter 5 Memory CS/ECE 3330 Computer Architecture Chapter 5 Memory Last Chapter n Focused exclusively on processor itself n Made a lot of simplifying assumptions IF ID EX MEM WB n Reality: The Memory Wall 10 6 Relative

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Lecture 1: Introduction and Fundamental Concepts 1

Lecture 1: Introduction and Fundamental Concepts 1 Uderstadig Performace Lecture : Fudametal Cocepts ad Performace Aalysis CENG 332 Algorithm Determies umber of operatios executed Programmig laguage, compiler, architecture Determie umber of machie istructios

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines This Uit: Damic Schedulig CSE 560 Computer Sstems Architecture Damic Schedulig Slides origiall developed b Drew Hilto (IBM) ad Milo Marti (Uiversit of Peslvaia) App App App Sstem software Mem CPU I/O Code

More information

Caching Basics. Memory Hierarchies

Caching Basics. Memory Hierarchies Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Operating System Concepts. Operating System Concepts

Operating System Concepts. Operating System Concepts Chapter 4: Mass-Storage Systems Logical Disk Structure Logical Disk Structure Disk Schedulig Disk Maagemet RAID Structure Disk drives are addressed as large -dimesioal arrays of logical blocks, where the

More information

Reducing SDRAM Energy Consumption in Embedded Systems Λ

Reducing SDRAM Energy Consumption in Embedded Systems Λ Reducig SDRAM Eergy Cosumptio i Embedded Systems Λ Jelea Trajkovic, Alexader Veidebaum Uiversity of Califoria, Irvie fjeleat,alexvg@ics.uci.edu Abstract DRAM eergy cosumptio i embedded systems ca be very

More information

EEC 483 Computer Organization. Chapter 5.3 Measuring and Improving Cache Performance. Chansu Yu

EEC 483 Computer Organization. Chapter 5.3 Measuring and Improving Cache Performance. Chansu Yu EEC 483 Computer Organization Chapter 5.3 Measuring and Improving Cache Performance Chansu Yu Cache Performance Performance equation execution time = (execution cycles + stall cycles) x cycle time stall

More information

1. SWITCHING FUNDAMENTALS

1. SWITCHING FUNDAMENTALS . SWITCING FUNDMENTLS Switchig is the provisio of a o-demad coectio betwee two ed poits. Two distict switchig techiques are employed i commuicatio etwors-- circuit switchig ad pacet switchig. Circuit switchig

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018 Fudametals of Chapter 1 Microprocessor ad Microcotroller Dr. Farid Farahmad Updated: Tuesday, Jauary 16, 2018 Evolutio First came trasistors Itegrated circuits SSI (Small-Scale Itegratio) to ULSI Very

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,

More information

Improving Cache Performance

Improving Cache Performance Improving Cache Performance Computer Organization Architectures for Embedded Computing Tuesday 28 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

EITF20: Computer Architecture Part 5.1.1: Virtual Memory EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache

More information

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns K-NET bus The K-Net bus is based o the SPI bus but it allows to addressig may differet turrets like the I 2 C bus. The K-Net is 6 a wires bus (4 for SPI wires ad 2 additioal wires for request ad ackowledge

More information

Memory Hierarchy. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Memory Hierarchy. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Memory Hierarchy Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Arquitectura de Computadores

Arquitectura de Computadores Arquitectura de Computadores Capítulo 2. Procesadores segmetados Based o the origial material of the book: D.A. Patterso y J.L. Heessy Computer Orgaizatio ad Desig: The Hardware/Software Iterface 4 th

More information

ECE4050 Data Structures and Algorithms. Lecture 6: Searching

ECE4050 Data Structures and Algorithms. Lecture 6: Searching ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Improving Cache Performance

Improving Cache Performance Improving Cache Performance Tuesday 27 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Memory hierarchy

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Memory Hierarchy. Advanced Optimizations. Slides contents from:

Memory Hierarchy. Advanced Optimizations. Slides contents from: Memory Hierarchy Advanced Optimizations Slides contents from: Hennessy & Patterson, 5ed. Appendix B and Chapter 2. David Wentzlaff, ELE 475 Computer Architecture. MJT, High Performance Computing, NPTEL.

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

State-space feedback 6 challenges of pole placement

State-space feedback 6 challenges of pole placement State-space feedbac 6 challeges of pole placemet J Rossiter Itroductio The earlier videos itroduced the cocept of state feedbac ad demostrated that it moves the poles. x u x Kx Bu It was show that whe

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12a Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

1&1 Next Level Hosting

1&1 Next Level Hosting 1&1 Next Level Hostig Performace Level: Performace that grows with your requiremets Copyright 1&1 Iteret SE 2017 1ad1.com 2 1&1 NEXT LEVEL HOSTING 3 Fast page loadig ad short respose times play importat

More information