MEMORY HIERARCHY DESIGN


1 MEMORY HIERARCHY DESIGN Chapter 2

2 OUTLINE
Introduction
Basics of Memory Hierarchy
Advanced Optimizations of Cache
Memory Technology and Optimizations

3 READING ASSIGNMENT Important! Read Appendix B

4 INTRODUCTION Memory Challenges
Programmers want unlimited amounts of fast memory; however, fast memory is expensive.
Improvement of processor performance has been much faster than memory (on the order of 10-25K X and 30-80X versus roughly 6-8X for memory): the memory gap!

5 INTRODUCTION Efficient Solution? Memory hierarchy
Incrementally smaller and faster memories, each containing a subset of the memory below it, proceeding in steps up toward the processor
The entire addressable memory space is available in the largest, slowest memory
Why does it work? Spatial and temporal locality

6 INTRODUCTION "Ideally one would desire an indefinitely large memory capacity such that any particular word would be immediately available. We are forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible." A. W. Burks, H. H. Goldstine, and J. von Neumann

7 INTRODUCTION Memory hierarchy design becomes more crucial with recent multi-core processors (demand grows with the number of cores)
Intel Core i7 with four cores at 3.2 GHz:
Can generate two data references per core per clock cycle
25.6 billion 64-bit data references/second plus 12.8 billion 128-bit instruction references/second = 409.6 GB/s!
DRAM bandwidth is only 6% of this (25 GB/s)
Several caching techniques:
Separate instruction and data caches at the first level
Multi-ported, pipelined caches
Two levels of cache per core
Shared third-level cache on chip
What about memory performance and power?

8 BASICS OF MEMORY HIERARCHIES Cache: the highest and fastest level in the hierarchy (SRAM); first introduced in research computers in the early 1960s
Upon a memory reference by the processor:
The cache is checked first
If the referenced item is found in the cache, a hit occurs and the lower level is not accessed
If the referenced item is not found, a miss occurs and the lower level is checked
Since retrieval from the lower level is costly, adjacent words are retrieved as well (a block)

9 BASICS OF MEMORY HIERARCHIES Where to place blocks in the cache?
Direct-mapped cache
One block per set; a block is placed at the same single location each time it is brought into the cache
#Blocks in cache = cache size / block size
Cache block # = (block address) modulo (# of blocks in the cache)
Fully-associative cache
A block can go anywhere in the cache (one set; replacement by random, LRU, ...)
Set-associative cache
The cache is divided into sets of blocks; n blocks in a set = n-way set associative
A block is placed anywhere within one set
Cache set # = (block address) mod (#sets)

10 BASICS OF MEMORY HIERARCHIES Where to place blocks in the cache? [figure]

11 BASICS OF MEMORY HIERARCHIES Example
Cache size = 32 KB, block size = 32 bytes, byte address = 8160
Block address = byte address / block size = 8160/32 = 255
1) Fully associative: any place
2) Direct mapped: cache block # = 255 mod (32 KB / 32 B) = 255 mod 1024 = 255
3) 4-way set associative: cache set # = 255 mod (32 KB / (4*32)) = 255 mod 256 = 255

12 BASICS OF MEMORY HIERARCHIES How to find a block in the cache? Store part of the address in the cache as well.
Tag: the high part of the memory address; checked to know whether the block contains the required information (identifies the block)
Index: selects the set, log2(#sets) bits
Offset: selects data within the block, log2(#bytes in block) bits
Valid bit: indicates whether the cache block holds valid information

13 BASICS OF MEMORY HIERARCHIES [Figure: 1K-word direct-mapped cache (1 word/block) - address split into tag, index, and byte offset; each entry holds a valid bit, tag, and data; a tag match on a valid entry signals a hit]

14 BASICS OF MEMORY HIERARCHIES [Figure: 4-way set-associative cache - 22-bit tag and 8-bit index; four (valid, tag, data) ways per set compared in parallel; a 4-to-1 multiplexor selects the hitting way's 32-bit data]

15 BASICS OF MEMORY HIERARCHIES The cache can't hold all the data found in the lower level! Which block to remove from the cache?
Direct mapped: no choice!
Set and fully associative:
Least-recently used (LRU): choose the block unused for the longest time; simple for 2-way, manageable for 4-way, too hard beyond that
Random: gives approximately the same performance as LRU at high associativity
FIFO: approximates LRU

16 BASICS OF MEMORY HIERARCHIES Caching data that is only read is easy! Writes are harder: a write is not allowed until the valid bit and tag are checked, and writes come in different sizes.
On a write hit:
A write-through cache updates the item in the cache and writes through to update main memory
A write-back cache updates only the copy in the cache; when the block is about to be replaced, it is copied back to memory (dirty bit!)
Both strategies require long memory accesses; use write buffers: Processor -> Cache -> write buffer -> DRAM

17 BASICS OF MEMORY HIERARCHIES What to do on a write miss?
Write allocate: fetch the block into the cache; has a miss penalty like a read miss!
No-write allocate: the block is modified only in the lower-level memory

18 BASICS OF MEMORY HIERARCHIES Caching on write? [figure]

19 BASICS OF MEMORY HIERARCHIES One measure of the benefits of different cache organizations is miss rate: the fraction of cache accesses that result in a miss.
Causes of misses (the three Cs):
Compulsory: the first access to a block that is not in the cache
Capacity: the cache cannot contain all the blocks
Conflict: multiple blocks map to the same location in the cache
Multithreading and multiprocessors add a fourth C: coherency

20 BASICS OF MEMORY HIERARCHIES A better measure is the average memory access time (AMAT):
AMAT = Hit time + Miss rate x Miss penalty
where hit time is the time to hit in the cache and miss penalty is the time to fetch the block from the lower level.
AMAT is still an indirect measure of performance; it is not a substitute for execution time, since speculative and multithreaded processors may execute other instructions during a miss.

21 BASICS OF MEMORY HIERARCHIES In terms of execution time, cache performance can be assessed using
CPU time = (CPU clock cycles + Memory stall cycles) x Clock cycle time
where
Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
To accommodate different miss penalties and miss rates for reads and writes:
Memory stall cycles = IC x (Reads/Instruction x Read miss rate x Read miss penalty + Writes/Instruction x Write miss rate x Write miss penalty)

22 BASICS OF MEMORY HIERARCHIES [Worked example: memory accesses per instruction = 1 + 0.5]

23 VIRTUAL MEMORY Computers run multiple processes concurrently; it is too expensive to dedicate the entire memory space to a single process. Sharing!!
Virtual memory divides physical memory into blocks and allocates them to different processes
Allows sharing and protection, dynamic management, and code relocation

24 VIRTUAL MEMORY Processors deal with virtual addresses (VA)! Why?
Need to translate VA to PA: a page table loaded in memory is used. Expensive!
Use a cache for translation (the TLB). Note the cache is physically addressed. Why?

25 VIRTUAL MEMORY Where can a block/page be placed in main memory? Fully associative.
How is a block found if it is in main memory? Page table.
Which block should be replaced on a virtual memory miss? LRU, by the OS, using the use bit provided by the processor.
What happens on a write? Write-back!

26 BASICS OF MEMORY HIERARCHIES Cache optimization techniques: Split cache
In pipelined processors, the processor may request data and an instruction in the same cycle (a structural hazard). Use a split cache!
Improves the bandwidth of the cache
Each cache can be optimized separately (associativity, block size, capacity)
However, this fixes the cache size for each type!
Check the example on p. B-16

27 BASICS OF MEMORY HIERARCHIES Example (split cache)
Split cache: instruction cache miss rate 1%, data cache miss rate 2%
Unified cache: miss rate 2.5%
Miss penalty 200 cycles; 30% of instructions are memory (data) instructions
Solution
T_uni = IC x (1 + 1.3 x 0.025 x 200) x CC = 7.5 x IC x CC
T_split = IC x (1 + (1 x 0.01 + 0.3 x 0.02) x 200) x CC = 4.2 x IC x CC
Speedup = T_uni / T_split = 7.5/4.2 = 1.79

28 BASICS OF MEMORY HIERARCHIES Cache optimization techniques: Larger block size
Reduces miss rate (fewer compulsory misses) and reduces static power
However, increases miss penalty and capacity/conflict misses
Check the example on p. B-26

29 BASICS OF MEMORY HIERARCHIES Cache optimization techniques
Bigger caches to reduce miss rate. Cost? Increases hit time and static/dynamic power consumption.
Higher associativity to reduce miss rate: reduces conflict misses but increases hit time and power consumption.
Example on p. B-29

30 BASICS OF MEMORY HIERARCHIES Cache optimization techniques: Multilevel caches to reduce miss penalty
Reduce overall memory access time and are more power efficient
AMAT? Multilevel inclusion and exclusion!

31 BASICS OF MEMORY HIERARCHIES Cache optimization techniques
Giving priority to read misses over writes: reduces miss penalty, little impact on power
Avoiding address translation when indexing the cache: processors work with virtual addresses while caches use physical ones, so translation (via the page table) would otherwise be required before every cache access; use the TLB! Reduces hit time.

32 ADVANCED OPTIMIZATIONS OF CACHE Factors that are considered: hit time, miss rate, miss penalty, bandwidth, and power. Techniques can be classified as follows:

Category | Technique | Impact on power
Reducing the hit time | Small and simple first-level caches; Way prediction | Decrease
Increasing cache bandwidth | Pipelined caches; Multibanked caches; Nonblocking caches | Varying
Reducing the miss penalty | Critical word first; Merging write buffers | Little
Reducing the miss rate | Compiler optimizations | Improves
Reducing the miss penalty or miss rate via parallelism | Hardware prefetching; Compiler prefetching | Increase

33 ADVANCED OPTIMIZATIONS OF CACHE 1. Small and Simple First-Level Caches
The total amount of on-chip cache has increased dramatically; however, the amount of L1 cache has recently increased either slightly or not at all
Faster clocks demand faster, smaller caches, yet smaller caches increase capacity and conflict misses
Alternatively, designers have opted for more associativity rather than larger caches; however, power must then be considered!

34 BASICS OF MEMORY HIERARCHIES Cache size and hit time [figure]

35 BASICS OF MEMORY HIERARCHIES Cache size and energy [figure]

36 ADVANCED OPTIMIZATIONS OF CACHE 2. Way Prediction
Extra bits are used to predict the way (the block within the set) of the next cache access; the MUX is preset and only one tag comparison is performed
A mispredicted way results in checking the other blocks for matches in the next clock cycle
Accuracy: > 90% for two-way (more popular), > 80% for four-way; the I-cache has better accuracy than the D-cache
First used on the MIPS R10000 in the mid-90s; used on the ARM Cortex-A8
Extension: way selection! Saves power when correct, but increases the misprediction penalty
Example on p. 82

37 ADVANCED OPTIMIZATIONS OF CACHE 3. Pipelined Caches
Pipeline cache access so that a first-level cache access spans multiple cycles
Gives fast clock cycle time and high bandwidth, but slow hits
Examples: Pentium: 1 cycle; Pentium Pro through Pentium III: 2 cycles; Pentium 4 and Core i7: 4 cycles
Increases the penalty on mispredicted branches
Makes it easier to increase associativity!

38 ADVANCED OPTIMIZATIONS OF CACHE 4. Nonblocking Caches
For pipelined computers that allow out-of-order execution, the processor need not stall on a D-cache miss: it can continue fetching instructions from the I-cache while waiting for the D-cache to return the missing data
A nonblocking cache allows the D-cache to continue supplying hits during a miss, increasing bandwidth
Two techniques:
Hit under miss: the processor keeps running until another miss occurs; simple implementation, reduces miss penalty
Hit under multiple misses: overlap multiple misses (Intel Core i7); beneficial only if the memory system is multibanked and can service multiple misses (parallel/pipelined); complex!
Performance evaluation: in general, processors can hide the L1 miss penalty but not the L2 miss penalty

39 ADVANCED OPTIMIZATIONS OF CACHE 4. Nonblocking Caches [Figure: cache access latency improvement]

40 ADVANCED OPTIMIZATIONS OF CACHE 4. Nonblocking Caches
Example. For a 32 KB data cache that implements one hit under miss, the cache latency is 85% of the direct-mapped cache. Given the miss rates below, would a 2-way set-associative cache have better latency than hit under one miss? Assume an L2 miss penalty of 10 cycles.
Miss rate: direct-mapped 5.2%, 2-way 4.9%
Solution
AMAT_DM = 1 + 0.052 x 10 = 1.52
AMAT_2-way = 1 + 0.049 x 10 = 1.49
AMAT_2-way / AMAT_DM = 1.49/1.52 = 98% > 85%
Hit under one miss is better!

41 ADVANCED OPTIMIZATIONS OF CACHE 5. Multibanked Caches
Divide the cache into independent banks to allow simultaneous accesses (originally used in DRAM)
ARM Cortex-A8 supports 1-4 banks for L2; Intel i7 uses 4 banks for L1 and 8 banks for L2
Spread block addresses sequentially across the banks (sequential interleaving)
Banking works best when the accesses naturally spread themselves across the banks
Multiple banks are also a way to reduce power!

42 ADVANCED OPTIMIZATIONS OF CACHE 6. Critical Word First and Early Restart
Generally, a processor needs only one word of a block at a time. Two strategies to reduce miss penalty:
Critical word first: request the missed word first from memory and send it to the processor as soon as it arrives; the processor continues execution while memory fills the rest of the block
Early restart: fetch the words of the block in normal order, but resume execution as soon as the requested word arrives
Beneficial with large blocks!
Issues: block size, effect of spatial locality, bandwidth of the memory system, calculation of miss penalty

43 ADVANCED OPTIMIZATIONS OF CACHE 7. Merging Write Buffer
Both write-through and write-back caches use a write buffer to reduce miss penalty
Write merging: when storing to a block that is already pending in the write buffer, update that write buffer entry
If the buffer is full and there is no address match, the cache (and processor) must wait until the buffer has an empty entry
Reduces stalls due to the write buffer being full
[Figure: buffer occupancy without vs. with write merging]

44 ADVANCED OPTIMIZATIONS OF CACHE 8. Compiler Optimizations
Apply optimization techniques at compile time to take advantage of the memory hierarchy; applied to improve both instruction misses and data misses
Two examples: loop interchange and blocking

45 ADVANCED OPTIMIZATIONS OF CACHE 8. Compiler Optimizations (Loop interchange)
Exchange the order of nested loops to access data sequentially, in the order it is stored. Assuming the arrays are too big to fit in the cache, this technique reduces misses by improving spatial locality.
Example. Let x be a two-dimensional array of size [5000,100] allocated so that x[i,j] and x[i,j+1] are adjacent (row major). Block = 4 words. Ignore conflict and compulsory misses. With the loops in the wrong order, memory accesses skip through memory in strides of 100 words. Misses?

46 ADVANCED OPTIMIZATIONS OF CACHE 8. Compiler Optimizations (Blocking)
Blocked algorithms operate on submatrices (blocks) to improve temporal locality!
Example. Multiplication of y and z to obtain x:
/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  }
The two inner loops read all NxN elements of z[ ], read N elements of one row of y[ ] repeatedly, and write N elements of one row of x[ ].
Capacity misses are a function of N and cache size: if the cache can hold all 3 NxN matrices, there are no capacity misses.

47 ADVANCED OPTIMIZATIONS OF CACHE 8. Compiler Optimizations (Blocking)
Example, continued. Use a block size of BxB (B is called the blocking factor):
/* After */
for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1)
      for (j = jj; j < min(jj+B, N); j = j+1) {
        r = 0;
        for (k = kk; k < min(kk+B, N); k = k+1)
          r = r + y[i][k]*z[k][j];
        x[i][j] = x[i][j] + r;
      }

48 ADVANCED OPTIMIZATIONS OF CACHE 8. Compiler Optimizations Other techniques: loop fusion, array merging

49 ADVANCED OPTIMIZATIONS OF CACHE 9. Hardware Prefetching
Prefetch instructions and data to reduce miss penalty or miss rate, either directly into the cache or into an external buffer (stream buffer)
Instruction prefetch is frequently done in hardware, outside the cache: on a miss, the processor fetches two blocks; the requested block is placed in the instruction cache and the prefetched (next consecutive) block is placed in the instruction stream buffer
The Intel Core i7 supports hardware prefetching into both L1 and L2
Prefetching relies on memory bandwidth that would otherwise be unused. Power!

50 ADVANCED OPTIMIZATIONS OF CACHE 9. Hardware Prefetching [figure]

51 ADVANCED OPTIMIZATIONS OF CACHE 10. Compiler-Controlled Prefetching
Reduces miss rate or miss penalty! The compiler inserts prefetch instructions to request data before the processor needs it
Goal: overlap execution with the prefetching of data (useful in loops)
Useful when the processor can proceed during the fetch and the cache does not stall while data is being prefetched
Two flavors: register prefetch loads the value into a register; cache prefetch loads data only into the cache, not a register
Issuing prefetch instructions incurs an instruction overhead that might outweigh the benefits

52 ADVANCED OPTIMIZATIONS OF CACHE 10. Compiler-Controlled Prefetching
Example (p. 93). Consider the following code. Assume an 8 KB direct-mapped write-back (write-allocate) cache with 16-byte blocks. Arrays a and b are 3x100 and 101x3 double precision (elements 8 bytes long) and stored row major. Which accesses cause misses? Insert prefetch instructions to reduce misses. Calculate the number of prefetch instructions executed and the misses avoided by prefetching.

53 ADVANCED OPTIMIZATIONS OF CACHE 10. Compiler-Controlled Prefetching
Example (p. 93). Accesses that cause misses:
Array a is accessed in the same order it is stored. Since the block holds 2 elements, even j misses while odd j hits: 3x100/2 = 150 misses
Array b is accessed column-wise, but b benefits twice from temporal locality, so each element of b misses once: 101 misses
Total misses for the loop: 150 + 101 = 251

54 ADVANCED OPTIMIZATIONS OF CACHE 10. Compiler-Controlled Prefetching
Example (p. 93). Insert prefetch instructions to reduce misses; ignoring prefetching at the beginning and the end of the loop: 19 misses only!


56 MEMORY TECHNOLOGY AND OPTIMIZATIONS Performance measures of main memory emphasize latency and bandwidth.
Latency is the concern for caches: access time (time between a word request and its arrival) and cycle time (minimum time between unrelated requests to memory)
Bandwidth is the concern of multiprocessors and I/O; generally it is easier to improve bandwidth than latency, and multilevel caches and larger blocks make bandwidth important to caches too
Cache optimizations reduced the processor-memory gap but did not eliminate it, so innovations started happening inside the DRAM chips
DRAM for main memory, SRAM for caches

57 MEMORY TECHNOLOGY AND OPTIMIZATIONS SRAM Technology
6 transistors/bit; used in the three levels of caches
SRAMs don't need refresh, so the access time is very close to the cycle time
Faster than DRAM (3-5 times); low power to retain a bit in standby mode
Low density and more expensive!

58 MEMORY TECHNOLOGY AND OPTIMIZATIONS DRAM Technology
One transistor/bit; reading a bit destroys it, so it must be rewritten after being read
Must also be refreshed periodically (every ~8 ms) to prevent loss; bits in the same row can be refreshed simultaneously
Address lines are multiplexed to reduce pin count: the upper half of the address is the row access strobe (RAS), the lower half the column access strobe (CAS)
DRAMs are commonly sold on small boards called dual inline memory modules (DIMMs); DIMMs typically contain 4 to 16 DRAM chips, normally organized to be 8 bytes wide

59 MEMORY TECHNOLOGY AND OPTIMIZATIONS DRAM Technology
Amdahl's rule of thumb: memory capacity should grow linearly with processor speed. Unfortunately, memory capacity and speed have not kept pace with processors.
Performance improvement: ~5% in RAS (latency), ~12% in CAS (bandwidth)
Capacity: DRAM capacity growth has been slowing down. DRAMs obeyed Moore's law for 20 years (4x/3 years), then slowed to 2x/2 years, and more recently to 2x/4 years.

60 MEMORY TECHNOLOGY AND OPTIMIZATIONS DRAM Technology [figure]

61 MEMORY TECHNOLOGY AND OPTIMIZATIONS DRAM Technology [figure]

62 MEMORY TECHNOLOGY AND OPTIMIZATIONS DRAM Optimizations
Timing: repeated accesses to an open row buffer avoid another row access time
Synchronous DRAM (SDRAM): add a clock signal to the DRAM to avoid synchronization overhead; burst mode with critical word first
Wider interfaces: from 4-bit transfer mode up to 16-bit
Double data rate (DDR): transfer data on both the rising and falling edges of the DRAM clock signal; DDR, DDR2, DDR3, DDR4 (voltage and clock), and DDR5
Multiple banks on each DRAM device, accessed independently to improve bandwidth

63 MEMORY TECHNOLOGY AND OPTIMIZATIONS Graphics DRAM (GDRAM)
Based on SDRAM designs but tailored to the higher bandwidth demands of graphics processing units
GDDR5 is based on DDR3; earlier GDDRs were based on DDR2
GDDRs have wider interfaces: 32 bits versus 4, 8, or 16 in current designs
Higher maximum clock rate on the data pins: 2-5x the bandwidth per DRAM versus DDR3 DRAMs

64 MEMORY TECHNOLOGY AND OPTIMIZATIONS Reducing Power in SDRAM
Lower operating voltage
Banking
Power-down mode: disables the clock except for internal automatic refresh

65 MEMORY TECHNOLOGY AND OPTIMIZATIONS Flash Memory
A type of EEPROM (nonvolatile); must be erased in blocks before being overwritten (hence "flash")
Limited number of write cycles (~100,000)
Cheaper than SDRAM, more expensive than disk: ~$2/GB for flash, $20 to $40/GB for SDRAM, $0.09/GB for disk
Slower than SDRAM, faster than disk

66 MEMORY TECHNOLOGY AND OPTIMIZATIONS Enhancing Memory Dependability
Large caches and main memories significantly increase the possibility of errors. Two types:
Soft (dynamic) errors: changes in a cell's content
Hard errors: changes in the circuitry, during fabrication or operation
Soft errors can be detected and fixed by ECC; hard errors can be accommodated using spare rows that can be programmed in

Copyright 2012, Elsevier Inc. All rights reserved.


More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Cache performance Outline

Cache performance Outline Cache performance 1 Outline Metrics Performance characterization Cache optimization techniques 2 Page 1 Cache Performance metrics (1) Miss rate: Neglects cycle time implications Average memory access time

More information

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition

3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses. Professor Randy H. Katz Computer Science 252 Fall 1995

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses. Professor Randy H. Katz Computer Science 252 Fall 1995 Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Fall 1995 Review: Who Cares About the Memory Hierarchy? Processor Only Thus Far in Course:

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

Advanced Computer Architecture- 06CS81-Memory Hierarchy Design

Advanced Computer Architecture- 06CS81-Memory Hierarchy Design Advanced Computer Architecture- 06CS81-Memory Hierarchy Design AMAT and Processor Performance AMAT = Average Memory Access Time Miss-oriented Approach to Memory Access CPIExec includes ALU and Memory instructions

More information

DECstation 5000 Miss Rates. Cache Performance Measures. Example. Cache Performance Improvements. Types of Cache Misses. Cache Performance Equations

DECstation 5000 Miss Rates. Cache Performance Measures. Example. Cache Performance Improvements. Types of Cache Misses. Cache Performance Equations DECstation 5 Miss Rates Cache Performance Measures % 3 5 5 5 KB KB KB 8 KB 6 KB 3 KB KB 8 KB Cache size Direct-mapped cache with 3-byte blocks Percentage of instruction references is 75% Instr. Cache Data

More information

Memory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache

Memory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache Memory Cache Memory Locality cpu cache memory Memory hierarchies take advantage of memory locality. Memory locality is the principle that future memory accesses are near past accesses. Memory hierarchies

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science CPUtime = IC CPI Execution + Memory accesses Instruction

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

CMSC 611: Advanced Computer Architecture. Cache and Memory

CMSC 611: Advanced Computer Architecture. Cache and Memory CMSC 611: Advanced Computer Architecture Cache and Memory Classification of Cache Misses Compulsory The first access to a block is never in the cache. Also called cold start misses or first reference misses.

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Memory Hierarchy 3 Cs and 6 Ways to Reduce Misses

Memory Hierarchy 3 Cs and 6 Ways to Reduce Misses Memory Hierarchy 3 Cs and 6 Ways to Reduce Misses Soner Onder Michigan Technological University Randy Katz & David A. Patterson University of California, Berkeley Four Questions for Memory Hierarchy Designers

More information

Lecture 11. Virtual Memory Review: Memory Hierarchy

Lecture 11. Virtual Memory Review: Memory Hierarchy Lecture 11 Virtual Memory Review: Memory Hierarchy 1 Administration Homework 4 -Due 12/21 HW 4 Use your favorite language to write a cache simulator. Input: address trace, cache size, block size, associativity

More information

Types of Cache Misses: The Three C s

Types of Cache Misses: The Three C s Types of Cache Misses: The Three C s 1 Compulsory: On the first access to a block; the block must be brought into the cache; also called cold start misses, or first reference misses. 2 Capacity: Occur

More information

MEMORY HIERARCHY DESIGN. B649 Parallel Architectures and Programming

MEMORY HIERARCHY DESIGN. B649 Parallel Architectures and Programming MEMORY HIERARCHY DESIGN B649 Parallel Architectures and Programming Basic Optimizations Average memory access time = Hit time + Miss rate Miss penalty Larger block size to reduce miss rate Larger caches

More information

Chapter-5 Memory Hierarchy Design

Chapter-5 Memory Hierarchy Design Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

Improving Cache Performance. Dr. Yitzhak Birk Electrical Engineering Department, Technion

Improving Cache Performance. Dr. Yitzhak Birk Electrical Engineering Department, Technion Improving Cache Performance Dr. Yitzhak Birk Electrical Engineering Department, Technion 1 Cache Performance CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time Memory

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: Who Cares About the Memory Hierarchy? Processor Only Thus

More information

Cache Optimisation. sometime he thought that there must be a better way

Cache Optimisation. sometime he thought that there must be a better way Cache sometime he thought that there must be a better way 2 Cache 1. Reduce miss rate a) Increase block size b) Increase cache size c) Higher associativity d) compiler optimisation e) Parallelism f) prefetching

More information

Advanced optimizations of cache performance ( 2.2)

Advanced optimizations of cache performance ( 2.2) Advanced optimizations of cache performance ( 2.2) 30 1. Small and Simple Caches to reduce hit time Critical timing path: address tag memory, then compare tags, then select set Lower associativity Direct-mapped

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

EITF20: Computer Architecture Part 5.1.1: Virtual Memory EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache performance 4 Cache

More information

Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.

Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B. Chapter 2: Memory Hierarchy Design (Part 3) Introduction Caches Main Memory (Section 2.2) Virtual Memory (Section 2.4, Appendix B.4, B.5) Memory Technologies Dynamic Random Access Memory (DRAM) Optimized

More information

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

Course Administration

Course Administration Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Advanced cache optimizations. ECE 154B Dmitri Strukov

Advanced cache optimizations. ECE 154B Dmitri Strukov Advanced cache optimizations ECE 154B Dmitri Strukov Advanced Cache Optimization 1) Way prediction 2) Victim cache 3) Critical word first and early restart 4) Merging write buffer 5) Nonblocking cache

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

L2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary

L2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary HY425 Lecture 13: Improving Cache Performance Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 25, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 13: Improving Cache Performance 1 / 40

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested

More information

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff Lecture 20: ory Hierarchy Main ory and Enhancing its Performance Professor Alvin R. Lebeck Computer Science 220 Fall 1999 HW #4 Due November 12 Projects Finish reading Chapter 5 Grinch-Like Stuff CPS 220

More information

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY

More information

Memory Hierarchy. Advanced Optimizations. Slides contents from:

Memory Hierarchy. Advanced Optimizations. Slides contents from: Memory Hierarchy Advanced Optimizations Slides contents from: Hennessy & Patterson, 5ed. Appendix B and Chapter 2. David Wentzlaff, ELE 475 Computer Architecture. MJT, High Performance Computing, NPTEL.

More information

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

k -bit address bus n-bit data bus Control lines ( R W, MFC, etc.)

k -bit address bus n-bit data bus Control lines ( R W, MFC, etc.) THE MEMORY SYSTEM SOME BASIC CONCEPTS Maximum size of the Main Memory byte-addressable CPU-Main Memory Connection, Processor MAR MDR k -bit address bus n-bit data bus Memory Up to 2 k addressable locations

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 5 Memory Hierarchy Design CA Lecture10 - memory hierarchy design (cwliu@twins.ee.nctu.edu.tw) 10-1 Outline 11 Advanced Cache Optimizations Memory Technology and DRAM

More information

Topics. Digital Systems Architecture EECE EECE Need More Cache?

Topics. Digital Systems Architecture EECE EECE Need More Cache? Digital Systems Architecture EECE 33-0 EECE 9-0 Need More Cache? Dr. William H. Robinson March, 00 http://eecs.vanderbilt.edu/courses/eece33/ Topics Cache: a safe place for hiding or storing things. Webster

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

Classification Steady-State Cache Misses: Techniques To Improve Cache Performance:

Classification Steady-State Cache Misses: Techniques To Improve Cache Performance: #1 Lec # 9 Winter 2003 1-21-2004 Classification Steady-State Cache Misses: The Three C s of cache Misses: Compulsory Misses Capacity Misses Conflict Misses Techniques To Improve Cache Performance: Reduce

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information