Memory Hierarchy Design


1 Memory Hierarchy Design

2 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 2

3 Many Levels in Memory Hierarchy Invisible only to high-level language programmers Usually made invisible to the programmer (even assembly programmers) Pipeline registers Register file 1st-level cache (on-chip) 2nd-level cache (on same MCM as CPU) Physical memory There can also be a 3rd (or more) cache levels here (usu. mounted on same board as CPU) Our focus in chapter 5 Virtual memory (on hard disk, often in same enclosure as CPU) Disk files (on hard disk often in same enclosure as CPU) Network-accessible disk files (often in the same building as the CPU) Tape backup/archive system (often in the same building as the CPU) Data warehouse: Robotically-accessed room full of shelves of tapes (usually on the same planet as the CPU) 3

4 Simple Hierarchy Example (figure) Note many orders of magnitude change in characteristics between levels 4

5 CPU vs. Memory Performance Trends Relative performance (vs. 1980 performance) as a function of year: CPU +55%/year (after 1986), +35%/year (before); memory +7%/year 5

6 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 6

7 Cache Basics A cache is a (hardware-managed) storage, intermediate in size, speed, and cost-per-bit between the programmer-visible registers and main physical memory. The cache itself may be SRAM or fast DRAM. There may be more than one level of caches. Basis for cache to work: Principle of Locality When a location is accessed, it and nearby locations are likely to be accessed again soon. Temporal locality - Same location likely again soon. Spatial locality - Nearby locations likely soon. 7

8 Four Basic Questions Consider any two adjacent levels in a memory hierarchy; use the block as the unit of data transfer between cache levels and main memory, exploiting the Principle of Locality. The level design is described by four behaviors: Block Placement: Where could a new block be placed in the level? Block Identification: How is a block found if it is in the level? Block Replacement: Which existing block should be replaced if necessary? Write Strategy: How are writes to the block handled? 8

9 Block Placement Schemes 9

10 Direct-Mapped Placement A block can only go into one frame in the cache Determined by the block's address (in memory space) Frame number usually given by some low-order bits of the block address. This can also be expressed as: (Frame number) = (Block address) mod (Number of frames (sets) in cache) Note that in a direct-mapped cache, block placement & replacement are both completely determined by the address of the new block that is to be accessed. 10
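A minimal C sketch of this mapping, assuming a hypothetical power-of-two cache of 512 frames (so the mod reduces to taking low-order bits):

```c
#include <stdint.h>

/* Direct-mapped placement sketch: NUM_FRAMES is a hypothetical,
   power-of-two cache geometry. */
#define NUM_FRAMES 512

static inline uint32_t frame_of(uint64_t block_addr) {
    return (uint32_t)(block_addr % NUM_FRAMES); /* low-order bits of block address */
}

static inline uint64_t tag_of(uint64_t block_addr) {
    return block_addr / NUM_FRAMES;             /* remaining high-order bits */
}
```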

11 Direct-Mapped Identification (figure: the address is split into Tag | Frame# | Offset; the frame number decodes and selects one row of the tag/block-frame arrays, the selected stored tag is compared with the address tag to signal a hit, and the offset drives the mux that selects the data word) 11

12 Fully-Associative Placement One alternative to direct-mapped is: Allow a block to fill any empty frame in the cache. How do we then locate the block later? Can associate each stored block with a tag Identifies the block's location in cache. When the block is needed, treat the cache as an associative memory, using the tag to match all frames in parallel, to pull out the appropriate block. Another alternative to direct-mapped is placement under full program control. A register file can be viewed as a small programmer-controlled cache (with 1-word blocks). 12

13 Fully-Associative Identification (figure: the address is split into Block address | Offset; the block address is compared in parallel against the stored addresses of all block frames, and a hit drives the mux that selects the data word) Note that, compared to direct-mapped: More address bits have to be stored with each block frame. A comparator is needed for each frame, to do the parallel associative lookup. 13

14 Set-Associative Placement The block address determines not a single frame, but a frame set (several frames, grouped together). Frame set # = Block address mod # of frame sets The block can be placed associatively anywhere within that frame set. If there are n frames in each frame set, the scheme is called n-way set-associative. Direct mapped = 1-way set-associative. Fully associative: There is only 1 frame set. 14

15 Set-Associative Identification (figure: the address is split into Tag | Set# | Offset; the set number selects one of 4 separate sets, and tags are compared in parallel within the selected set) Intermediate between direct-mapped and fully-associative in the number of tag bits that need to be associated with cache frames. Still need a comparator for each frame (but only those in one set need be activated). 15

16 Cache Size Equation Simple equation for the size of a cache: (Cache size) = (Block size) × (Number of sets) × (Set associativity) Can relate to the size of various address fields: (Block size) = 2^(# of offset bits) (Number of sets) = 2^(# of index bits) (# of tag bits) = (# of memory address bits) − (# of index bits) − (# of offset bits) 16
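The sketch below works the equation for one hypothetical configuration (64 KB, 2-way, 64-byte blocks, 44 address bits — the same numbers as the Alpha example a few slides down):

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical cache parameters. */
    unsigned cache_size = 64 * 1024, block_size = 64, assoc = 2, addr_bits = 44;
    unsigned num_sets = cache_size / (block_size * assoc);    /* = 512 sets */

    unsigned offset_bits = 0, index_bits = 0;
    while ((1u << offset_bits) < block_size) offset_bits++;   /* log2(64)  = 6 */
    while ((1u << index_bits) < num_sets)    index_bits++;    /* log2(512) = 9 */
    unsigned tag_bits = addr_bits - index_bits - offset_bits; /* 44-9-6 = 29 */

    printf("sets=%u offset=%u index=%u tag=%u\n",
           num_sets, offset_bits, index_bits, tag_bits);
    return 0;
}
```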

17 Replacement Strategies Which block do we replace when a new block comes in (on a cache miss)? Direct-mapped: There's only one choice! Associative (fully- or set-): If any frame in the set is empty, pick one of those. Otherwise, there are many possible strategies: Random: Simple, fast, and fairly effective Least-Recently Used (LRU), and approximations thereof: Require bits to record replacement info; e.g. 4-way has 4! = 24 orderings, needing 5 bits to encode the MRU-to-LRU positions FIFO: Replace the oldest block. 17
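A minimal sketch of exact LRU for one set, using per-frame timestamps rather than the compact 5-bit permutation encoding mentioned above (names and sizes are illustrative):

```c
#define WAYS 4

typedef struct { int valid; unsigned long tag; unsigned long last_used; } Frame;

/* Pick the victim frame in a set: an empty frame if one exists,
   otherwise the least-recently used one. */
int pick_victim(const Frame set[WAYS]) {
    int v = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) return i;
        if (set[i].last_used < set[v].last_used) v = i;
    }
    return v;
}
```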

18 Write Strategies Most accesses are reads, not writes Especially if instruction reads are included Optimize for reads: read performance matters more Direct-mapped can return the value before the valid check Writes are more difficult Can't write to cache till we know the right block Object written may have various sizes (1-8 bytes) When to synchronize cache with memory? Write through - Write to cache and to memory Prone to stalls due to high bandwidth requirements Write back - Write to memory upon replacement Memory may be out of date 18

19 Another Write Strategy Maintain a FIFO queue (write buffer) of cache frames (e.g. can use a doubly-linked list) Meanwhile, take items from top of queue and write them to memory as fast as bus can handle Reads might take priority, or have a separate bus Advantages: Write stalls are minimized, while keeping memory as up-to-date as possible 19
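A sketch of such a write buffer as a bounded FIFO of (address, data) entries, drained whenever the bus is free (a circular array rather than the linked list suggested above; all names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 8

typedef struct { uint64_t addr, data; } WBEntry;
typedef struct { WBEntry q[WB_ENTRIES]; int head, tail, count; } WriteBuffer;

/* Called on a write: returns false when full, i.e. a write stall. */
bool wb_enqueue(WriteBuffer *wb, uint64_t addr, uint64_t data) {
    if (wb->count == WB_ENTRIES) return false;
    wb->q[wb->tail] = (WBEntry){addr, data};
    wb->tail = (wb->tail + 1) % WB_ENTRIES;
    wb->count++;
    return true;
}

/* Called when the memory bus is idle: write the oldest entry to memory. */
bool wb_drain_one(WriteBuffer *wb) {
    if (wb->count == 0) return false;
    /* ... issue wb->q[wb->head] to memory here ... */
    wb->head = (wb->head + 1) % WB_ENTRIES;
    wb->count--;
    return true;
}
```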

20 Write Miss Strategies What do we do on a write to a block that's not in the cache? Two main strategies (neither stops the processor): Write-allocate (fetch on write) - Cache the block. No-write-allocate (write around) - Just write to memory. Write-back caches tend to use write-allocate; write-through tends to use no-write-allocate. A dirty bit indicates that write-back is needed under the write-back strategy 20

21 Example: Alpha data cache 64 KB, 2-way, 64-byte blocks, 512 sets; 44 physical address bits 21

22 Instruction vs. Data Caches Instructions and data have different patterns of temporal and spatial locality Also, instructions are generally read-only Can have separate instruction & data caches Advantages Doubles bandwidth between CPU & memory hierarchy Each cache can be optimized for its pattern of locality Disadvantages Slightly more complex design Can't dynamically adjust cache space taken up by instructions vs. data 22

23 I/D Split and Unified Caches (table: misses per 1000 accesses for I-cache, D-cache, and unified cache at sizes from 8 KB to 256 KB; the numeric values did not survive transcription) Much lower instruction miss rate than data miss rate 23

24 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 24

25 Cache Performance Equations Memory stalls per program (blocking cache): Memory stall cycles = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty = IC × (Misses / Instruction) × Miss penalty CPU time formula: CPU time = IC × (CPI_execution + Memory stall cycles / Instruction) × Cycle time More cache performance will be given later! 25

26 Cache Performance Example Ideal CPI=2.0, memory references/inst=1.5, cache size=64KB, miss penalty=75ns, hit time=1 clock cycle Compare performance of two caches: Direct-mapped (1-way): cycle time=1ns, miss rate=1.4% 2-way: cycle time=1.25ns, miss rate=1.0% Miss penalty(1-way) = 75ns / 1ns = 75 cycles Miss penalty(2-way) = 75ns / 1.25ns = 60 cycles CPU time(1-way) = IC × (2.0 + 1.5 × 1.4% × 75) × 1ns = 3.58 × IC CPU time(2-way) = IC × (2.0 + 1.5 × 1.0% × 60) × 1.25ns = 3.63 × IC 26

27 Out-Of-Order Processor Define a new miss penalty considering overlap: Memory stall cycles / Instruction = (Misses / Instruction) × (Total miss latency − Overlapped miss latency) Compute memory latency and overlapped latency Example (from previous slide): Assume 30% of the 75ns penalty can be overlapped, but with a longer (1.25ns) cycle on the 1-way design due to OOO Miss penalty(1-way) = 75ns × 70% / 1.25ns = 42 cycles CPU time(1-way,OOO) = IC × (2.0 + 1.5 × 1.4% × 42) × 1.25ns = 3.60 × IC 27

28 Cache Performance Improvement Consider the cache performance equation: (Average memory access time) = (Hit time) + (Miss rate) × (Miss penalty), where the last term is the amortized miss penalty It follows that there are several basic ways to improve cache performance: Reducing miss penalty (5.4) Reducing miss rate (5.5) Reducing miss penalty/rate via parallelism (5.6) Reducing hit time (5.7) Note that by Amdahl's Law, there will be diminishing returns from reducing only hit time or amortized miss penalty by itself, instead of both together. 28

29 Cache Performance Improvement Reduce miss penalty: Multilevel cache; Critical word first and early restart; priority to read miss; Merging write buffer; Victim cache Reduce miss rate: Larger block size; Increase cache size; Higher associativity; Way prediction and Pseudo-associative caches; Compiler optimizations; Reduce miss penalty/rate via parallelism: Non-blocking cache; Hardware prefetching; Compilercontrolled prefetching; Reduce hit time: Small simple cache; Avoid address translation in indexing cache; Pipelined cache access; Trace caches 29

30 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 30

31 Multi-Level Caches What is important: faster caches or larger caches? Average memory access time = Hit time(L1) + Miss rate(L1) × Miss penalty(L1) Miss penalty(L1) = Hit time(L2) + Miss rate(L2) × Miss penalty(L2) Can plug the 2nd equation into the first: Average memory access time = Hit time(L1) + Miss rate(L1) × (Hit time(L2) + Miss rate(L2) × Miss penalty(L2)) 31

32 Multi-level Cache Terminology Local miss rate The miss rate of one hierarchy level by itself # of misses at that level / # accesses to that level e.g. Miss rate(L1), Miss rate(L2) Global miss rate The miss rate of a whole group of hierarchy levels # of accesses coming out of that group (to lower levels) / # of accesses to that group Generally this is the product of the miss rates at each level in the group: Global L2 miss rate = Miss rate(L1) × Local miss rate(L2) 32
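A worked sketch of the two-level formulas, with hypothetical numbers (1-cycle L1 hit, 4% L1 miss rate; 10-cycle L2 hit, 50% local L2 miss rate, 100-cycle L2 miss penalty):

```c
#include <stdio.h>

int main(void) {
    double hit1 = 1, mr1 = 0.04;                    /* hypothetical L1 */
    double hit2 = 10, mr2_local = 0.50, pen2 = 100; /* hypothetical L2 */

    double pen1 = hit2 + mr2_local * pen2;  /* 10 + 0.5*100 = 60 cycles  */
    double amat = hit1 + mr1 * pen1;        /* 1 + 0.04*60  = 3.4 cycles */
    double mr2_global = mr1 * mr2_local;    /* 0.04 * 0.5   = 2%         */

    printf("AMAT = %.2f cycles, global L2 miss rate = %.1f%%\n",
           amat, 100 * mr2_global);
    return 0;
}
```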

33 Effect of 2-level Caching L2 size usually much bigger than L1 Provides a reasonable hit rate Decreases the miss penalty of the 1st-level cache May increase L2 miss penalty Multiple-level cache inclusion property Inclusive cache: L1 is a subset of L2; simplifies the cache coherence mechanism; effective cache size = L2 Exclusive cache: L1 and L2 are exclusive; increases effective cache size = L1 + L2 Enforcing the inclusion property: backward invalidation on L2 replacement 33

34 L2 Cache Performance The global L2 miss rate is similar to the miss rate of a single cache of the same size The local miss rate is not a good measure of secondary caches 34

35 Early Restart, Critical Word First Early restart Don't wait for the entire block to fill Resume the CPU as soon as the requested word is fetched Critical word first (wrapped fetch) Fetch the requested word from memory first Resume the CPU Then transfer the rest of the cache block Most beneficial if the block size is large Commonly used in all modern processors 35

36 Read Misses Take Priority The processor must wait on a read, not on a write Miss penalty is higher for reads to begin with, so there is more benefit from reducing the read miss penalty A write buffer can queue values to be written until the memory bus is not busy with reads Careful about the memory consistency issue: what if we want to read a block that is in the write buffer? Wait for the write, then read the block from memory Better: read the block out of the write buffer Dirty-block replacement when reading Write old block, read new block - delays the read Old block to buffer, read new, write old - better! 36

37 Sub-block Placement Larger blocks have smaller tags (match faster); smaller blocks have a lower miss penalty Compromise solution: Use a large block size for tagging purposes Use a small block size for transfer purposes How? Valid bits associated with sub-blocks 37
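A structural sketch of one sub-blocked frame: a single tag covers the whole block, but each sub-block carries its own valid bit (sizes are illustrative):

```c
#include <stdint.h>

#define SUBBLOCKS   4   /* sub-blocks per tagged block */
#define SUBBLOCK_SZ 16  /* bytes per sub-block */

typedef struct {
    uint64_t tag;                  /* one tag covers the whole block */
    uint8_t  valid[SUBBLOCKS];     /* one valid bit per sub-block */
    uint8_t  data[SUBBLOCKS][SUBBLOCK_SZ];
} SubBlockedFrame;

/* A hit needs both a tag match and the addressed sub-block valid. */
int sub_hit(const SubBlockedFrame *f, uint64_t tag, unsigned sub) {
    return f->tag == tag && f->valid[sub];
}
```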

38 Merging Write Buffer A mechanism to help reduce write stalls On a write to memory, block address and data to be written are placed in a write buffer. CPU can continue immediately Unless the write buffer is full. Write merging: If the same block is written again before it has been flushed to memory, old contents are replaced with new contents. Care must be taken to not violate memory consistency and proper write ordering 38

39 Write Merging Example 39

40 Victim Cache Small extra cache Holds blocks overflowing from the occasional overfull frame set. Very effective for reducing conflict misses. Can be checked in parallel with main cache Insignificant increase to hit time. 40

41 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 41

42 Three Types of Misses Compulsory During a program, the very first access to a block will not be in the cache (unless pre-fetched) Capacity The working set of blocks accessed by the program is too large to fit in the cache Conflict Unless the cache is fully associative, sometimes blocks may be evicted too early because too many frequently-accessed blocks map to the same limited set of frames 42

43 Misses by Type (figure) Conflict misses are significant in a direct-mapped cache. Going from direct-mapped to 2-way helps about as much as doubling cache size. Going from direct-mapped to 4-way is better than doubling cache size. 43

44 As fraction of total misses 44

45 Larger Block Size Keep cache size & associativity constant Reduces compulsory misses Due to spatial locality More accesses are to a pre-fetched block Increases capacity misses More unused locations pulled into cache May increase conflict misses (slightly) Fewer sets may mean more blocks utilized per set Depends on pattern of addresses accessed Increases miss penalty - longer block transfers 45

46 Block Size Effect Miss rate actually goes up if the block is too large relative to the cache size 46

47 Larger Caches Keep block size, set size, etc. constant No effect on compulsory misses The block still won't be there on its 1st access! Reduces capacity misses More capacity! Reduces conflict misses (in general) Working blocks spread out over more frame sets Fewer blocks map to a set on average Less chance that the number of active blocks that map to a given set exceeds the set size But, increases hit time! (And cost.) 47

48 Higher Associativity Keep cache size & block size constant Decreases the number of sets No effect on compulsory misses No effect on capacity misses By definition, these are misses that would happen anyway in a fully-associative cache Decreases conflict misses Blocks in an active set are less likely to be evicted early when the set size is smaller than capacity Can increase hit time (slightly) Direct-mapped is fastest; n-way associative lookup is a bit slower for larger n 48

49 Performance Comparison Assume: Hit time(1-way) = 1.00 × Cycle time(1-way) Miss penalty = 25 × Cycle time(1-way) Hit time(2-way) = 1.36 × Cycle time(1-way) Hit time(4-way) = 1.44 × Cycle time(1-way) Hit time(8-way) = 1.52 × Cycle time(1-way) For 4KB: 1-way miss rate = 9.8%, 4-way miss rate = 7.1% Average mem access time(1-way) = 1.00 + 9.8% × 25 = 3.45 Average mem access time(4-way) = 1.44 + 7.1% × 25 = 3.22 (in units of the 1-way cycle time)

50 Higher Set-Associativity (table: average memory access time for cache sizes from 4 KB to 512 KB at 1-, 2-, 4-, and 8-way associativity; the numeric values did not survive transcription) Higher associativity increases the cycle time The table shows the average memory access time 1-way is better in most cases 50

51 Way Prediction Keep way-prediction information in each set to predict which block in the set will be accessed next Only one tag is matched in the first cycle; on a mispredict, the other blocks are examined Beneficial in two aspects Fast data access: access the data without knowing the tag comparison results Low power: only match a single tag, if the majority of predictions are correct Different systems use variations of the concept 51

52 Pseudo-Associative Caches Essentially this is 2-way set-associative, but with sequential (rather than parallel) lookups. Fast hit time if first frame checked is right. An occasional slow hit if an earlier conflict had moved the block to its backup location. 52

53 Pseudo-Associative Caches Placement: Place block b in frame (b mod n). Identification: Look for block b first in frame (b mod n), then in its secondary location ((b + n/2) mod n), i.e. flip the most-significant index bit. If found there, the primary and secondary blocks are swapped. May maintain an MRU bit to reduce the search and for better replacement. Replacement: The block in frame (b mod n) is moved to the secondary location ((b + n/2) mod n). (The block there is flushed.) Write strategy: Any desired write strategy can be used. 53
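A sketch of the probe order, assuming n is a power of two so that adding n/2 modulo n is the same as flipping the most-significant index bit:

```c
#define N_FRAMES 512  /* hypothetical n, a power of two */

unsigned primary(unsigned long b)   { return b % N_FRAMES; }
unsigned secondary(unsigned long b) {
    /* (b mod n + n/2) mod n == (b mod n) XOR n/2: flip the index MSB */
    return (primary(b) + N_FRAMES / 2) % N_FRAMES;
}
```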

54 Compiler Optimizations Reorganize code to improve locality properties. The hardware designer's favorite solution: requires no new hardware! Various cache-aware techniques (in multidimensional arrays): Merging arrays Loop interchange Loop fusion Blocking All are source-to-source transformation techniques 54

55 Loop Blocking Matrix Multiply Before and after versions of the kernel (see the sketch below) 55
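The slide's code did not survive transcription; below is the standard before/after blocked matrix-multiply transformation, with a hypothetical tile size B chosen so a B-by-B working set stays cache-resident (assumes N is a multiple of B):

```c
#define N 512
#define B 32  /* tile size: tune so the touched tiles of y and z fit in cache */

/* Before: the inner loop streams over a full row of y and column of z,
   so for large N their elements are evicted before they are reused. */
void matmul(double x[N][N], double y[N][N], double z[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* After: compute on B-wide tiles so the touched pieces of y and z are
   reused while still in cache. Assumes x is zero-initialized. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N]) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double r = 0;
                    for (int k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}
```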

56 Effect of Compiler Optimizations 56

57 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty via Parallelism Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 57

58 Non-blocking Caches Known as lockup-free cache, hit under miss While a miss is being processed, Allow other cache lookups to continue anyway Useful in dynamically scheduled CPUs Other instructions may be in the load queue Reduces effective miss penalty Useful CPU work fills the miss penalty delay slot hit under multiple miss, miss under miss: Extend technique to allow multiple misses to be queued up, while still processing new hits 58

59 Non-blocking Caches 59

60 Hardware Prefetching When memory is idle, speculatively get some blocks before the CPU first asks for them! Simple heuristic: Fetch 1 or more blocks that are consecutive to last one(s) fetched Often, the extra blocks are placed in a special stream buffer so as not to conflict with actual active blocks in the cache, otherwise the prefetch may pollute the cache Prefetching can reduce misses considerably Speculative fetches should be low-priority Use only otherwise-unused memory bandwidth Energy-inefficient (like all speculation) 60

61 Compiler-Controlled Prefetching Insert special instructions to load addresses from memory well before they are needed. Register vs. cache, faulting vs. nonfaulting Semantic invisibility, nonblocking-ness Can considerably reduce misses Can also cause extra conflict misses Replacing a block before it is completely used Can also delay valid accesses (tying up bus) Low-priority, can be pre-empted by real access 61

62 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 62

63 Small and Simple Caches Make the cache smaller to improve hit time Or (probably better), add a new & smaller L0 cache between the existing L1 cache and the CPU Keep the L1 cache on the same chip as the CPU Physically close to the functional units that access it Keep the L1 design simple, e.g. direct-mapped Avoids multiple tag comparisons; the tag can be compared after the data cache fetch Reduces effective hit time 63

64 Access Time in a CMOS Cache 64

65 Avoid Address Translation In systems with virtual address spaces, virtual addresses must be mapped to physical addresses. If cache blocks are indexed/tagged with physical addresses, we must do this translation before we can do the cache lookup. Long hit time! Solution: Access the cache using the virtual address - call this a Virtual Cache Drawback: cache flush on context switch Can fix by tagging blocks with Process IDs (PIDs) Another problem: aliasing, i.e. two virtual addresses mapped to the same real address Fix with anti-aliasing or page coloring 65

66 Benefit of PID Tags in Virtual Cache (figure: miss rate without PIDs (purge on context switch), with PIDs, and without context switching) 66

67 Pipelined Cache Access Pipeline cache access so that the effective latency of a first-level cache hit can be multiple clock cycles Fast cycle time and slow hits Hit times: 1 for Pentium, 2 for P3, and 4 for P4 Increases the number of pipeline stages Higher penalty on mispredicted branches More cycles from issue of a load to use of the data In reality, this increases instruction bandwidth rather than decreasing the actual latency of a cache hit 67

68 Trace Caches Supply enough instructions per cycle without dependencies Finding ILP beyond 4 instructions per cycle Don t limit instructions in a static cache block to spatial locality Find dynamic sequence of instructions including taken branches NetBurst (P4) uses trace caches Addresses are no longer aligned. Same instruction is stored more than once If part of multiple traces 68

69 Summary of Cache Optimizations 69

70 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations for Improving Performance Memory Technology Virtual Memory Conclusion 70

71 Wider Main Memory 71

72 Simple Interleaved Memory Adjacent words found in different mem. banks Banks can be accessed in parallel Overlap latencies for accessing each word Can use narrow bus To return accessed words sequentially Fits well with sequential access e.g., of words in cache blocks 72
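A sketch of the bank mapping for simple word interleaving, where consecutive word addresses rotate across banks:

```c
#define N_BANKS 4  /* hypothetical bank count */

unsigned bank_of(unsigned long word_addr)      { return word_addr % N_BANKS; }
unsigned word_in_bank(unsigned long word_addr) { return word_addr / N_BANKS; }
/* Adjacent words of one cache block land in different banks, so their
   access latencies overlap and a narrow bus can return them back-to-back. */
```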

73 Independent Memory Banks Original motivation for memory banks Higher bandwidth by interleaving seq. accesses Allows multiple independent accesses Each bank requires separate address/data lines Non-blocking caches allow CPU to proceed beyond a cache miss Allows multiple simultaneous cache misses Possible only with memory banks 73

74 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 74

75 Main Memory Bandwidth: bytes read or written per unit time Latency: described by Access time: delay between initiation and completion For reads: from presenting the address till the result is ready Cycle time: minimum interval between separate requests to memory Address lines: separate bus CPU-to-Mem to carry addresses RAS (Row Access Strobe): first half of address, sent first CAS (Column Access Strobe): second half of address, sent second 75

76 RAS vs. CAS DRAM bit-cell array 1. RAS selects a row 2. Parallel readout of all row data 3. CAS selects a column to read 4. Selected bit written to memory bus 76

77 Typical DRAM Organization (figure: a 256 Mbit DRAM; the 28-bit address is split into a high 14 bits and a low 14 bits for the row and column accesses) 77

78 Types of Memory DRAM (Dynamic Random Access Memory) Cell design needs only 1 transistor per bit stored. Cell charges leak away and may dynamically (over time) drift from their initial levels. Requires periodic refreshing to correct drift, e.g. every 8 ms Time spent refreshing is kept to <5% of bandwidth SRAM (Static Random Access Memory) Cell voltages are statically (unchangingly) tied to power supply references. No drift, no refresh. But needs 4-6 transistors per bit. DRAM: 4-8x larger, 8-16x slower, 8-16x cheaper/bit 78

79 Amdahl/Case Rule Memory size (and I/O bandwidth) should grow linearly with CPU speed Typical: 1 MB main memory, 1 Mbps I/O bandwidth per 1 MIPS of CPU performance Takes a fairly constant ~8 seconds to scan entire memory (if memory bandwidth = I/O bandwidth, 4 bytes/load, 1 load/4 instructions, no latency problem) Moore's Law: DRAM size doubles every 18 months (up 60%/yr) Tracks processor speed improvements Unfortunately, DRAM latency has only decreased 7%/year. Latency is a big deal. 79

80 Some DRAM Trend Data Since 1998, the rate of increase in chip capacity has slowed to 2x per 2 years: 128 Mb in 1998, 256 Mb in 2000, 512 Mb in 2002 80

81 ROM and Flash ROM (Read-Only Memory) Nonvolatile; also provides protection Flash Nonvolatile NVRAMs require no power to maintain state Reading flash is near DRAM speeds Writing flash is much slower than DRAM Frequently used for upgradeable embedded SW Used in embedded processors 81

82 DRAM Variations SDRAM - Synchronous DRAM DRAM internal operation synchronized by a clock signal provided on the memory bus Double Data Rate (DDR) uses both clock edges RDRAM - RAMBUS (Inc.) DRAM Proprietary DRAM interface technology On-chip interleaving / multi-bank technology A high-speed packet-switched (split-transaction) bus interface Byte-wide interface, synchronous, dual-rate Licensed to many chip & CPU makers Higher bandwidth, but more costly than generic SDRAM DRDRAM - Direct RDRAM (2nd ed. spec.) Separate row and column address/command buses Higher bandwidth (18-bit data, more banks, faster clock) 82

83 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 83

84 Virtual Memory The addition of the virtual memory mechanism complicates cache access 84

85 Paging vs. Segmentation Paged segments: each segment holds an integral number of pages, for easy replacement, while each segment can still be treated as a unit 85

86 Four Important Questions Where to place a block in main memory? The operating system takes care of it Since replacement (a page fault) takes very long, placement is fully associative How to find a block in main memory? A page table is used The offset is concatenated when paging is used The offset is added when segmentation is used Which block to replace when needed? Obviously LRU is used, to minimize page faults What happens on a write? Magnetic disks take millions of cycles to access Always write back (use of a dirty bit) 86

87 Addressing Virtual Memories 87

88 Fast Address Calculation Page tables are very large Kept in main memory Two memory accesses for one read or write Remember the last translation Reuse it if the address is on the same page Exploit the principle of locality If accesses have locality, the address translations should also have locality Keep the address translations in a cache Translation lookaside buffer (TLB) The tag part stores the virtual page number and the data part stores the physical page number 88
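A minimal sketch of a fully-associative TLB lookup (entry count and page size are hypothetical; a real TLB does the match in parallel in hardware):

```c
#include <stdint.h>

#define TLB_ENTRIES 64
#define PAGE_BITS   13  /* hypothetical 8 KB pages */

typedef struct { int valid; uint64_t vpn, ppn; } TLBEntry;

/* Returns 1 on a TLB hit and fills *paddr; 0 means walk the page table. */
int tlb_translate(const TLBEntry tlb[], uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpn = vaddr >> PAGE_BITS;                 /* virtual page number */
    uint64_t off = vaddr & ((1ull << PAGE_BITS) - 1);  /* page offset */
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].ppn << PAGE_BITS) | off;  /* concatenate offset */
            return 1;
        }
    return 0;
}
```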

89 TLB Example: Alpha (figure; one field is annotated as serving the same role as a PID) 89

90 A Memory Hierarchy Example (figure) 90

91 Protection of Virtual Memory Maintain two registers Base Bound For each address check base <= address <= bound Provide two modes User OS (kernel, supervisor, executive) 91

92 Alpha Virtual Addr. Mapping Supports both segmentation and paging 92

93 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 93

94 Design of Memory Hierarchies Superscalar CPU: number of ports to the cache Speculative execution and the memory system Combining the instruction cache with fetch and decode Caches in embedded systems: real-time vs. power I/O and consistency of cached data The cache coherence problem 94

95 The Cache Coherency Problem 95

96 Alpha Memory Hierarchy 96


Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 07: Caches II Shuai Wang Department of Computer Science and Technology Nanjing University 63 address 0 [63:6] block offset[5:0] Fully-AssociativeCache Keep blocks

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

CSE Memory Hierarchy Design Ch. 5 (Hennessy and Patterson)

CSE Memory Hierarchy Design Ch. 5 (Hennessy and Patterson) CSE 4201 Memory Hierarchy Design Ch. 5 (Hennessy and Patterson) Memory Hierarchy We need huge amount of cheap and fast memory Memory is either fast or cheap; never both. Do as politicians do: fake it Give

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance 6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

CS 136: Advanced Architecture. Review of Caches

CS 136: Advanced Architecture. Review of Caches 1 / 30 CS 136: Advanced Architecture Review of Caches 2 / 30 Why Caches? Introduction Basic goal: Size of cheapest memory... At speed of most expensive Locality makes it work Temporal locality: If you

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information