Memory Hierarchy Design
1 Memory Hierarchy Design
2 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 2
3 Many Levels in Memory Hierarchy Invisible only to high-level language programmers Usually made invisible to the programmer (even assembly programmers) Pipeline registers Register file 1st-level cache (on-chip) 2nd-level cache (on same MCM as CPU) Physical memory There can also be a 3rd (or more) cache levels here (usu. mounted on same board as CPU) Our focus in chapter 5 Virtual memory (on hard disk, often in same enclosure as CPU) Disk files (on hard disk often in same enclosure as CPU) Network-accessible disk files (often in the same building as the CPU) Tape backup/archive system (often in the same building as the CPU) Data warehouse: Robotically-accessed room full of shelves of tapes (usually on the same planet as the CPU) 3
4 Simple Hierarchy Example Note many orders of magnitude change in characteristics between levels 4
5 CPU vs. Memory Performance Trends Relative performance (vs. 1980 perf.) as a function of year: CPU +55%/year (+35%/year before the mid-1980s), memory only +7%/year 5
6 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 6
7 Cache Basics A cache is a (hardware-managed) store, intermediate in size, speed, and cost-per-bit between the programmer-visible registers and main physical memory. The cache itself may be SRAM or fast DRAM. There may be more than one level of caches. Basis for cache to work: Principle of Locality When a location is accessed, it and nearby locations are likely to be accessed again soon. Temporal locality - Same location likely again soon. Spatial locality - Nearby locations likely soon. 7
8 Four Basic Questions Consider levels in a memory hierarchy. Use block for the unit of data transfer; satisfy Principle of Locality. Transfer between cache levels, and the memory. The level design is described by four behaviors: Block Placement: Where could a new block be placed in the level? Block Identification: How is a block found if it is in the level? Block Replacement: Which existing block should be replaced if necessary? Write Strategy: How are writes to the block handled? 8
9 Block Placement Schemes 9
10 Direct-Mapped Placement A block can only go into one frame in the cache Determined by the block's address (in memory space) Frame number usually given by some low-order bits of block address. This can also be expressed as: (Frame number) = (Block address) mod (Number of frames (sets) in cache) Note that in a direct-mapped cache, block placement & replacement are both completely determined by the address of the new block that is to be accessed. 10
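The mod-mapping above can be sketched in a few lines. The block size and frame count below are hypothetical parameters chosen for illustration, not values from the slide.

```python
# Direct-mapped placement sketch: the frame is fully determined by the
# block address, so there is no placement or replacement choice to make.
BLOCK_SIZE = 64    # bytes per block (hypothetical)
NUM_FRAMES = 512   # frames in the cache (hypothetical)

def split_address(byte_address: int):
    """Decompose a byte address into (block address, offset within block)."""
    return byte_address // BLOCK_SIZE, byte_address % BLOCK_SIZE

def frame_number(block_address: int) -> int:
    """Low-order bits of the block address select the frame."""
    return block_address % NUM_FRAMES

block, offset = split_address(0x12345)
print(block, offset, frame_number(block))
```

Because both sizes are powers of two, the mod and divide reduce in hardware to simply slicing bit fields out of the address.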
11 Direct-Mapped Identification Tags Block frames Address Tag Frm# Off. Decode & Row Select One Selected &Compared Compare Tags? Hit Mux select Data Word 11
12 Fully-Associative Placement One alternative to direct-mapped is: Allow block to fill any empty frame in the cache. How do we then locate the block later? Can associate each stored block with a tag Identifies the block's location in cache. When the block is needed, treat the cache as an associative memory, using the tag to match all frames in parallel, to pull out the appropriate block. Another alternative to direct-mapped is placement under full program control. A register file can be viewed as a small programmer-controlled cache (w. 1-word blocks). 12
13 Fully-Associative Identification Block addrs Block frames Address Block addr Off. Parallel Compare & Select Note that, compared to Direct: More address bits have to be stored with each block frame. A comparator is needed for each frame, to do the parallel associative lookup. Hit Mux select Data Word 13
14 Set-Associative Placement The block address determines not a single frame, but a frame set (several frames, grouped together). Frame set # = Block address mod # of frame sets The block can be placed associatively anywhere within that frame set. If there are n frames in each frame set, the scheme is called n-way set-associative. Direct mapped = 1-way set-associative. Fully associative: There is only 1 frame set. 14
15 Set-Associative Identification Tags Block frames Address Tag Set# Off. Note: 4 separate sets. Set Select, then Parallel Compare within the Set. Intermediate between direct-mapped and fully-associative in number of tag bits needed to be associated with cache frames. Still need a comparator for each frame (but only those in one set need be activated). Hit Mux select Data Word 15
16 Cache Size Equation Simple equation for the size of a cache: (Cache size) = (Block size) x (Number of sets) x (Set associativity) Can relate to the size of various address fields: (Block size) = 2^(# of offset bits) (Number of sets) = 2^(# of index bits) (# of tag bits) = (# of memory address bits) - (# of index bits) - (# of offset bits) 16
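A quick numeric check of the size equation. The configuration below (64-byte blocks, 512 sets, 2-way, 32-bit addresses) is illustrative only:

```python
# Verify the cache size equation and the address-field breakdown.
from math import log2

block_size, num_sets, assoc, addr_bits = 64, 512, 2, 32

cache_size = block_size * num_sets * assoc        # total bytes of data storage
offset_bits = int(log2(block_size))               # selects byte within block
index_bits = int(log2(num_sets))                  # selects the set
tag_bits = addr_bits - index_bits - offset_bits   # remaining bits stored as tag

print(cache_size, offset_bits, index_bits, tag_bits)
```

Note that doubling the associativity at fixed cache size halves the number of sets, so one index bit migrates into the tag.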
17 Replacement Strategies Which block do we replace when a new block comes in (on cache miss)? Direct-mapped: There's only one choice! Associative (fully- or set-): If any frame in the set is empty, pick one of those. Otherwise, there are many possible strategies: Random: Simple, fast, and fairly effective Least-Recently Used (LRU), and approximations thereof Require bits to record replacement info., e.g. 4-way requires 4! = 24 permutations, need 5 bits to define the MRU-to-LRU positions FIFO: Replace the oldest block. 17
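LRU for one set can be sketched by tracking the full MRU→LRU ordering explicitly (a real 4-way cache would compress this ordering into the 5 bits mentioned above). This is a toy model, not any particular hardware design:

```python
# Toy LRU replacement for a single n-way set.
def access(set_blocks, order, tag, ways=4):
    """Return True on hit; on miss, evict the LRU block if the set is full."""
    if tag in set_blocks:
        order.remove(tag)
        order.insert(0, tag)          # move to MRU position
        return True
    if len(set_blocks) == ways:
        victim = order.pop()          # LRU block sits at the tail
        set_blocks.remove(victim)
    set_blocks.add(tag)
    order.insert(0, tag)              # new block becomes MRU
    return False

blocks, order = set(), []
hits = [access(blocks, order, t) for t in [1, 2, 3, 4, 1, 5, 2]]
print(hits)
```

In the trace above, block 2 is the LRU victim when 5 arrives, so the later re-access to 2 misses again — exactly the behavior a FIFO or random policy might avoid by luck but LRU decides deterministically.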
18 Write Strategies Most accesses are reads, not writes Especially if instruction reads are included Optimize for reads, since read performance matters most Direct mapped can return value before valid check Writes are more difficult Can't write to cache till we know the right block Object written may have various sizes (1-8 bytes) When to synchronize cache with memory? Write through - Write to cache and to memory; prone to stalls due to high bandwidth requirements Write back - Write to memory upon replacement; memory may be out of date 18
19 Another Write Strategy Maintain a FIFO queue (write buffer) of cache frames (e.g. can use a doubly-linked list) Meanwhile, take items from top of queue and write them to memory as fast as bus can handle Reads might take priority, or have a separate bus Advantages: Write stalls are minimized, while keeping memory as up-to-date as possible 19
20 Write Miss Strategies What do we do on a write to a block that's not in the cache? Two main strategies: Neither stops the processor Write-allocate (fetch on write) - Cache the block. No write-allocate (write around) - Just write to memory. Write-back caches tend to use write-allocate. Write-through tends to use no-write-allocate. Use dirty bit to indicate write-back is needed in write-back strategy 20
21 Example Alpha 64 KB, 2-way, 64-byte block, 512 sets 44 physical address bits 21
22 Instruction vs. Data Caches Instructions and data have different patterns of temporal and spatial locality Also instructions are generally read-only Can have separate instruction & data caches Advantages Doubles bandwidth between CPU & memory hierarchy Each cache can be optimized for its pattern of locality Disadvantages Slightly more complex design Can t dynamically adjust cache space taken up by instructions vs. data 22
23 I/D Split and Unified Caches [Table: misses per 1000 accesses for I-cache, D-cache, and unified caches at sizes from 8 KB up] Much lower instruction miss rate than data miss rate 23
24 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 24
25 Cache Performance Equations Memory stalls per program (blocking cache): Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty = IC x (Misses / Instruction) x Miss penalty CPU time formula: CPU time = IC x (CPI_execution + Memory stall cycles / Instruction) x Cycle time More cache performance will be given later! 25
26 Cache Performance Example Ideal CPI=2.0, memory references / inst=1.5, cache size=64KB, miss penalty=75ns, hit time=1 clock cycle Compare performance of two caches: Direct-mapped (1-way): cycle time=1ns, miss rate=1.4% 2-way: cycle time=1.25ns, miss rate=1.0% Miss penalty(1-way) = 75ns / 1ns = 75 cycles Miss penalty(2-way) = 75ns / 1.25ns = 60 cycles CPU time(1-way) = IC x (2.0 + 1.5 x 1.4% x 75) x 1ns = 3.58 x IC CPU time(2-way) = IC x (2.0 + 1.5 x 1.0% x 60) x 1.25ns = 3.63 x IC 26
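The arithmetic in this example can be reproduced directly from the CPU time formula, using only the parameters the slide gives:

```python
# Reproduce the example: CPU time per instruction (in ns, per unit IC).
refs_per_inst, penalty_ns, base_cpi = 1.5, 75, 2.0

def cpu_time_per_inst(cycle_ns, miss_rate):
    penalty_cycles = penalty_ns / cycle_ns              # miss penalty in cycles
    stall_cpi = refs_per_inst * miss_rate * penalty_cycles
    return (base_cpi + stall_cpi) * cycle_ns

print(cpu_time_per_inst(1.0, 0.014))    # 1-way: ~3.58 ns per instruction
print(cpu_time_per_inst(1.25, 0.010))   # 2-way: ~3.63 ns per instruction
```

Despite its higher miss rate, the direct-mapped cache wins here because its shorter cycle time applies to every instruction, not just the misses.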
27 Out-Of-Order Processor Define new miss penalty considering overlap: Memory stall cycles / Instruction = (Misses / Instruction) x (Total miss latency - Overlapped miss latency) Compute memory latency and overlapped latency Example (from previous slide): Assume 30% of the 75ns penalty can be overlapped, but with the longer (1.25ns) cycle on the 1-way design due to OOO: Miss penalty(1-way) = 75ns x 70% / 1.25ns = 42 cycles CPU time(1-way, OOO) = IC x (2.0 + 1.5 x 1.4% x 42) x 1.25ns = 3.60 x IC 27
28 Cache Performance Improvement Consider the cache performance equation: (Average memory access time) = (Hit time) + (Miss rate) x (Miss penalty) The product (Miss rate) x (Miss penalty) is the amortized miss penalty It follows that there are four basic ways to improve cache performance: Reducing miss penalty (5.4) Reducing miss rate (5.5) Reducing miss penalty/rate via parallelism (5.6) Reducing hit time (5.7) Note that by Amdahl's Law, there will be diminishing returns from reducing only hit time or amortized miss penalty by itself, instead of both together. 28
29 Cache Performance Improvement Reduce miss penalty: Multilevel cache; Critical word first and early restart; priority to read miss; Merging write buffer; Victim cache Reduce miss rate: Larger block size; Increase cache size; Higher associativity; Way prediction and Pseudo-associative caches; Compiler optimizations; Reduce miss penalty/rate via parallelism: Non-blocking cache; Hardware prefetching; Compilercontrolled prefetching; Reduce hit time: Small simple cache; Avoid address translation in indexing cache; Pipelined cache access; Trace caches 29
30 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 30
31 Multi-Level Caches What is important: faster caches or larger caches? Average memory access time = Hit time(L1) + Miss rate(L1) x Miss penalty(L1) Miss penalty(L1) = Hit time(L2) + Miss rate(L2) x Miss penalty(L2) Can plug the 2nd equation into the first: Average memory access time = Hit time(L1) + Miss rate(L1) x (Hit time(L2) + Miss rate(L2) x Miss penalty(L2)) 31
32 Multi-level Cache Terminology Local miss rate The miss rate of one hierarchy level by itself. # of misses at that level / # accesses to that level e.g. Miss rate(L1), Miss rate(L2) Global miss rate The miss rate of a whole group of hierarchy levels # of accesses coming out of that group (to lower levels) / # of accesses to that group Generally this is the product of the miss rates at each level in the group. Global L2 miss rate = Miss rate(L1) x Local miss rate(L2) 32
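Putting the two-level formulas together numerically — the rates and times below are hypothetical, chosen only to exercise the equations:

```python
# Two-level AMAT and global L2 miss rate, per the formulas above.
hit_l1, hit_l2, penalty_l2 = 1, 10, 100    # cycles (hypothetical)
miss_l1, local_miss_l2 = 0.05, 0.40        # L1 miss rate, L2 *local* miss rate

miss_penalty_l1 = hit_l2 + local_miss_l2 * penalty_l2   # what an L1 miss costs
amat = hit_l1 + miss_l1 * miss_penalty_l1               # average access time
global_miss_l2 = miss_l1 * local_miss_l2                # misses per CPU access

print(amat, global_miss_l2)
```

Here the seemingly poor 40% local L2 miss rate corresponds to only a 2% global miss rate, which is why local miss rate alone is a misleading measure of a secondary cache.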
33 Effect of 2-level Caching L2 size usually much bigger than L1 Provides a reasonable hit rate Decreases the miss penalty of the 1st-level cache May increase L2 miss penalty Multiple-level cache inclusion property Inclusive cache: L1 is a subset of L2; simplifies cache coherence mechanism; effective cache size = L2 Exclusive cache: L1, L2 are exclusive; increases effective cache size = L1 + L2 Enforce inclusion property: backward invalidation on L2 replacement 33
34 L2 Cache Performance Global cache miss rate is similar to the single cache miss rate Local miss rate is not a good measure of secondary caches 34
35 Early Restart, Critical Word First Early restart Don't wait for entire block to fill Resume CPU as soon as requested word is fetched Critical word first (wrapped fetch) - requested word first Fetch the requested word from memory first Resume CPU Then transfer the rest of the cache block Most beneficial if block size is large Commonly used in modern processors 35
36 Read Misses Take Priority Processor must wait on a read, not on a write Miss penalty is higher for reads to begin with, and there is more benefit from reducing read miss penalty Write buffer can queue values to be written, draining when the memory bus is not busy with reads Careful about the memory consistency issue What if we want to read a block in the write buffer? Wait for write, then read block from memory Better: Read block out of write buffer. Dirty block replacement when reading: Write old block, read new block - Delays the read. Old block to buffer, read new, write old - Better! 36
37 Sub-block Placement Larger blocks have smaller tags (match faster) Smaller blocks have lower miss penalty Compromise solution: Use a large block size for tagging purposes Use a small block size for transfer purposes How? Valid bits associated with sub-blocks. Blocks Tags 37
38 Merging Write Buffer A mechanism to help reduce write stalls On a write to memory, block address and data to be written are placed in a write buffer. CPU can continue immediately Unless the write buffer is full. Write merging: If the same block is written again before it has been flushed to memory, old contents are replaced with new contents. Care must be taken to not violate memory consistency and proper write ordering 38
39 Write Merging Example 39
40 Victim Cache Small extra cache Holds blocks overflowing from the occasional overfull frame set. Very effective for reducing conflict misses. Can be checked in parallel with main cache Insignificant increase to hit time. 40
41 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 41
42 Three Types of Misses Compulsory During a program, the very first access to a block will not be in the cache (unless pre-fetched) Capacity The working set of blocks accessed by the program is too large to fit in the cache Conflict Unless cache is fully associative, sometimes blocks may be evicted too early because too many frequently-accessed blocks map to the same limited set of frames. 42
43 Misses by Type Conflict misses are significant in a direct-mapped cache. Going from direct-mapped to 2-way helps as much as doubling cache size. Going from direct-mapped to 4-way is better than doubling cache size. 43
44 As fraction of total misses 44
45 Larger Block Size Keep cache size & associativity constant Reduces compulsory misses Due to spatial locality More accesses are to a pre-fetched block Increases capacity misses More unused locations pulled into cache May increase conflict misses (slightly) Fewer sets may mean more blocks utilized per set Depends on pattern of addresses accessed Increases miss penalty - longer block transfers 45
46 Block Size Effect Miss rate actually goes up if the block is too large relative to the cache size 46
47 Larger Caches Keep block size, set size, etc. constant No effect on compulsory misses. Block still won't be there on its 1st access! Reduces capacity misses More capacity! Reduces conflict misses (in general) Working blocks spread out over more frame sets Fewer blocks map to a set on average Less chance that the number of active blocks that map to a given set exceeds the set size. But, increases hit time! (And cost.) 47
48 Higher Associativity Keep cache size & block size constant Decreasing the number of sets No effect on compulsory misses No effect on capacity misses By definition, these are misses that would happen anyway in fully-associative Decreases conflict misses Blocks in an active set are less likely to be evicted early when fewer active blocks map to each (larger) set Can increase hit time (slightly) Direct-mapped is fastest n-way associative lookup a bit slower for larger n 48
49 Performance Comparison Assume: Hit time(1-way) = Cycle time(1-way) Miss penalty = 25 x Cycle time(1-way) Hit time(2-way) = 1.36 x Cycle time(1-way) Hit time(4-way) = 1.44 x Cycle time(1-way) Hit time(8-way) = 1.52 x Cycle time(1-way) 4KB: 1-way miss rate=9.8%; 4-way miss rate=7.1% Average mem access time(1-way) = 1.00 + Miss rate(1-way) x 25 Average mem access time(4-way) = 1.44 + Miss rate(4-way) x 25
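Evaluating the comparison with the slide's numbers (25-cycle penalty, hit times and 4 KB miss rates as given), everything expressed in units of the 1-way cycle time:

```python
# AMAT comparison between 1-way and 4-way at 4 KB, per the slide's assumptions.
miss_penalty = 25  # in units of the 1-way cycle time

def amat(hit_time, miss_rate):
    return hit_time + miss_rate * miss_penalty

amat_1way = amat(1.00, 0.098)   # direct-mapped, 4 KB
amat_4way = amat(1.44, 0.071)   # 4-way, 4 KB

print(amat_1way, amat_4way)
```

At this small size the 4-way cache wins despite its slower hit time; as the table on the next slide shows, larger caches have miss rates low enough that the 1-way hit-time advantage dominates in most cases.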
50 Higher Set-Associativity [Table: average memory access time by cache size (4 KB and up) for 1-way, 2-way, 4-way, and 8-way] Higher associativity increases the cycle time The table shows the average memory access time 1-way is better in most cases 50
51 Way Prediction Keep way-prediction information in each set to predict which block in the set will be accessed next Only one tag is matched in the first cycle; on a prediction miss, the other blocks must be examined Beneficial in two aspects Fast data access: Access the data without knowing the tag comparison results Low power: Only match a single tag; pays off if the majority of predictions are correct Different systems use variations of the concept 51
52 Pseudo-Associative Caches Essentially this is 2-way set-associative, but with sequential (rather than parallel) lookups. Fast hit time if first frame checked is right. An occasional slow hit if an earlier conflict had moved the block to its backup location. 52
53 Pseudo-Associative Caches Placement: Place block b in frame (b mod n). Identification: Look for block b first in frame (b mod n), then in its secondary location ((b+n/2) mod n). (flip the most-significant bit) If found there, primary and secondary blocks are swapped. May maintain a MRU bit to reduce the search and for better replacement. Replacement: Block in frame (b mod n) is moved to secondary location ((b+n/2) mod n). (Block there is flushed.) Write strategy: Any desired write strategy can be used. 53
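The placement, identification, and replacement rules above can be sketched as follows. This is a toy model of the scheme the slide describes (primary frame b mod n, secondary frame (b + n/2) mod n, swap on a secondary hit, displace to secondary on a miss), not any specific hardware:

```python
# Pseudo-associative cache sketch over n frames (n assumed even).
def lookup(frames, b, n):
    """Return (hit, probes). Probe primary frame, then the secondary one."""
    p = b % n                      # primary frame
    s = (b + n // 2) % n           # secondary: flip the MSB of the frame index
    if frames[p] == b:
        return True, 1             # fast hit
    if frames[s] == b:
        frames[p], frames[s] = frames[s], frames[p]   # swap on slow hit
        return True, 2
    frames[s] = frames[p]          # displaced block moves to secondary slot
    frames[p] = b                  # new block takes the primary slot
    return False, 2

frames = [None] * 8
print(lookup(frames, 3, 8))        # cold miss
print(lookup(frames, 3, 8))        # fast hit, 1 probe
print(lookup(frames, 11, 8))       # conflict miss: 3 displaced to frame 7
print(lookup(frames, 3, 8))        # slow hit in secondary slot, then swap
```

The swap keeps the most-recently-used block in the fast primary slot, which is what makes the average hit time close to direct-mapped.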
54 Compiler Optimizations Reorganize code to improve locality properties. The hardware designer's favorite solution. Requires no new hardware! Various cache-aware techniques: Merging arrays Loop interchange Loop fusion Blocking Other (in multidimensional arrays) All are source-to-source transformation techniques 54
55 Loop Blocking Matrix Multiply Before: After: 55
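The "after" version of blocked matrix multiply can be sketched as below. The block factor `blk` is a hypothetical tuning parameter, chosen in practice so that one tile of each matrix fits in the cache:

```python
# Blocked (tiled) matrix multiply: iterate over blk x blk tiles so that the
# tiles of A, B, and C being touched stay resident in the cache.
def matmul_blocked(A, B, n, blk=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, blk):
        for jj in range(0, n, blk):
            for kk in range(0, n, blk):
                # inner loops work entirely within the current tiles
                for i in range(ii, min(ii + blk, n)):
                    for j in range(jj, min(jj + blk, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + blk, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C

print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))
```

The result is identical to the naive triple loop; only the traversal order changes, turning capacity misses on B's columns into reuse within each tile.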
56 Effect of Compiler Optimizations 56
57 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty via Parallelism Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 57
58 Non-blocking Caches Known as lockup-free cache, hit under miss While a miss is being processed, Allow other cache lookups to continue anyway Useful in dynamically scheduled CPUs Other instructions may be in the load queue Reduces effective miss penalty Useful CPU work fills the miss penalty delay slot hit under multiple miss, miss under miss: Extend technique to allow multiple misses to be queued up, while still processing new hits 58
59 Non-blocking Caches 59
60 Hardware Prefetching When memory is idle, speculatively get some blocks before the CPU first asks for them! Simple heuristic: Fetch 1 or more blocks that are consecutive to last one(s) fetched Often, the extra blocks are placed in a special stream buffer so as not to conflict with actual active blocks in the cache, otherwise the prefetch may pollute the cache Prefetching can reduce misses considerably Speculative fetches should be low-priority Use only otherwise-unused memory bandwidth Energy-inefficient (like all speculation) 60
61 Compiler-Controlled Prefetching Insert special instructions to load addresses from memory well before they are needed. Register vs. cache, faulting vs. nonfaulting Semantic invisibility, nonblocking-ness Can considerably reduce misses Can also cause extra conflict misses Replacing a block before it is completely used Can also delay valid accesses (tying up bus) Low-priority, can be pre-empted by real access 61
62 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 62
63 Small and Simple Caches Make cache smaller to improve hit time Or (probably better), add a new & smaller L0 cache between existing L1 cache and CPU. Keep L1 cache on same chip as CPU Physically close to functional units that access it Keep L1 design simple, e.g. direct-mapped Avoids multiple tag comparisons; the tag can be compared after the data cache fetch Reduces effective hit time 63
64 Access Time in a CMOS Cache 64
65 Avoid Address Translation In systems with virtual address spaces, virtual addresses must be mapped to physical addresses. If cache blocks are indexed/tagged w. physical addresses, we must do this translation before we can do the cache lookup. Long hit time! Solution: Access cache using the virtual address. Call this a Virtual Cache Drawback: Cache flush on context switch; can fix by tagging blocks with Process IDs (PIDs) Another problem: Aliasing, i.e. two virtual addresses mapped to same real address Fix with anti-aliasing or page coloring 65
66 Benefit of PID Tags in Virtual Cache [Figure: miss rate without PIDs (purge on switch), with PIDs, and without context switching] 66
67 Pipelined Cache Access Pipeline cache access so that the effective latency of a first-level cache hit can be multiple clock cycles Fast cycle time and slow hits Hit times: 1 for Pentium, 2 for P3, and 4 for P4 Increases number of pipeline stages Higher penalty on mispredicted branches More cycles from issue of load to use of data In reality, this increases instruction bandwidth rather than decreasing the actual latency of a cache hit 67
68 Trace Caches Supply enough instructions per cycle without dependencies Finding ILP beyond 4 instructions per cycle Don't limit instructions in a static cache block to spatial locality Find dynamic sequences of instructions including taken branches NetBurst (P4) uses trace caches Addresses are no longer aligned. Same instruction may be stored more than once If part of multiple traces 68
69 Summary of Cache Optimizations 69
70 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations for Improving Performance Memory Technology Virtual Memory Conclusion 70
71 Wider Main Memory 71
72 Simple Interleaved Memory Adjacent words found in different mem. banks Banks can be accessed in parallel Overlap latencies for accessing each word Can use narrow bus To return accessed words sequentially Fits well with sequential access e.g., of words in cache blocks 72
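Low-order word interleaving, as described above, is just a mod over the bank count. The bank count and addresses below are illustrative:

```python
# Word-interleaved memory: adjacent words land in different banks, so the
# words of one cache block can be fetched with overlapped bank access times.
NUM_BANKS = 4  # hypothetical

def bank_of(word_address: int) -> int:
    return word_address % NUM_BANKS   # low-order interleaving

block_words = range(100, 104)         # four consecutive words of one block
print([bank_of(w) for w in block_words])
```

Because each of the four words maps to a distinct bank, their access latencies overlap, and a narrow bus can still return them back-to-back.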
73 Independent Memory Banks Original motivation for memory banks Higher bandwidth by interleaving seq. accesses Allows multiple independent accesses Each bank requires separate address/data lines Non-blocking caches allow CPU to proceed beyond a cache miss Allows multiple simultaneous cache misses Possible only with memory banks 73
74 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 74
75 Main Memory Bandwidth: Bytes read or written per unit time Latency: Described by Access time: Delay between initiation and completion. For reads: from presenting the address till the result is ready. Cycle time: Minimum interval between separate requests to memory. Address lines: Separate bus CPU to Mem to carry addresses. RAS (Row Access Strobe) First half of address, sent first. CAS (Column Access Strobe) Second half of address, sent second. 75
76 RAS vs. CAS DRAM bit-cell array 1. RAS selects a row 2. Parallel readout of all row data 3. CAS selects a column to read 4. Selected bit written to memory bus 76
77 Typical DRAM Organization Low 14 bits (256 Mbit) High 14 bits 77
78 Types of Memory DRAM (Dynamic Random Access Memory) Cell design needs only 1 transistor per bit stored. Cell charges leak away and may dynamically (over time) drift from their initial levels. Requires periodic refreshing to correct drift, e.g. every 8 ms; time spent refreshing kept to <5% of bandwidth SRAM (Static Random Access Memory) Cell voltages are statically (unchangingly) tied to power supply references. No drift, no refresh. But needs 4-6 transistors per bit. DRAM: 4-8x larger, 8-16x slower, 8-16x cheaper/bit 78
79 Amdahl/Case Rule Memory size (and I/O bandwidth) should grow linearly with CPU speed Typical: 1 MB main memory, 1 Mbps I/O bandwidth per 1 MIPS CPU performance. Takes a fairly constant ~8 seconds to scan entire memory (if memory bandwidth = I/O bandwidth, 4 bytes/load, 1 load/4 instructions, no latency problem) Moore's Law: DRAM size doubles every 18 months (up 60%/yr) Tracks processor speed improvements Unfortunately, DRAM latency has only decreased 7%/year. Latency is a big deal. 79
80 Some DRAM Trend Data Since 1998, the rate of increase in chip capacity has slowed to 2x per 2 years: * 128 Mb in 1998 * 256 Mb in 2000 * 512 Mb in
81 ROM and Flash ROM (Read-Only Memory) Nonvolatile + protection Flash Nonvolatile NVRAMs (nonvolatile RAMs) require no power to maintain state Reading flash is near DRAM speeds Writing flash is much slower than DRAM Frequently used for upgradeable embedded SW Used in embedded processors 81
82 DRAM Variations SDRAM Synchronous DRAM DRAM internal operation synchronized by a clock signal provided on the memory bus Double Data Rate (DDR) uses both clock edges RDRAM RAMBUS (Inc.) DRAM Proprietary DRAM interface technology on-chip interleaving / multi-bank technology a high-speed packet-switched (split-transaction) bus interface byte-wide interface, synchronous, dual-rate Licensed to many chip & CPU makers Higher bandwidth, but costlier than generic SDRAM DRDRAM Direct RDRAM (2nd ed. spec.) Separate row and column address/command buses Higher bandwidth (18-bit data, more banks, faster clock) 82
83 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 83
84 Virtual Memory The addition of the virtual memory mechanism complicates cache access 84
85 Paging vs. Segmentation Paged segments: Each segment has an integral number of pages, for easy replacement, and each segment can still be treated as a unit 85
86 Four Important Questions Where to place a block in main memory? Operating system takes care of it Replacement (a page fault) takes very long, so use fully associative placement How to find a block in main memory? Page table is used Offset is concatenated when paging is used Offset is added when segmentation is used. Which block to replace when needed? Obviously LRU is used to minimize page faults What happens on a write? Magnetic disks take millions of cycles to access. Always write back (use a dirty bit). 86
87 Addressing Virtual Memories 87
88 Fast Address Calculation Page tables are very large Kept in main memory Two memory accesses for one read or write Remember the last translation Reuse if the address is on the same page Exploit the principle of locality If accesses have locality, the address translations should also have locality Keep the address translations in a cache Translation lookaside buffer (TLB) The tag part stores the virtual page number and the data part stores the physical page number. 88
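The TLB idea can be sketched as a small dictionary of cached VPN→PPN translations in front of the (slow, in-memory) page table. The page size and table contents below are hypothetical, and a real TLB is a small associative hardware structure with limited capacity, not an unbounded map:

```python
# Toy TLB: cache recent virtual-to-physical page translations.
PAGE_SIZE = 4096  # hypothetical 4 KB pages

def translate(vaddr, tlb, page_table):
    """Return (physical address, tlb_hit)."""
    vpn, offset = vaddr // PAGE_SIZE, vaddr % PAGE_SIZE
    if vpn in tlb:
        return tlb[vpn] * PAGE_SIZE + offset, True    # fast path
    ppn = page_table[vpn]     # slow path: walk the in-memory page table
    tlb[vpn] = ppn            # cache the translation for the next access
    return ppn * PAGE_SIZE + offset, False

page_table = {5: 9}           # hypothetical mapping: virtual page 5 -> frame 9
tlb = {}
print(translate(5 * PAGE_SIZE + 12, tlb, page_table))   # miss, then cached
print(translate(5 * PAGE_SIZE + 12, tlb, page_table))   # hit
```

The page offset passes through untranslated, which is why accesses with locality (many addresses on the same page) reuse one cached translation.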
89 TLB Example: Alpha Same as PID 89
90 A Memory Hierarchy Example 90
91 Protection of Virtual Memory Maintain two registers Base Bound For each address check base <= address <= bound Provide two modes User OS (kernel, supervisor, executive) 91
92 Alpha Virtual Addr. Mapping Supports both segmentation and paging 92
93 Outline Introduction Cache Basics Cache Performance Reducing Cache Miss Penalty Reducing Cache Miss Rate Reducing Hit Time Main Memory and Organizations Memory Technology Virtual Memory Conclusion 93
94 Design of Memory Hierarchies Superscalar CPU, number of ports to cache Speculative execution and memory system Combine inst. Cache with fetch and decode Caches in embedded systems! Real-time vs. power I/O and consistency of cached data The cache coherence problem 94
95 The Cache Coherency Problem 95
96 Alpha Memory Hierarchy 96
Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationLec 11 How to improve cache performance
Lec 11 How to improve cache performance How to Improve Cache Performance? AMAT = HitTime + MissRate MissPenalty 1. Reduce the time to hit in the cache.--4 small and simple caches, avoiding address translation,
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationA Cache Hierarchy in a Computer System
A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationMEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming
MEMORY HIERARCHY BASICS B649 Parallel Architectures and Programming BASICS Why Do We Need Caches? 3 Overview 4 Terminology cache virtual memory memory stall cycles direct mapped valid bit block address
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationOutline. 1 Reiteration. 2 Cache performance optimization. 3 Bandwidth increase. 4 Reduce hit time. 5 Reduce miss penalty. 6 Reduce miss rate
Outline Lecture 7: EITF20 Computer Architecture Anders Ardö EIT Electrical and Information Technology, Lund University November 21, 2012 A. Ardö, EIT Lecture 7: EITF20 Computer Architecture November 21,
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationCOSC4201. Chapter 5. Memory Hierarchy Design. Prof. Mokhtar Aboelaze York University
COSC4201 Chapter 5 Memory Hierarchy Design Prof. Mokhtar Aboelaze York University 1 Memory Hierarchy The gap between CPU performance and main memory has been widening with higher performance CPUs creating
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationLRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.
LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E
More informationCENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationChapter 5. Topics in Memory Hierachy. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ.
Computer Architectures Chapter 5 Tien-Fu Chen National Chung Cheng Univ. Chap5-0 Topics in Memory Hierachy! Memory Hierachy Features: temporal & spatial locality Common: Faster -> more expensive -> smaller!
More informationIntroduction to cache memories
Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationMemory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB
Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationWhy memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho
Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationMemory Hierarchy Y. K. Malaiya
Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath
More informationMemory Hierarchy and Caches
Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline
More informationTopics. Digital Systems Architecture EECE EECE Need More Cache?
Digital Systems Architecture EECE 33-0 EECE 9-0 Need More Cache? Dr. William H. Robinson March, 00 http://eecs.vanderbilt.edu/courses/eece33/ Topics Cache: a safe place for hiding or storing things. Webster
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 08: Caches III Shuai Wang Department of Computer Science and Technology Nanjing University Improve Cache Performance Average memory access time (AMAT): AMAT =
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More informationSpring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand
Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications
More informationMemory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology
Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast
More informationChapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)
Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory systems. Memory technology. Memory technology Memory hierarchy Virtual memory
Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationMemory Hierarchies 2009 DAT105
Memory Hierarchies Cache performance issues (5.1) Virtual memory (C.4) Cache performance improvement techniques (5.2) Hit-time improvement techniques Miss-rate improvement techniques Miss-penalty improvement
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN
CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy
ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses
More informationMemory Technologies. Technology Trends
. 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationModern Computer Architecture
Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap
More informationCache Optimisation. sometime he thought that there must be a better way
Cache sometime he thought that there must be a better way 2 Cache 1. Reduce miss rate a) Increase block size b) Increase cache size c) Higher associativity d) compiler optimisation e) Parallelism f) prefetching
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationComputer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James
Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions
More informationCOEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory
1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationLecture notes for CS Chapter 2, part 1 10/23/18
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCMSC 611: Advanced Computer Architecture. Cache and Memory
CMSC 611: Advanced Computer Architecture Cache and Memory Classification of Cache Misses Compulsory The first access to a block is never in the cache. Also called cold start misses or first reference misses.
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 07: Caches II Shuai Wang Department of Computer Science and Technology Nanjing University 63 address 0 [63:6] block offset[5:0] Fully-AssociativeCache Keep blocks
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCSE Memory Hierarchy Design Ch. 5 (Hennessy and Patterson)
CSE 4201 Memory Hierarchy Design Ch. 5 (Hennessy and Patterson) Memory Hierarchy We need huge amount of cheap and fast memory Memory is either fast or cheap; never both. Do as politicians do: fake it Give
More informationThe Memory Hierarchy & Cache
Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationCS 136: Advanced Architecture. Review of Caches
1 / 30 CS 136: Advanced Architecture Review of Caches 2 / 30 Why Caches? Introduction Basic goal: Size of cheapest memory... At speed of most expensive Locality makes it work Temporal locality: If you
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationMain Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be
More informationThe Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):
The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:
More informationLocality. Cache. Direct Mapped Cache. Direct Mapped Cache
Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be
More information