Memory Hierarchies 2009 DAT105
1 Memory Hierarchies
- Cache performance issues (5.1)
- Virtual memory (C.4)
- Cache performance improvement techniques (5.2): hit-time, miss-rate, and miss-penalty improvement techniques
2 Today's Topics
- Background: repetition of lecture 2: cache memories (C.1); a brief look at virtual memory (C.4)
- Improvement techniques: how cache performance affects processor performance [ ]; a set of techniques that each address these main factors [C.3, 5.2]
3 Why is Caching Important?
- 1980: no cache in microprocessors
- Today: multiple levels of cache on a microprocessor chip
- The processor/memory speed gap is increasing
4 Levels of the Memory Hierarchy (faster and smaller at the top, larger and slower at the bottom)
- Registers: 100s of bytes, 0.25 ns; holds instruction operands (1-8 bytes); managed by the compiler/program
- Cache memory: ~100 KB, 0.5-5 ns; holds blocks; managed by the cache controller
- Main memory: 1-10 GBytes, 50 (2.5) ns; holds pages (4-16 Kbytes); managed by the OS
- Magnetic disks: 100s of GBytes, 1-10 ms; holds files (Mbytes)
- Tape: sec-min access times
5 Cache Memory Implementation
- The address (bits 31..0) is split into Tag, Index, and Offset fields; the offset selects the word and byte within the block (block = 4 words)
- The index selects a cache entry; a hit requires the Valid bit set and the stored tag equal to the address tag, and the selected data word is delivered
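The field extraction above can be sketched in C. The block size (16 bytes = 4 words) follows the slide; the 1024-set direct-mapped cache size is an illustrative assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the tag/index/offset split for a direct-mapped cache.
 * 16-byte blocks (4 words) as on the slide; 1024 sets is assumed. */
enum { BLOCK_BYTES = 16, NUM_SETS = 1024 };
enum { OFFSET_BITS = 4, INDEX_BITS = 10 };   /* log2 of the above */

static uint32_t offset_of(uint32_t addr) { return addr & (BLOCK_BYTES - 1); }
static uint32_t index_of(uint32_t addr)  { return (addr >> OFFSET_BITS) & (NUM_SETS - 1); }
static uint32_t tag_of(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }
```

Two addresses whose index bits agree but whose tags differ compete for the same cache entry, which is exactly the conflict situation of the next slide.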
6 Direct Mapped Caches
- Many memory blocks map to (compete for) the same cache location
- Tag check and data read (and bus transmission) can proceed in parallel! (improves hit time)
7 Two-way Set Associative Cache
- The address is split into Tag, Index/Set, and Offset (word, byte) fields
- The index selects one entry in each of the two ways; both valid bits and tags are checked in parallel, and a hit in either way selects the data word
8 Virtual Memory
- Main memory is shared and uses a physical address space
- Each program (A, B) uses its own virtual address space, from address 0 to Addr Max
9 Virtual Memory (VM)
- Virtual addresses get translated to physical addresses
- Different translations for different programs: the pages of programs A and B end up interleaved in main memory
10 Virtual Address Translation
- The block is called a page; typically 4-16 KB
- Virtual (program) address = virtual page number + page offset
- The page table (PT), stored in main memory, maps the virtual page number to a physical page location; each entry carries V, D, R status bits and r/w/x protection bits
- Page fault: if the page is not in memory
- Physical (main memory) address = physical page number + page offset
11 The Page Table (PT)
- Virtual-to-physical address translation
- PT_Basereg: CPU register pointing to the current PT (quick process switches)
- Includes protection bits per page (r, w, x)
- Includes status bits per page:
  - V = Valid; 1 = page is in memory, 0 = page is on disk
  - D = Dirty; 1 = page has been modified
  - R = Reference bit(s); used by the LRU replacement algorithm
12 The Page Table (PT)
- But each memory reference is now doubled! 1st: access the PT; 2nd: access the data
- Solution: use a TLB (Translation Lookaside Buffer): a small cache holding the most recent virtual-to-physical address translations
- Often separate ITLB & DTLB
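A minimal sketch of such a TLB, assuming 4 KB pages and a direct-mapped, 16-entry TLB (both sizes are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PAGE_BITS = 12, TLB_ENTRIES = 16 };   /* 4 KB pages, 16 entries (assumed) */

typedef struct { bool valid; uint32_t vpn, ppn; } TlbEntry;
static TlbEntry tlb[TLB_ENTRIES];

/* On a hit the physical address is formed without touching the page
 * table; on a miss the caller would walk the PT and refill the TLB. */
static bool tlb_translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_BITS;
    TlbEntry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        *paddr = (e->ppn << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
        return true;
    }
    return false;
}

static void tlb_refill(uint32_t vpn, uint32_t ppn) {
    TlbEntry *e = &tlb[vpn % TLB_ENTRIES];
    e->valid = true; e->vpn = vpn; e->ppn = ppn;
}

/* Demo: pretend the page-table walk mapped virtual page 0x3 to
 * physical page 0x42, then translate an address on that page. */
static uint32_t demo_translation(void) {
    uint32_t pa = 0;
    tlb_refill(0x3, 0x42);
    return tlb_translate(0x3ABC, &pa) ? pa : 0;
}
```

The page offset passes through unchanged; only the page number is translated.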
13 VM Accesses Memory
- Virtual address = page number + offset; the page number is split into a TLB tag and index
- TLB hit: the physical page number comes straight from the TLB entry (V, D, R, TAG, physical page no.)
- TLB miss: walk the page table (V, D, R, physical page no. / disk position); if the page is on disk, a page fault brings it in from disk
14 Example: the FastMATH processor (physically addressed cache)
15 System Hardware Overview (A Very Schematic Example)
- CPU: fetch queue, register file, ROB, branch predictor, reservation stations (integer & logic, memory access, floating point), common data bus; pipeline: fetch instruction -> get operands & issue -> write result
- Memory side: L1 ICache, L1 DCache, L2 cache, MMU with TLB, memory bus, bus control, main memory
- System board: system bus with I/O controllers for network, disk, graphics, boot/timer etc., and other I/O devices
16 Today's Topics
- Improvement techniques: how cache performance affects processor performance [ ]; a set of techniques that each address these main factors [C.3, 5.2]
17 Cache Performance Metrics
- Average memory access time = Hit time + Miss rate x Miss penalty
- Miss rate: fraction of memory accesses not found in a cache/memory level; sometimes useful to consider Hit rate = 1 - Miss rate
- Miss penalty: time to bring in a block, including time to replace a block; measured in ns or number of clock cycles
  - access time: average time to access the data
  - transfer time: time to transfer the block to the higher level
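The formula can be checked with a small helper; the numbers in the usage note are illustrative, not from the slides:

```c
#include <assert.h>

/* Average memory access time = hit time + miss rate * miss penalty.
 * All times in the same unit, e.g. clock cycles. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

For example, a 1-cycle hit time, 5% miss rate and 20-cycle miss penalty give an AMAT of 2 cycles.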
18 Cache Performance
- Impact of caches on execution time:
  Exec time = IC x (CPI_execution + Misses/instruction x Miss penalty) x Tc
  where Misses/instruction x Miss penalty is the CPI increase caused by cache misses
- Three ways to increase performance:
  1. Reduce hit time
  2. Reduce #misses/instruction: reduce #misses, or reduce the number of memory references per instruction
  3. Reduce miss penalty
19 Overview - improvements
(The slide is a table marking, for each technique, which factor it improves: hit time, complexity, bandwidth, miss penalty, or miss rate.)
- Avoid virtual address translation
- Small and simple caches
- Way prediction (used in Pentium 4)
- Trace caches (used in Pentium 4)
- Pipelined cache access
- Nonblocking caches
- Banked caches
- Multi-level caches
- Prioritize read misses over writes (with write through)
- Prioritize demanded data
- Merging write buffer
- Larger block size
- Bigger caches
- Higher associativity
- Compiler techniques (SW challenge!)
- Hardware controlled prefetching (mostly instructions)
- Compiler controlled prefetching (nonblocking cache needed)
Several of these are trivial and/or widely used.
20 Hit-Time Improvement Techniques
Average memory access time = Hit time + Miss rate x Miss penalty
Hit-time improvement techniques next
21 Simple Often Means Fast
- Smaller and simpler is faster: use a multilevel cache with a simple and fast 1st-level cache
- A direct-mapped cache allows tag check and data transmission to proceed in parallel
22 Impact of Address Translation on Hit Time
- Physically indexed cache: the virtual address (VA) is first translated, then the physical address (PA) indexes the cache; on a miss, main memory supplies the data
- 1. Virtually indexed cache: let the virtual address index the cache in parallel with translation; the translated physical address is used for the tag check
23 Impact of Address Translation on Hit Time
- 2. Use virtual addresses both to index the cache and for the tag check; only do the TLB access (translation to a physical address) on a cache miss
- Advantage: removes the TLB from the critical path
24 Other Hit-Time Improvement Techniques
- Trace caches: cache predicted sequences of instructions; used in for example Pentium 4 (see page 131 and ...)
- Way prediction: predict which way of a set-associative cache holds the data; faster if the prediction is correct, an extra cycle if wrong; accuracy may be over 85%! Used in Pentium 4
25 Cache Bandwidth Improvements
- The number of memory operations that can be started per clock cycle is important to keep the processor running
- Hiding the miss penalty
- Improving hit time
26 Nonblocking Caches to Increase Cache Bandwidth
- Permits other cache operations to proceed while a miss is handled (exploit parallelism!)
- The cache has to bookkeep all pending miss requests
- The presence of true data dependences limits performance (as always)
27 Other Cache Bandwidth Improvements
- Pipelined cache access: each cache access takes several pipelined cycles, for example:
  1. Data and tag read
  2. Tag check, word select
  3. Block select and state update
28 Other Cache Bandwidth Improvements
- Multibanked caches: divide the cache blocks into banks that can be accessed simultaneously
- While one bank is accessed (possibly for several cycles), the next access can proceed if it goes to another bank
- The bank is selected based on the block address
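The bank selection can be sketched as sequential interleaving, a common scheme; 4 banks and 64-byte blocks are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_BANKS = 4, BLOCK_SIZE = 64 };   /* illustrative sizes */

/* With sequential interleaving, consecutive blocks land in different
 * banks, so streaming accesses can proceed in parallel. */
static unsigned bank_of(uint32_t addr) {
    return (unsigned)((addr / BLOCK_SIZE) % NUM_BANKS);
}
```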
29 Miss-Penalty Improvement Techniques
Average memory access time = Hit time + Miss rate x Miss penalty
Miss-penalty improvement techniques next
30 Using Cache Hierarchies to Reduce Miss Penalty
- Considerations: the 1st-level cache can be made faster and smaller => on-chip; the 2nd-level cache can be made larger to reduce capacity misses; more cache levels: P - L1 cache - L2 cache - main memory
- Performance of multi-level cache hierarchies:
  Access time = Hit time L1 + Miss rate L1 x Miss penalty L1
  Miss penalty L1 = Hit time L2 + Miss rate L2 x Miss penalty L2
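Substituting the second equation into the first gives a small helper; the numbers in the test are illustrative:

```c
#include <assert.h>

/* Two-level version of the slide's formula: the L1 miss penalty is
 * itself an AMAT into L2. Times in clock cycles. */
static double amat2(double hit_l1, double miss_rate_l1,
                    double hit_l2, double miss_rate_l2, double penalty_l2) {
    return hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * penalty_l2);
}
```

With a 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% L2 local miss rate and 100-cycle memory penalty, the average access time is 2.5 cycles.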
31 Prioritize Read Misses over Writes to Reduce Miss Penalty
- Write buffer (see Appendix C): holds writes to memory
- Let memory reads go first; perform writes when there are no reads
- Need to check for RAW hazards against the write buffer
32 Prioritizing Demanded Data to Reduce Miss Penalty
- Sub-block placement: a valid bit per sub-block of a cache block, so only the demanded sub-block must be fetched
- Early restart: restart the processor as soon as the requested word has arrived
- Critical word first: fetch the requested word first
- Increases performance for large block sizes
33 Optimizing Write Accesses to Reduce Miss Penalty
- Merging write buffer: writes to the same block are merged into one buffer entry (the slide contrasts a buffer without merging and a buffer with merging)
34 Miss-Rate Improvement Techniques
Average memory access time = Hit time + Miss rate x Miss penalty
Miss-rate improvement techniques next
35 Classification of Cache Misses
The three-C model (Hill and Smith, 1987):
- Compulsory (or cold) miss: the first reference to a block is always a miss
- Capacity miss: the cache is not large enough to hold all the data or code that is accessed
- Conflict miss: two memory blocks are mapped to the same cache set by a direct-mapped or set-associative address mapping, even if there is still unused space in the cache
36 Basic Techniques to Reduce Misses
- Larger block size: uses spatial locality; reduces compulsory misses; may increase conflict misses; may increase miss penalty
- Bigger caches: reduces capacity misses; may increase hit time
- Higher associativity: reduces conflict misses; may increase hit time
37 Victim Cache: One Miss-Reduction Technique
- Improves the hit rate of direct-mapped caches
- Add a small buffer to hold data discarded from the cache
- Jouppi: a 4-entry victim cache removed 20%-95% of conflict misses for a 4 Kbyte direct-mapped data cache
- Used in Alpha and HP machines
- NB! Victim caches are discussed only briefly on page 301 of the book.
38 Prefetching
- Software prefetching - load data before it is needed: prefetching into registers, prefetching into caches; both require lockup-free (nonblocking) caches
- Hardware prefetching: if there is a miss for block X, also fetch blocks X+1, X+2, ..., X+d; d = 1 => one-block lookahead. Used in Alpha processors.
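A sketch of compiler-controlled prefetching using the GCC/Clang intrinsic __builtin_prefetch; the lookahead distance of 16 elements is an illustrative assumption that would be tuned per machine:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while prefetching 16 elements ahead. The prefetch is
 * only a hint: it may be dropped, and correctness never depends on it. */
static long sum_with_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 1);  /* read, low temporal locality */
        s += a[i];
    }
    return s;
}
```

Because the prefetched line must be outstanding while the loop continues, this is one of the places where a nonblocking (lockup-free) cache is required.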
39 Compiler Optimizations to Eliminate Misses
Increase locality!
1. Merging arrays - increases spatial locality when key and val are accessed together:
Before:
int key[SIZE];
int val[SIZE];
After:
struct merge { int key; int val; };
struct merge newarr[SIZE];
40 Compiler Optimizations to Eliminate Misses
2. Loop interchange - increases spatial locality by traversing A in the order it is laid out in memory (row by row: A[1,1] A[1,2] ... A[1,4] A[2,1] ...):
Before:
for (col = 0; col < N; col++)
  for (row = 0; row < N; row++)
    A[row][col] = ...;
After:
for (row = 0; row < N; row++)
  for (col = 0; col < N; col++)
    A[row][col] = ...;
41 Compiler Optimizations to Eliminate Misses
3. Blocking. Example: matrix multiply X = Y * Z (the slide shows 6x6 matrices y11..y66, z11..z66, x11..x66).
Before:
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++) {
    r = 0;
    for (k = 0; k < N; k++)
      r = r + y[i][k] * z[k][j];
    x[i][j] = r;
  }
43 After blocking - increases spatial & temporal locality:
for (jj = 0; jj < N; jj += B)
  for (kk = 0; kk < N; kk += B)
    for (i = 0; i < N; i++)
      for (j = jj; j < min(jj + B, N); j++) {
        r = 0;
        for (k = kk; k < min(kk + B, N); k++)
          r = r + y[i][k] * z[k][j];
        x[i][j] = x[i][j] + r;
      }
Number of memory accesses (worst case):
Before blocking: 2N^3 + N^2
After blocking: 2N^3/B + N^2
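A self-contained, runnable version of the blocked loop nest, with a small N so the result can be checked directly (N = 8 and B = 4 are illustrative):

```c
#include <assert.h>
#include <string.h>

enum { N = 8, B = 4 };   /* illustrative matrix and block sizes */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* x = y * z, computed B x B block by block as on the slide. */
static void matmul_blocked(double y[N][N], double z[N][N], double x[N][N]) {
    memset(x, 0, N * N * sizeof(double));
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < MIN(jj + B, N); j++) {
                    double r = 0;
                    for (int k = kk; k < MIN(kk + B, N); k++)
                        r = r + y[i][k] * z[k][j];
                    x[i][j] = x[i][j] + r;
                }
}

/* Check: multiplying by the identity must reproduce z exactly. */
static int blocked_matmul_ok(void) {
    double y[N][N] = {{0}}, z[N][N], x[N][N];
    for (int i = 0; i < N; i++) y[i][i] = 1.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) z[i][j] = i * N + j;
    matmul_blocked(y, z, x);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (x[i][j] != z[i][j]) return 0;
    return 1;
}
```

B is chosen so that a B x B tile of y, z and x fits in the cache at the same time, which is where the 2N^3/B access count comes from.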
44 Summary - improvement techniques
(The slide repeats the overview table from slide 19, marking for each technique which factor it improves: hit time, complexity, bandwidth, miss penalty, or miss rate.)
Avoid virtual address translation; small and simple caches; way prediction; trace caches; pipelined cache access; nonblocking caches; banked caches; multi-level caches; prioritize read misses over writes; prioritize demanded data; merging write buffer; larger block size; bigger caches; higher associativity; compiler techniques; hardware controlled prefetching; compiler controlled prefetching.
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationLecture 11 Reducing Cache Misses. Computer Architectures S
Lecture 11 Reducing Cache Misses Computer Architectures 521480S Reducing Misses Classifying Misses: 3 Cs Compulsory The first access to a block is not in the cache, so the block must be brought into the
More informationLecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"
Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3
More informationReducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Miss Penalty: Read Priority over Write on Miss Write buffers may offer RAW
More informationChapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)
Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)
More informationHandout 4 Memory Hierarchy
Handout 4 Memory Hierarchy Outline Memory hierarchy Locality Cache design Virtual address spaces Page table layout TLB design options (MMU Sub-system) Conclusion 2012/11/7 2 Since 1980, CPU has outpaced
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationCS152 Computer Architecture and Engineering Lecture 18: Virtual Memory
CS152 Computer Architecture and Engineering Lecture 18: Virtual Memory March 22, 1995 Dave Patterson (patterson@cs) and Shing Kong (shingkong@engsuncom) Slides available on http://httpcsberkeleyedu/~patterson
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationName: 1. Caches a) The average memory access time (AMAT) can be modeled using the following formula: AMAT = Hit time + Miss rate * Miss penalty
1. Caches a) The average memory access time (AMAT) can be modeled using the following formula: ( 3 Pts) AMAT Hit time + Miss rate * Miss penalty Name and explain (briefly) one technique for each of the
More informationCISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review
CISC 662 Graduate Computer Architecture Lecture 6 - Cache and virtual memory review Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II 27-8-6 Scott Beamer, Instructor New Flow Based Routers CS61C L24 Cache II (1) www.anagran.com Caching Terminology When we try
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (I)
COSC 6385 Computer Architecture - Memory Hierarchies (I) Edgar Gabriel Spring 2018 Some slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could
More informationMemory Hierarchy. Advanced Optimizations. Slides contents from:
Memory Hierarchy Advanced Optimizations Slides contents from: Hennessy & Patterson, 5ed. Appendix B and Chapter 2. David Wentzlaff, ELE 475 Computer Architecture. MJT, High Performance Computing, NPTEL.
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (I)
COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationLecture notes for CS Chapter 2, part 1 10/23/18
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More information