Cache Designs and Tricks. Kyle Eli, Chun-Lung Lim
|
|
- Archibald Ryan
- 5 years ago
- Views:
Transcription
1 Cache Designs and Tricks Kyle Eli, Chun-Lung Lim
2 Why is cache important? CPUs already perform computations on data faster than the data can be retrieved from main memory and microprocessor execution speeds are increasing faster than DRAM access times. Cache is typically much faster than main memory. Multiple caches, each specialized to enhance a different aspect of program execution.
3 What is Cache? A cache e is collection of data duplicating original values stored or computed earlier. Implemented on- or off-chip in SRAM. Faster to fetch or compute relative to original values. Low latency High bandwidth Commonly organized into two or three levels.
4 Diagram of a CPU memory cache
5 Cache Associativity Direct-Mapped Cache Fully Associative Cache Set Associative Cache
6 Direct-Mapped Cache Slots are treated as a large array, with index chosen using the bits of the address. Suffers from many collisions, causing the cache line to be repeatedly evicted even when there are many empty slots. Very simple, only one slot to check.
7 Fully Associative Cache Any slot can store the cache line. Obtains data by comparing tag bits of the address to tag bits of every slot and making sure the valid bit is set. Hardware is complex. Normally used in translation lookaside buffers.
8 Set Associative Cache Combination of fully-associative and direct-mapped schemes. Cache slots are grouped into sets. Finding a set is like direct-mapped scheme. Finding slot within the set is like the fully-associative scheme. Comparison hardware only needed for finding sets. Fewer collisions because you have more slots to choose from, even when cache lines map to the same set.
9 Multi-level Cache There are three levels of cache commonly being used: One on-chip with the processor, referred to as the "Level-1" cache (L1) or primary cache. Another is on-die cache is the "Level 2" cache (L2) or secondary cache. L3 Cache, generally much larger and implemented on a separate chip.
10 Multi-level Caches new design Inclusive caches Data in the L1 cache may also be in the L2 cache. Example : Intel Pentium II, III, IV and most RISCs. Exclusive caches decisions Data is guaranteed to be in at most of the L1 and L2 caches. Example : AMD Athlon
11 Athlon64 Cache Hierarchy
12 Cache Issues Latency: time for cache to respond to a request. Smaller caches typically respond faster. Bandwidth: number of bytes which can be read or written per second. Cost: expensive to implement. A large level 3 cache can generally cost in excess of $1000 to implement. Benefits depends on the application s s access patterns.
13 Cache Issues (continued) Memory requests are satisfied from Cache Cache Hit Occurs when the processor requests an address stored in the cache. Processor writes or reads directly to or from cache. Main Memory Cache Miss Occurs when the processor requests an address that is not stored in the cache.
14 Caching Algorithm Caching algorithms are used to optimize cache management. Cache size is limited. Algorithm used to decide which items to keep and which to discard to make room for new items. Cache algorithms: Least Recently Used (LRU) Least Frequently Used Belady s Min
15 Least Recently Used Discards the least recently used item first. Must keep track of least-recently used item. Using Pseudo-LRU, only one bit per cache item required to work.
16 Least Frequently Used Counts how often an item is used. Items used the least are discarded first.
17 Belady s Min Optimal algorithm, discard information that will not be needed for the longest time in the future. Can not be implemented in hardware as it requires future knowledge. Used in simulations to judge effectiveness of other algorithms.
18 Cache Optimization Locality Spatial Locality Requested data is physically near previously used data. Temporal Locality Requested data was recently used, or frequently re-used.
19 Optimization for Spatial Locality Spatial locality refers to accesses close to one another in position. Spatial locality is important to the caching system because contiguous cache lines are loaded from memory when the first piece of that line is loaded. Subsequent accesses within the same cache line are then practically free until the line is flushed from the cache. Spatial locality is not only an issue in the cache, but also within most main memory systems.
20 Optimization for Spatial Locality Prefetch data in other cache lines.
21 Optimization for Temporal Locality Temporal locality refers to 2 accesses to a piece of memory within a small period of time. The shorter the time between the first and last access to a memory location the less likely it will be loaded from main memory or slower caches multiple times.
22 Optimization for Temporal Locality Re-use data which has been brought to cache as often as possible.
23 Optimization Techniques Prefetching Loop blocking Loop fusion Array padding Array merging
24 Prefetching Many architectures include a prefetch instruction that is a hint to the processor that a value will be needed from memory soon. When the memory access pattern is well defined and the programmer knows many instructions ahead of time, prefetching will result in very fast access when the data is needed.
25 Prefetching (continued) It does no good to prefetch variables that will only be written to. The prefetch should be done as early as possible. Getting values from memory takes a LONG time. Prefetching too early, however will mean that other accesses might flush the prefetched data from the cache. Memory accesses may take 50 processor clock cycles or more.
26 Prefetching (continued) The compiler may be inserting prefetch instructions. May be slower than manual prefetch. The CPU probably has a hardware prefetching feature. Can be dynamically driven by run-time data. Independent of manual prefetch.
27 Loop Blocking Reorder loop iteration so as to operate on all the data in a cache line at once, so it needs only to be brought in from memory once. For instance if an algorithm calls for iterating down the columns of an array in a row-major language, do multiple columns at a time. The number of columns should be chosen to equal a cache line.
28 Loop Fusion Combine loops that access the same data. Leads to a single load of each memory address.
29 Array Padding Arrange accesses to avoid subsequent access to different data that may be cached in the same position.
30 Array Merging Merge arrays so that data that needs to be accessed at once is stored together
31 Pitfalls and Gotchas Basically, the pitfalls of memory access patterns are the inverse of the strategies for optimization. There are also some gotchas that are unrelated to these techniques. The associativity of the cache. Shared memory. Sometimes an algorithm is just not cache friendly.
32 Problems From Associativity When this problem shows itself is highly dependent on the cache hardware being used. It does not exist in fully associative caches. The simplest case to explain is a direct-mapped cache. If the stride between addresses is a multiple of the cache size, only one cache position will be used.
33 Shared Memory It is obvious that shared memory with high contention cannot be effectively cached. However it is not so obvious that unshared memory that is close to memory accessed by another processor is also problematic. When laying out data, complete cache lines should be considered a single location and should not be shared.
34 Optimization Wrapup Only try after the best algorithm has been selected. Cache optimizations will not result in an asymptotic speedup. If the problem is too large to fit in memory or in memory local to a compute node, many of these techniques may be applied to speed up accesses to even more remote storage.
35 Recent Cache Architecture AMD Athlon 64 X2 128kB 2-way set associative L1 (64kB data, 64kB instruction) per core 1MB or 512kB full-speed 16-way set associative L2 cache per core Intel Core (Yonah( Yonah) 64kB L1 (32kB data, 32kB instruction) per core 2MB full-speed 8-way set associative L2 cache, shared Designed for power-saving, cache can be flushed to memory and cache ways can be deactivated.
36 Recent Cache Architecture SUN UltraSparc T1 24kB 4-way set associative L1 (8kB data, 16kB instruction) per core 3072kB full-speed 12-way set associative L2 cache, shared IBM Power5 96kB L1 (64kB 2-way set associative instruction, 32kB 4-way set associative data) 1.92MB full-speed 10-way set associative L2 cache, shared 36MB half-speed 12-way set associative L3 cache, shared (off-die)
37 Recent Cache Architecture Sony/Toshiba/IBM Cell Broadband Engine 9 cores 1 POWER Processing Element (PPE) 64kB L1(32kb 2-way set associative instruction, 32kb 4-way set associative data) 512kB full-speed 8-way set associative L2 8 Synergistic Processing Elements (SPEs( SPEs) 256kB Local Storage per core No direct access to memory Can access any 128-bit word at L1 speed from local storage
38 Specialized Cache Designs CAM-Tag Cache for Low-Power
39 Motivation Cache uses 30-60% processor energy in embedded systems. Example: 43% for StrongArm-1 Many Industrial Low-Power Processors use CAM (content-addressable-memory) ARM3 64-way set-associative [Furber et. al. 89] StrongArm 32-way set-associative [Santhanam et. al. 98] Intel XScale 32-way set-associative 01 CAM: Fast and Energy-Efficient
40 Set-Associative RAM-tag Cache Tag Status Data Tag Status Data =? =? Not energy- efficient All ways are read out Two-phase approach More energy- efficient 2X latency Tag Index Offset
41 Set-Associative RAM-tag Sub-bank BUS Cache Not energy-efficient All ways are read out Two-phase approach gwl More energy-efficient 2X latency Tag SRAM Cells Address Decoder lwl Data SRAM Cells lwl Data SRAM Cells Sub-banking 1 sub-bank = 1 way Low-swing Bitlines Only for reads, writes performed full-swing Tag Comp Offset Dec. Sense Amps Offset Dec. Sense Amps Wordline Gating addr offset offset I/O BUS
42 CAM-tag Cache Tag Status Data Tag Status Data HIT? HIT? Only one sub- bank activated Associativity within sub-bank Easy to implement high associativity Word Tag Bank Offset
43 CAM-tag Array CAM-tag Cache Sub-bank gwl Offset Dec. lwl 32 SRAM Cells Sense Amps 128 Offset Dec. lwl SRAM Cells Sense Amps Only one subbank activated Associativity within sub-bank Easy to implement high associativity tag offset I/O offset BUS
44 CAM-tag Cache Sub-bank Layout 1-KB Cache Sub-bank implemented in 0.25 µm CMOS technology 2x12x32 CAM Array 32x64 RAM Array 10% area overhead over RAM-tag cache
45 RAM tag Cache Critical Path: Index Bits Delay Comparison Global Wordline Decoding gwl Local Wordline Decoding lwl Data out Decoded offset Tag Comp. Tag bits Tag readout Data readout CAM tag Cache Critical Path: Tag bits Tag bits broadcasting Tag bits gwl Local Wordline Decoding lwl Tag Comp. Decoded offset Data out Within 3% of each other Data readout
46 Hit Energy Comparison Hit Energy per Access for 8KB Cache in pj way RAM 2-way RAM 4-way RAM 8-way RAM 8-way CAM 16-way CAM 32-way CAM LZW ijpeg pegwit perl m88ksim gcc Avg Associativity and Implementation
47 Total Access Energy (pegwit) Pegwit High miss rate for high associativity Total Energy per Access for 8KB Cache in pj X 64X 128X 256X 512X 1024X 1-RAM 2-RAM 4-RAM 8-RAM 8-CAM 16-CAM 32-CAM Miss Energy Expressed in Multiples of 32-bit Read Access Energy
48 Total Access Energy (perl) Perl Very low miss rate for high associativity Total Energy per Access for 8KB Cache in pj X 64X 128X 256X 512X 1024X 1-RAM 2-RAM 4-RAM 8-RAM 8-CAM 16-CAM 32-CAM Miss Energy Expressed in Multiples of 32-bit Read Access Energy
49 References Wikipedia UMD ndex.html Michael Zhang and Krste Asanovic Highly-Associative Caches for Low Power Processors, MIT Laboratory for Computer Science, December 2000 (from Kool Chips Workshop) Cache Designs and Tricks Kevin Leung, Josh Gilkerson,, Albert Kalim, Shaz Husain
50 References Cont d Many academic studies on cache [Albera, Bahar, 98] Power and performance trade-offs [Amrutur,, Horowitz, 98, 98, 00] Speed and power scaling [Bellas,, Hajj, Polychronopoulos, 99] Dynamic cache management [Ghose,, Kamble, 99] Power reduction through sub-banking, etc. [Inoue, Ishihara, Murakami, 99] Way predicting set-associative cache [Kin,Gupta, Mangione-Smith, 97] Filter cache [Ko, Balsara, Nanda, 98] Multilevel caches for RISC and CISC [Wilton, Jouppi, 94] CACTI cache model
CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationWhy memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho
Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationSpring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand
Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationImproving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches
Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging 6.823, L8--1 Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Highly-Associative
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationLecture notes for CS Chapter 2, part 1 10/23/18
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationChapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST
Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationA Cache Hierarchy in a Computer System
A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 08: Caches III Shuai Wang Department of Computer Science and Technology Nanjing University Improve Cache Performance Average memory access time (AMAT): AMAT =
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationPower Reduction Techniques in the Memory System. Typical Memory Hierarchy
Power Reduction Techniques in the Memory System Low Power Design for SoCs ASIC Tutorial Memories.1 Typical Memory Hierarchy On-Chip Components Control edram Datapath RegFile ITLB DTLB Instr Data Cache
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available
More informationMemory systems. Memory technology. Memory technology Memory hierarchy Virtual memory
Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationCache Design and Tricks. Presenters: Kevin Leung Josh Gilkerson Albert Kalim Shaz Husain
Cache Design and Tricks Presenters: Kevin Leung Josh Gilkerson Albert Kalim Shaz Husain What is Cache? A cache is simply a copy of a small data segment residing in the main memory Fast but small extra
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationMemory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB
Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationOutline. 1 Reiteration. 2 Cache performance optimization. 3 Bandwidth increase. 4 Reduce hit time. 5 Reduce miss penalty. 6 Reduce miss rate
Outline Lecture 7: EITF20 Computer Architecture Anders Ardö EIT Electrical and Information Technology, Lund University November 21, 2012 A. Ardö, EIT Lecture 7: EITF20 Computer Architecture November 21,
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationChapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative
Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory
More informationIntroduction to OpenMP. Lecture 10: Caches
Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCray XE6 Performance Workshop
Cray XE6 Performance Workshop Mark Bull David Henty EPCC, University of Edinburgh Overview Why caches are needed How caches work Cache design and performance. 2 1 The memory speed gap Moore s Law: processors
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationA Review on Cache Memory with Multiprocessor System
A Review on Cache Memory with Multiprocessor System Chirag R. Patel 1, Rajesh H. Davda 2 1,2 Computer Engineering Department, C. U. Shah College of Engineering & Technology, Wadhwan (Gujarat) Abstract
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationCache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory
Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and
More informationLecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 13 Memory Part 2
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 13 Memory Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationCache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6.
Cache Memories From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging
More informationImproving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Misses Classifying Misses: 3 Cs! Compulsory The first access to a block is
More information1/19/2009. Data Locality. Exploiting Locality: Caches
Spring 2009 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic Data Locality Temporal: if data item needed now, it is likely to be needed again in near future Spatial: if data item needed now, nearby
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 13 Memory Part 2
ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 13 Memory Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationLecture 2: Memory Systems
Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer
More informationCache Optimisation. sometime he thought that there must be a better way
Cache sometime he thought that there must be a better way 2 Cache 1. Reduce miss rate a) Increase block size b) Increase cache size c) Higher associativity d) compiler optimisation e) Parallelism f) prefetching
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationLec 11 How to improve cache performance
Lec 11 How to improve cache performance How to Improve Cache Performance? AMAT = HitTime + MissRate MissPenalty 1. Reduce the time to hit in the cache.--4 small and simple caches, avoiding address translation,
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationLecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)
Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5) 1 More Cache Basics caches are split as instruction and data; L2 and L3 are unified The /L2 hierarchy can be inclusive,
More informationWilliam Stallings Computer Organization and Architecture 8th Edition. Cache Memory
William Stallings Computer Organization and Architecture 8th Edition Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics
More informationAdvanced Computer Architecture
Advanced Computer Architecture 1 L E C T U R E 2: M E M O R Y M E M O R Y H I E R A R C H I E S M E M O R Y D A T A F L O W M E M O R Y A C C E S S O P T I M I Z A T I O N J A N L E M E I R E Memory Just
More informationTypes of Cache Misses: The Three C s
Types of Cache Misses: The Three C s 1 Compulsory: On the first access to a block; the block must be brought into the cache; also called cold start misses, or first reference misses. 2 Capacity: Occur
More informationEnergy Efficient Caching-on-Cache Architectures for Embedded Systems
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 19, 809-825 (2003) Energy Efficient Caching-on-Cache Architectures for Embedded Systems HUNG-CHENG WU, TIEN-FU CHEN, HUNG-YU LI + AND JINN-SHYAN WANG + Department
More informationCS3350B Computer Architecture
CS3350B Computer Architecture Winter 2015 Lecture 3.1: Memory Hierarchy: What and Why? Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson
More informationLocality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example
Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationSpring 2018 :: CSE 502. Cache Design Basics. Nima Honarmand
Cache Design Basics Nima Honarmand Storage Hierarchy Make common case fast: Common: temporal & spatial locality Fast: smaller, more expensive memory Bigger Transfers Registers More Bandwidth Controlled
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 07: Caches II Shuai Wang Department of Computer Science and Technology Nanjing University 63 address 0 [63:6] block offset[5:0] Fully-AssociativeCache Keep blocks
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationClassification Steady-State Cache Misses: Techniques To Improve Cache Performance:
#1 Lec # 9 Winter 2003 1-21-2004 Classification Steady-State Cache Misses: The Three C s of cache Misses: Compulsory Misses Capacity Misses Conflict Misses Techniques To Improve Cache Performance: Reduce
More information10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache
Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationL2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary
HY425 Lecture 13: Improving Cache Performance Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 25, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 13: Improving Cache Performance 1 / 40
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationAnnouncements. ! Previous lecture. Caches. Inf3 Computer Architecture
Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationLecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)
Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5) 1 Techniques to Reduce Cache Misses Victim caches Better replacement policies pseudo-lru, NRU Prefetching, cache
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More informationLecture 11. Virtual Memory Review: Memory Hierarchy
Lecture 11 Virtual Memory Review: Memory Hierarchy 1 Administration Homework 4 -Due 12/21 HW 4 Use your favorite language to write a cache simulator. Input: address trace, cache size, block size, associativity
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache
More informationMemories. CPE480/CS480/EE480, Spring Hank Dietz.
Memories CPE480/CS480/EE480, Spring 2018 Hank Dietz http://aggregate.org/ee480 What we want, what we have What we want: Unlimited memory space Fast, constant, access time (UMA: Uniform Memory Access) What
More informationLecture: Cache Hierarchies. Topics: cache innovations (Sections B.1-B.3, 2.1)
Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1) 1 Types of Cache Misses Compulsory misses: happens the first time a memory word is accessed the misses for an infinite cache
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More informationCS Computer Architecture
CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010 An Example Implementation In principle, we could describe the control store in binary, 36 bits per word. We will use a simple symbolic
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast
More informationLogical Diagram of a Set-associative Cache Accessing a Cache
Introduction Memory Hierarchy Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year Levels of memory with different sizes & speeds close to
More informationEECS 470. Lecture 14 Advanced Caches. DEC Alpha. Fall Jon Beaumont
Lecture 14 Advanced Caches DEC Alpha Fall 2018 Instruction Cache BIU Jon Beaumont www.eecs.umich.edu/courses/eecs470/ Data Cache Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti,
More informationCS 152 Computer Architecture and Engineering. Lecture 6 - Memory
CS 152 Computer Architecture and Engineering Lecture 6 - Memory Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs152!
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationCS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches
CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single
More informationAdvanced optimizations of cache performance ( 2.2)
Advanced optimizations of cache performance ( 2.2) 30 1. Small and Simple Caches to reduce hit time Critical timing path: address tag memory, then compare tags, then select set Lower associativity Direct-mapped
More informationMemory Hierarchy. 2/18/2016 CS 152 Sec6on 5 Colin Schmidt
Memory Hierarchy 2/18/2016 CS 152 Sec6on 5 Colin Schmidt Agenda Review Memory Hierarchy Lab 2 Ques6ons Return Quiz 1 Latencies Comparison Numbers L1 Cache 0.5 ns L2 Cache 7 ns 14x L1 cache Main Memory
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Cache 11232011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Memory Components/Boards Two-Level Memory Hierarchy
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More information