Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies
1 TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science
2 2 Outline: Chapter 5 - Memory hierarchies. Temporal and spatial locality; hits and misses; direct-mapped, set associative, and fully associative caches; addressing; handling writes; performance.
3 3 Review What is control speculation? What is data speculation? What are the advantages of a superscalar vs a VLIW? What are the disadvantages of a superscalar vs a VLIW? When is a VLIW appropriate? When is a superscalar appropriate?
4 4 Datapath and control from Chapter 4
6 5 Memory technologies. Static RAM (SRAM): 0.5 ns to 2.5 ns, $2000 to $5000 per GB. Dynamic RAM (DRAM): 50 ns to 70 ns, $20 to $75 per GB. Magnetic disk: 5 ms to 20 ms, $0.20 to $2 per GB. Ideal memory: the access time of SRAM with the capacity and cost/GB of disk. (Prices in 2008.)
7 6 Memory hierarchies - motivation. Programmers want an unlimited amount of fast memory, but fast memory is expensive and large memories are slow. The compromise: a memory hierarchy creates the illusion of a memory with the size of the largest level and the speed of the fastest.
8 7 Memory hierarchies. Memory hierarchy: a structure that uses multiple levels of memories. As the distance from the CPU increases, the size of the memories and the access time both increase. The illusion of a large, fast memory is achieved by exploiting the principle of locality.
12 8 Principle of temporal locality. If you read an address once, you are likely to touch it again (variables). If you execute an instruction once, you are likely to execute it again (loops). Temporal locality: addresses recently referenced will tend to be referenced again soon. Caches exploit temporal locality!
16 9 Principle of spatial locality. If you read an address once, you are likely to also read neighbouring addresses (arrays). If you execute an instruction once, you are likely to access neighbouring instructions. Spatial locality: if you access address X, you are likely to access an address close to X. Caches exploit spatial locality!
17 10 Levels of hierarchy. Exploit the principle of locality by using the memory hierarchy. Memory closer to the CPU holds a subset of the memory further away. All data is stored at the lowest level. Data is copied between only two levels at a time. Upper levels: caching. Lower levels: virtual memory.
18 11 Exploiting locality with a memory hierarchy. Store everything on disk. Copy recently accessed (and nearby) items from disk to the smaller DRAM memory (main memory). Copy more recently accessed (and nearby) items from DRAM to the smaller SRAM memory (CPU cache).
19 12 Organization of data. Data is transferred only between two levels at a time. The data can be either present or not present in the upper level when needed. The minimum unit of data that can be present or not present is called a block or a line. The block is also the unit transferred from the lower level when needed.
20 13 Hierarchy and the computer. The concepts used to build memory systems affect many other aspects of a computer and its performance: how the operating system manages memory and I/O, how compilers generate code, and how applications use the computer. Since all programs spend much of their time accessing memory, the memory system is the major factor in determining performance. Programmers should understand the memory hierarchy to achieve good performance.
21 14 Hits and misses (1/2). A hit occurs when data referenced by the processor is available in a block in the upper level; otherwise, it is a miss. On a miss, the block containing the data must be transferred from the next level in the hierarchy. The hit rate is the fraction of memory accesses found in the upper level. The fraction not found is called the miss rate.
22 15 Hits and misses (2/2). The hit time is the time needed to access data from the upper level; it includes the time to determine whether the access is a hit or a miss. The miss penalty is the time needed to access data that is not available in the upper level; it includes the time to transfer the block from the lower level and to deliver the requested data. The hit time is much smaller than the miss penalty.
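How hit time, miss rate, and miss penalty combine can be sketched numerically. The snippet below is a minimal illustration using the standard average-access-time relation (hit time plus miss rate times miss penalty), which the slide does not state explicitly; the function name and the example numbers are assumptions chosen for illustration only.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access.

    Every access pays the hit time (which includes the hit/miss check);
    the fraction of accesses that miss additionally pays the miss penalty.
    """
    return hit_time + miss_rate * miss_penalty


# Assumed example values (not from the lecture): 1-cycle hit time,
# 5% miss rate, 100-cycle miss penalty.
amat = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(amat)
```

Even with a 95% hit rate, the average access in this assumed configuration costs several times the hit time, which is why the miss penalty matters so much.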
23 16 Programs and locality. Programs tend to reuse recently accessed data items (temporal locality) and to reference data items that are close to recently accessed data (spatial locality). Memory hierarchies exploit temporal locality by keeping more recently accessed data items closer to the processor. Memory hierarchies exploit spatial locality by moving blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy. Most systems use a true hierarchy: data present at level i is also present at level i + 1.
24 17 Cache. Cache: the levels in the hierarchy between the main memory and the processor. Simple (level 1) cache where a block is a single word. [Figure: the cache before and after a reference to X_n]
25 18 Direct-mapped cache. How do we know if the requested word is in the cache? How do we find it? It is easy to find a word in the cache if each memory location is mapped to exactly one cache location. If the address of the memory location determines the exact placement in the cache, it is called a direct-mapped cache. Typical mapping: (block address) mod (number of cache blocks).
26 19 Direct-mapped cache with 1-word blocks. Mapping: (word address) mod (number of cache words). 8-word cache for a 32-word memory: the 3 least significant bits of the address determine the cache position.
27 20 Direct-mapped cache with 1-word blocks. How do we know which memory word is in a given cache word? We need to store the remaining upper address bits with the data. The upper part of the address stored with the data block is called a tag. For our 32-word memory and 8-word cache the tag is 2 bits. What if there is invalid data in the cache word? We need a valid bit for each block. For each block we have: valid bit, tag, data block.
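The valid bit, tag, and data fields described above can be sketched as a small simulator. This is a minimal illustration, assuming the 8-word cache and 32-word memory from the slide; the class name, the memory contents, and the reference sequence are hypothetical and not taken from the lecture.

```python
class DirectMappedCache:
    """Direct-mapped cache with one-word blocks: valid bit, tag, data per entry."""

    def __init__(self, num_blocks=8):
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks
        self.tag = [0] * num_blocks
        self.data = [None] * num_blocks

    def read(self, word_address, memory):
        """Return (value, hit) for a read of the given word address."""
        index = word_address % self.num_blocks    # low-order address bits
        tag = word_address // self.num_blocks     # remaining upper bits
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], True         # hit
        # Miss: fetch the word from memory and overwrite the entry.
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = memory[word_address]
        return self.data[index], False


memory = list(range(100, 132))             # hypothetical 32-word memory contents
cache = DirectMappedCache(num_blocks=8)
trace = [22, 26, 22, 26, 16, 3, 16, 18]    # assumed reference sequence
results = [cache.read(address, memory)[1] for address in trace]
print(results)
# [False, False, True, True, False, False, True, False]
```

The final access to address 18 misses because it maps to the same cache position as address 26 (both end in binary 010) but carries a different tag.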
28 21 Example: reads on the simple cache See pages
29 22 Larger blocks. To take advantage of spatial locality, caches use blocks several words in size. We only need one valid bit and one tag per block (less storage overhead). The block address is $\lfloor \text{byte address} / \text{bytes per block} \rfloor$. With 16 bytes per block, byte address 1200 has block address 75.
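The block-address arithmetic can be checked with a few lines of code. This is a small sketch; the helper names are hypothetical.

```python
def block_address(byte_address, bytes_per_block):
    """Block address = byte address divided by block size, rounded down."""
    return byte_address // bytes_per_block


def block_byte_range(block_addr, bytes_per_block):
    """First and last byte address covered by a block."""
    first = block_addr * bytes_per_block
    return first, first + bytes_per_block - 1


print(block_address(1200, 16))      # 75
print(block_byte_range(75, 16))     # (1200, 1215)
```

All byte addresses from 1200 to 1215 share block address 75, so a miss on any one of them brings the other fifteen bytes into the cache as well.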
30 23 Larger blocks and hit rate. Blocks that are too large decrease the hit rate: for a fixed cache size, many words are not used before the block is kicked out. Larger blocks also increase the miss penalty.
32 24 Anatomy of an address: Tag | Index | Byte offset. Byte offset: which byte within the cache line are we reading? Index: which cache line are we reading? Tag: how we differentiate between addresses that share the same index. Consider a 64 KB direct-mapped cache with 64 B cache lines. Assuming a 32-bit address, how many bits are used for the tag, the index, and the byte offset? Answer: byte offset 6 bits (64 B lines), index 10 bits (1024 lines), tag 16 bits (32 - 10 - 6).
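The field widths follow directly from the cache geometry. The sketch below assumes power-of-two sizes; the function names and the sample address are hypothetical. For the 64 KB cache with 64 B lines it yields a 16-bit tag, a 10-bit index, and a 6-bit byte offset, since selecting one of 64 bytes in a line needs 6 bits.

```python
def address_fields(cache_bytes, line_bytes, address_bits=32, ways=1):
    """Return (tag_bits, index_bits, offset_bits) for a power-of-two cache."""
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    offset_bits = line_bytes.bit_length() - 1    # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split an address into its (tag, index, byte offset) fields."""
    offset = address % (2 ** offset_bits)
    index = (address // 2 ** offset_bits) % (2 ** index_bits)
    tag = address // 2 ** (offset_bits + index_bits)
    return tag, index, offset


fields = address_fields(cache_bytes=64 * 1024, line_bytes=64)
print(fields)                                  # (16, 10, 6)
print(split_address(0x12345678, 10, 6))        # (4660, 345, 56)
```

Passing `ways` equal to the number of lines models a fully associative cache, in which case the index field shrinks to zero bits.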
33 25 Cache implementation
34 26 Caches and associativity - fully associative. Instead of direct mapping, a cache design can be fully associative. In a fully associative cache a block can be put in any position in the cache regardless of its address. This requires a full search of the tags to determine a cache hit or miss, which increases hardware costs. What is the size of the Index field for a fully associative cache?
35 27 Caches and associativity - set associative. Direct mapping and fully associative are two ends of a spectrum; set associative caches are somewhere in between. In a set associative cache one address maps to a fixed number of locations in the cache. A set associative cache with n locations for each block is called n-way set associative. The index of the set in the cache is given by (block address) mod (number of sets in the cache). All tags in the set must be searched to determine a hit or miss.
36 28 Caches and associativity. Direct mapped is the same as one-way set associative. Fully associative is m-way set associative, where m is the number of blocks in the cache.
37 29 Associativity, hit time and hit rate. Increased associativity can increase the hit time: the tag search takes more time. Increased associativity can decrease the miss rate: blocks are kept longer in the cache. [Table: data miss rate by associativity for the FastMATH processor running SPEC2000]
38 30 Block replacement. With associativity, one has to decide which block to remove when a set is full on a cache miss. Two strategies: least recently used (LRU) and random. LRU needs hardware to track accesses.
39 31 Example: associative caches. 1-word blocks, four blocks. Three different cache implementations: direct-mapped, two-way set associative, fully associative. Block addresses are accessed in sequence.
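The comparison of the three organizations can be reproduced with a short simulation. The sketch below assumes LRU replacement and a block-address sequence of 0, 8, 0, 6, 8; the transcription does not preserve the sequence used on the slides, so this one is an assumption, as is the function name.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_blocks, ways):
    """Simulate an LRU set-associative cache and return the number of misses."""
    num_sets = num_blocks // ways
    # One OrderedDict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_addresses:
        cache_set = sets[block % num_sets]       # set index = block mod #sets
        if block in cache_set:
            cache_set.move_to_end(block)         # hit: mark most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)    # evict least recently used
            cache_set[block] = True
    return misses


sequence = [0, 8, 0, 6, 8]    # assumed block-address sequence
miss_counts = {
    "direct-mapped": count_misses(sequence, num_blocks=4, ways=1),
    "two-way set associative": count_misses(sequence, num_blocks=4, ways=2),
    "fully associative": count_misses(sequence, num_blocks=4, ways=4),
}
print(miss_counts)
# {'direct-mapped': 5, 'two-way set associative': 4, 'fully associative': 3}
```

With this sequence, blocks 0 and 8 collide in the direct-mapped cache on every access, while the fully associative cache only takes the three compulsory misses.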
40 32 Example: associative caches direct mapped
41 33 Example: associative caches 2-way set
42 34 Example: associative caches fully associative
43 35 Associativity and tag bits. Increasing associativity increases the number of comparators needed. It also increases the size of the tag fields. Assume a cache with 4 K blocks, a 16-byte block size, and a 32-bit address:
Direct mapped: 4K sets, 16-bit tag, 64 Kbits total for tags
2-way: 2K sets, 17-bit tag, 68 Kbits total for tags
4-way: 1K sets, 18-bit tag, 72 Kbits total for tags
Fully associative: 1 set, 28-bit tag, 112 Kbits total for tags
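The tag sizes above can be recomputed from the cache parameters. The following is a small sketch assuming power-of-two sizes; the function name is hypothetical.

```python
def tag_storage(num_blocks, block_bytes, ways, address_bits=32):
    """Return (number of sets, tag bits per block, total tag bits)."""
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return num_sets, tag_bits, tag_bits * num_blocks


rows = []
for ways in (1, 2, 4, 4096):                     # 4096 ways = fully associative
    num_sets, tag_bits, total_bits = tag_storage(4096, 16, ways)
    rows.append((ways, num_sets, tag_bits, total_bits // 1024))
    print(f"{ways}-way: {num_sets} sets, {tag_bits}-bit tag, "
          f"{total_bits // 1024} Kbits total for tags")
```

Each doubling of associativity halves the number of sets, which removes one index bit and adds one bit to every tag.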
44 36 Implementation of direct mapped
45 37 Implementation of 4-way set associative
46 38 Handling of cache misses. On a cache hit the processor proceeds as normal on the next clock cycle. On a cache miss the processor must be stalled until the data is available in the cache; the stall freezes the contents of the pipeline registers and the register file. On an instruction read miss the contents of the instruction register are invalid, and the instruction must be re-fetched.
47 39 Miss penalty elaborated. A large block size increases the transfer time of the block, but we can hide some of the transfer time. Early restart: resume execution as soon as the requested word is available in the cache, possibly before the transfer is complete. Requested word first (critical word first): transfer the requested word in the block first and then the rest of the block consecutively, wrapping the address around at the end of the block.
49 40 Memory writes. Write-through (can help with memory consistency): on a write, the block is read from the lower level if it is not present in the cache; the new word is written to both the word in the cache and the address in main memory. Poor performance. Write buffer: a buffer holds a queue of write accesses to main memory. Write-back: writes go only to the cache block; the block is written to the lower level when it is replaced (requires a dirty bit!).
50 41 Memory writes. Write-through + no-write-allocate: on a miss, write through to the next level. Write-back + write-allocate: on a miss, read the line from the next level, place it in the cache, and write to the cache; when the block is evicted, write the line back to memory.
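The write-back plus write-allocate policy, including the dirty bit, can be sketched as follows. This is a simplified illustration assuming a direct-mapped cache with one-word blocks and a plain list standing in for the next level; the class and attribute names are hypothetical.

```python
class WriteBackCache:
    """Direct-mapped, one-word-block cache using write-back + write-allocate."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory                    # next level: a plain list here
        self.lines = [None] * num_blocks        # each entry: [tag, data, dirty]
        self.writebacks = 0

    def _fill(self, address):
        """Return the cache line for address, allocating it on a miss."""
        index = address % self.num_blocks
        tag = address // self.num_blocks
        line = self.lines[index]
        if line is None or line[0] != tag:
            if line is not None and line[2]:
                # Evicting a dirty line: write it back to the next level.
                victim_address = line[0] * self.num_blocks + index
                self.memory[victim_address] = line[1]
                self.writebacks += 1
            line = [tag, self.memory[address], False]   # allocate on miss
            self.lines[index] = line
        return line

    def read(self, address):
        return self._fill(address)[1]

    def write(self, address, value):
        line = self._fill(address)   # write-allocate: bring the line in first
        line[1] = value              # update only the cache copy
        line[2] = True               # mark the line dirty


memory = [0] * 16
cache = WriteBackCache(num_blocks=4, memory=memory)
cache.write(5, 42)           # miss: allocate, then write to the cache only
stale = memory[5]            # the next level still holds the old value
cache.read(9)                # same index as 5: evicts the dirty line
print(stale, memory[5], cache.writebacks)    # 0 42 1
```

Memory is only updated when the dirty line is evicted, which is the behaviour that distinguishes write-back from write-through.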
51 42 Types of cache misses. Cache misses can be divided into three categories depending on the reason for the miss. Compulsory misses: access to a block that has never been in the cache. Capacity misses: access to blocks that have been kicked out because of the cache size. Conflict misses: access to blocks that have been kicked out of a set associative or direct-mapped cache, but would have been available in a fully associative cache.
52 43 Types of cache misses
53 44 Impact of miss rate on performance Example on page 477
54 45 16 KB cache in FastMATH (MIPS). 4 K words, 16-word blocks. Separate instruction and data caches. The OS decides between write-through and write-back. Effective miss rate 3.2% (SPEC2000 integer benchmarks): 11.4% data, 0.4% instruction. Bits 5-2 are used to index into the block and select the word from the block.
55 46 Split vs. combined cache. A combined cache with a total size equal to the sum of the two split caches gives a better hit rate. FastMATH: split cache miss rate 3.24%, combined cache miss rate 3.18%. A split cache doubles the cache bandwidth: the processor can access both the instruction cache and the data cache in the same clock cycle. The increased bandwidth easily outweighs the disadvantage of the increased miss rate.
56 47 Designing the memory system to support caches (1/3). The miss penalty can be reduced if the memory-to-cache bandwidth is increased (this allows larger blocks while maintaining a low miss penalty). The bus clock rate is usually much slower than the processor clock rate (e.g., by a factor of 10), which affects the miss penalty. Assume 1 memory bus clock cycle to send the address, 15 memory bus clock cycles for each DRAM access initiated, and 1 memory bus clock cycle to send a word of data. With a 4-word block and a one-word-wide DRAM bank, the miss penalty is $1 + 4 \times 15 + 4 \times 1 = 65$ memory bus clock cycles. Bytes transferred per bus clock cycle: $(4 \times 4)/65 \approx 0.25$.
57 48 Increasing bandwidth by widening the bus (2/3). Widen the memory and the buses between the processor and the memory; memory bandwidth increases proportionally. Miss penalty for the previous example with a main-memory width of two words: $1 + 2 \times 15 + 2 \times 1 = 33$ memory bus clock cycles, down from 65. With a main-memory width of 4 words: 17 cycles. Costs: a wider bus and a potential increase in access time due to the multiplexor and control logic between the processor and the cache.
58 49 Increasing bandwidth by interleaving (3/3). Memory chips are organized in banks to read or write multiple words in one access time rather than a single word each time. Sending an address to several banks permits them all to read simultaneously, so the full memory latency is incurred only once: $1 + 1 \times 15 + 4 \times 1 = 20$ memory bus clock cycles.
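The three memory organizations can be compared with one small calculation. The sketch below uses the timing assumptions stated in these slides (1 bus cycle to send the address, 15 per DRAM access, 1 per word transferred) and a 4-word block of 4-byte words; the function and parameter names are hypothetical.

```python
def miss_penalty(words_per_block, memory_width_words=1, banks=1,
                 addr_cycles=1, access_cycles=15, transfer_cycles=1):
    """Miss penalty in memory bus clock cycles for a simple memory model."""
    if banks > 1:
        # Interleaved: banks are accessed in parallel, but words are still
        # sent one at a time over a one-word-wide bus.
        accesses = -(-words_per_block // banks)            # ceiling division
        transfers = words_per_block
    else:
        # Wide memory: each access and each transfer moves a full-width chunk.
        accesses = -(-words_per_block // memory_width_words)
        transfers = accesses
    return addr_cycles + accesses * access_cycles + transfers * transfer_cycles


block_bytes = 4 * 4    # 4 words of 4 bytes each
penalties = {
    "one-word-wide": miss_penalty(4),
    "two-word-wide": miss_penalty(4, memory_width_words=2),
    "four-word-wide": miss_penalty(4, memory_width_words=4),
    "four interleaved banks": miss_penalty(4, banks=4),
}
for name, cycles in penalties.items():
    print(f"{name}: {cycles} cycles, {block_bytes / cycles:.2f} bytes/cycle")
```

Interleaving gets most of the benefit of the four-word-wide memory (20 cycles versus 17) while keeping the bus one word wide.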
59 50 Bytes / clock cycle for a single miss
60 51 Cache performance.
CPU time = (execution cycles + stall cycles) * cycle time
stall cycles = read stalls + write stalls
read stalls = reads/prog * read miss rate * read miss penalty
write stalls = writes/prog * write miss rate * write miss penalty + buffer stalls (buffer stalls are much smaller than write-miss stalls)
Assuming read miss penalty ≈ write miss penalty:
memory stalls = mem accesses/prog * miss rate * miss penalty = inst/prog * misses/inst * miss penalty
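A worked instance of these formulas may help. The sketch below uses the combined memory-stall form; all numbers (instruction count, base CPI, misses per instruction, miss penalty, cycle time) are assumed for illustration and do not come from the lecture, and the function names are hypothetical.

```python
def memory_stall_cycles(instructions, misses_per_instruction, miss_penalty):
    """memory stalls = inst/prog * misses/inst * miss penalty."""
    return instructions * misses_per_instruction * miss_penalty


def cpu_time(execution_cycles, stall_cycles, cycle_time):
    """CPU time = (execution cycles + stall cycles) * cycle time."""
    return (execution_cycles + stall_cycles) * cycle_time


# Assumed workload: 1 million instructions, base CPI of 2,
# 0.04 misses per instruction, 100-cycle miss penalty, 1 ns cycle time.
instructions = 1_000_000
execution_cycles = instructions * 2
stalls = memory_stall_cycles(instructions, 0.04, 100)
total_time = cpu_time(execution_cycles, stalls, cycle_time=1e-9)
effective_cpi = (execution_cycles + stalls) / instructions

print(f"stall cycles: {stalls:.0f}")
print(f"CPU time: {total_time * 1e3:.1f} ms")
print(f"effective CPI: {effective_cpi:.1f}")
```

In this assumed configuration the memory stalls account for two thirds of all cycles, tripling the CPI from 2 to 6.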
61 52 Multi-level cache. Miss penalty reduced. The level 1 cache focuses on reducing hit time: smaller cache size, smaller block size. The level 2 cache focuses on reducing miss penalty: larger cache size, larger block size.
62 53 Review. Cache lines exploit which locality? What is the benefit of associativity? What is the cost of associativity? Why aren't L1 caches big? What is temporal locality? What is an example of code that has no temporal locality?
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationWelcome to Part 3: Memory Systems and I/O
Welcome to Part 3: Memory Systems and I/O We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? We will now focus on memory issues, which are frequently