Memory Hierarchy (CSE 471, Spring 2006)


Introduction

  Why memory subsystem design is important:
    CPU speeds increase 25%-30% per year
    DRAM speeds increase only 2%-11% per year

Memory Hierarchy

  Levels of memory with different sizes & speeds:
    close to the CPU: small, fast access
    close to memory: large, slow access
  Memory hierarchies improve performance:
    caches: demand-driven storage
    principle of locality of reference
      temporal: a referenced word will be referenced again soon
      spatial: words near a referenced word will be referenced soon
    speed/size trade-off in technology: fast access for most references
  First cache: the IBM 360/85, in the late '60s

Logical Diagram of a Cache

  (figure only; not reproduced in the transcription)

Cache Organization

  Block: the # of bytes associated with 1 tag; usually the # of bytes transferred on a memory request
  Set: the blocks that can be accessed with the same index bits
  Associativity: the number of blocks in a set (direct mapped, set associative, or fully associative)
  Size: the # of bytes of data. How do you calculate this?

Logical Diagram of a Set-associative Cache

  (figure only; not reproduced in the transcription)

Accessing a Cache

  General formulas:
    number of index bits = log2(cache size / block size)                   (for a direct-mapped cache)
    number of index bits = log2(cache size / (block size * associativity)) (for a set-associative cache)
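These formulas translate directly into a few lines of code. Below is a minimal Python sketch; the function name and the 32-bit address width are illustrative assumptions, not from the slides.

from math import log2

def cache_bit_fields(cache_size, block_size, associativity, addr_bits=32):
    # bytes within a block -> block-offset bits
    offset_bits = int(log2(block_size))
    # number of sets; associativity = 1 gives the direct-mapped case
    num_sets = cache_size // (block_size * associativity)
    index_bits = int(log2(num_sets))
    # whatever is left of the address is the tag
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 32 KB, 4-way set-associative, 64-byte blocks:
# index bits = log2(32768 / (64 * 4)) = 7, so tag = 32 - 7 - 6 = 19
print(cache_bit_fields(32 * 1024, 64, 4))   # (19, 7, 6)
print(cache_bit_fields(32 * 1024, 64, 1))   # direct mapped: (17, 9, 6)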

Cache Size

  The bigger the cache,
  + the higher the hit ratio
  - the longer the access time

Block Size

  The bigger the block,
  + the better the spatial locality
  + less block-transfer overhead per block
  + less tag overhead per entry (assuming the same number of entries)
  - might not access all the bytes in the block

Associativity

  The larger the associativity,
  + the higher the hit ratio
  - the larger the hardware cost (comparators per set)
  - the longer the hit time (a larger MUX)
  - need hardware that decides which block to replace
  - an increase in tag bits (for the same cache size)

  Associativity is more important for small caches than for large ones, because more memory locations map to the same line, e.g., TLBs.

Memory Update Policy

  Write-through:
    performance depends on the # of writes
    a store buffer decreases this cost:
      check it on load misses
      store compression
  Write-back:
    performance depends on the # of dirty-block replacements, but...
    needs a dirty bit & the logic for checking it
    tag check before the write
    must flush the cache before I/O
    optimization: fetch before replace
  Both use a merging store buffer.

Cache Contents

  Separate instruction & data caches:
    separate access doubles the bandwidth
    shorter access time
    different configurations for I & D
  Unified cache:
    lower miss rate
    less cache-controller hardware

Address Translation

  In a nutshell: maps a virtual address to a physical address, using the page tables.
  number of page offset bits = log2(page size)
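In code, the translation step is just a split on the page offset bits. A toy sketch, assuming 4 KB pages and a page table modeled as a plain dictionary (both assumptions for illustration):

from math import log2

PAGE_SIZE = 4096                            # assumed page size: 4 KB
OFFSET_BITS = int(log2(PAGE_SIZE))          # number of page offset bits = log2(4096) = 12

def translate(vaddr, page_table):
    vpn = vaddr >> OFFSET_BITS              # virtual page number (the upper bits)
    offset = vaddr & (PAGE_SIZE - 1)        # page offset passes through unchanged
    ppn = page_table[vpn]                   # physical page number from the page table
    return (ppn << OFFSET_BITS) | offset

# map virtual page 5 to physical page 9; the offset is unchanged:
assert translate(5 * PAGE_SIZE + 0x123, {5: 9}) == 9 * PAGE_SIZE + 0x123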

TLB

  Translation Lookaside Buffer (TLB): a cache of the most recently translated virtual-to-physical page mappings.
  Typical configuration:
    64/128-entry, fully associative
    4-8 byte blocks
    0.5-1 cycle hit time
    miss penalty in the low tens of cycles; misses can be handled in software, software with hardware assists, firmware, or hardware
    write-back
  Works because of locality of reference; much faster than address translation using the page tables.

Using a TLB

  (1) Access the TLB using the virtual page number.
  (2) If a hit: concatenate the physical page number & the page offset bits to form a physical address; set the reference bit; if writing, set the dirty bit.
  (3) If a miss: get the physical address from the page table; evict a TLB entry & update its dirty/reference bits in the page table; update the TLB with the new mapping.

Virtual or Physical Addressing

  Virtually-addressed caches: access with a virtual address (index & tag); do address translation only on a cache miss.
  + faster for hits, because there is no address translation
  + compiler support for better data placement
  - need to flush the cache on a context switch (a process identifier (PID) in the tag can avoid this)
  - synonyms: if 2 processes are sharing data, two (different) virtual addresses map to the same physical address, so there can be 2 copies of the same data in the cache; on a write, only one will be updated, so the other has old data.
    A solution: page coloring. Processes share segments; all shared data have the same offset from the beginning of a segment, i.e., the same low-order bits. The cache must be <= the segment size (more precisely, each set of the cache must be <= the segment size); the index is taken from the segment offset, and the tag compare is on the segment #.

  Physically-addressed caches: do address translation on every cache access; access with a physical index & compare with a physical tag.
  + no cache flushing on a context switch
  + no synonym problem
  - in a straightforward implementation, hit time increases, because the virtual address must be translated before accessing the cache
  + the increase in hit time can be avoided if address translation is done in parallel with the cache access:
    restrict the cache size so that the cache index bits fall within the page offset, where the virtual & physical bits are the same: virtually indexed
    access the TLB & the cache at the same time, then compare the physical tag from the cache to the physical address (page frame #) from the TLB: physically tagged
    cache size can still be increased by increasing associativity while using page-offset bits for the index
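The "index bits within the page offset" constraint is easy to check numerically. A small sketch, again assuming 4 KB pages (the function name is mine):

from math import log2

def parallel_lookup_ok(cache_size, block_size, associativity, page_size=4096):
    # The cache can be virtually indexed, physically tagged only if the
    # index + block offset fit inside the page offset, where virtual and
    # physical address bits are identical.
    index_bits = int(log2(cache_size // (block_size * associativity)))
    offset_bits = int(log2(block_size))
    return index_bits + offset_bits <= int(log2(page_size))

print(parallel_lookup_ok(32 * 1024, 64, 1))   # False: 9 + 6 > 12
print(parallel_lookup_ok(32 * 1024, 64, 8))   # True: 6 + 6 <= 12, thanks to associativity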

Cache Hierarchies

  A cache hierarchy: different caches with different sizes & access times & purposes.
  + decreases the effective memory access time: many misses in the L1 cache will be satisfied by the L2 cache, avoiding going all the way to memory
  Level 1 cache goal: fast access, so minimize hit time (the common case).
  Level 2 cache goal: keep traffic off the system bus.

Cache Metrics

  Hit (miss) ratio = the number of hits (misses) / the number of references
    measures how well the cache functions
    useful for understanding cache behavior
    an intermediate metric
  Effective access time = the (rough) average time it takes to do a memory reference
    measures the performance of the memory system, including factors that depend on the implementation
    an intermediate metric

Measuring Cache Hierarchy Performance

  Effective access time for a cache hierarchy (using the local miss ratio at each level):
    effective access time = L1 hit time + local MR(L1) * (L2 hit time + local MR(L2) * memory access time)
  Local miss ratio = the number of misses in a cache / the number of accesses to that cache
    # accesses for the L1 cache: the number of references
    # accesses for the L2 cache: the number of misses in the L1 cache
  Example: 1000 references, 40 L1 misses, 10 L2 misses
    local MR(L1) = 40 / 1000 = 0.04
    local MR(L2) = 10 / 40 = 0.25
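The same example, worked in code. The latencies (1-cycle L1 hit, 10-cycle L2 hit, 100-cycle memory access) are illustrative assumptions, since the slides leave them unspecified:

references, l1_misses, l2_misses = 1000, 40, 10

local_mr_l1 = l1_misses / references        # 40 / 1000 = 0.04
local_mr_l2 = l2_misses / l1_misses         # 10 / 40   = 0.25

t_l1, t_l2, t_mem = 1, 10, 100              # assumed hit times / miss penalty (cycles)
effective_access_time = t_l1 + local_mr_l1 * (t_l2 + local_mr_l2 * t_mem)
print(effective_access_time)                # 1 + 0.04 * (10 + 0.25 * 100) = 2.4 cycles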

Measuring Cache Hierarchy Performance

  Global miss ratio = the number of misses in a cache / the total number of references
  Example: 1000 references, 40 L1 misses, 10 L2 misses
    global MR(L1) = 40 / 1000 = 0.04
    global MR(L2) = 10 / 1000 = 0.01

Miss Classification

  Its usefulness is in providing insight into the causes of misses; it does not explain what caused a particular, individual miss.
  Compulsory: first-reference misses
    decrease by increasing the block size
  Capacity: due to the finite size of the cache
    decrease by increasing the cache size
  Conflict: too many blocks map to the same set
    decrease by increasing associativity
  Coherence (invalidation):
    decrease by decreasing the block size + improving processor locality
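The compulsory/capacity/conflict split can be estimated from an address trace by comparing against a fully associative LRU cache of the same total size. A sketch of that standard technique follows; the function and its trace format are my own illustration, and coherence misses are ignored since they require multiple processors:

from collections import OrderedDict

def classify_misses(trace, num_sets, associativity):
    size = num_sets * associativity
    seen = set()                                   # blocks referenced at least once
    full_assoc = OrderedDict()                     # fully associative LRU, same size
    sets = [OrderedDict() for _ in range(num_sets)]
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:                            # trace of block addresses
        fa_hit = block in full_assoc               # would a fully assoc. cache hit?
        if fa_hit:
            full_assoc.move_to_end(block)
        else:
            full_assoc[block] = True
            if len(full_assoc) > size:
                full_assoc.popitem(last=False)     # evict LRU

        way = sets[block % num_sets]
        if block in way:                           # set-associative hit
            way.move_to_end(block)
        else:
            if block not in seen:
                counts["compulsory"] += 1          # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1            # even full associativity misses
            else:
                counts["conflict"] += 1            # only the set mapping is at fault
            way[block] = True
            if len(way) > associativity:
                way.popitem(last=False)            # evict LRU within the set
        seen.add(block)

    return counts

# blocks 0, 8, and 16 all map to set 0 of a direct-mapped, 8-set cache:
print(classify_misses([0, 8, 0, 16, 0], num_sets=8, associativity=1))
# {'compulsory': 3, 'capacity': 0, 'conflict': 2}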