Concepts Introduced in Appendix B: Memory Hierarchy

Concepts Introduced in Appendix B

- general principles of memory hierarchies
- caches
- virtual memory

Memory Hierarchy

Exploits the principle of spatial and temporal locality. Smaller memories are faster, require less energy to access, and are more expensive per byte. Larger memories are slower, require more energy to access, and are less expensive per byte. Typically, data found in one level is also found in the level below.

goals:
- Catch most of the references in the fast memory.
- Have the cost per byte almost as low as that at the slowest memory level.

We will see that a memory hierarchy is used to implement protection schemes as well.

Memory Hierarchy Terms

Hit - item found in that level of the hierarchy
Miss - item not found in that level of the hierarchy
Hit Time - time required to access the desired item
Miss Penalty - the additional time required to service the miss
Miss Rate - fraction of accesses that are not in that level
Block - the amount of information that is retrieved from the next lower level on a miss

Equations

    miss_rate = number_of_misses / total_references
    hit_rate = number_of_hits / total_references
    avg_mem_access_time = hit_time + miss_rate * miss_penalty
    total_cycles = hit_time * total_references + miss_penalty * total_misses
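As a quick illustration of these equations, here is a minimal C sketch; the reference counts and latencies are invented values, not figures from the slides.

    #include <stdio.h>

    int main(void) {
        /* Invented measurements, chosen only to exercise the equations. */
        long total_references = 1000000;
        long total_misses = 20000;
        double hit_time = 1.0;      /* cycles per access at this level      */
        double miss_penalty = 50.0; /* additional cycles to service a miss  */

        double miss_rate = (double)total_misses / total_references;
        double amat = hit_time + miss_rate * miss_penalty;
        double total_cycles = hit_time * total_references
                            + miss_penalty * total_misses;

        printf("miss_rate = %.4f\n", miss_rate);             /* 0.0200 */
        printf("avg_mem_access_time = %.2f cycles\n", amat); /* 2.00   */
        printf("total_cycles = %.0f\n", total_cycles);
        return 0;
    }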

Memory Hierarchy Evaluation Methods

software approach
- Generate traces of memory addresses by instrumentation, simulation, or traps.
- Use a memory hierarchy simulator offline or on-the-fly.
hardware approach
- Use results from hardware performance counters.
- Often cannot exactly reproduce results.

Memory Hierarchy Questions

- Where can a block be placed in the current level? (block placement)
- How is a block found if it is in the upper level? (block identification)
- Which block should be replaced on a miss? (block replacement)
- What happens on a write? (write strategy)

Cache Topics

- organization
- replacement policy
- write policy
- improving cache performance

Cache Organizations

direct mapped - A memory block can be placed in only one cache line.
    set = block_address MOD number_of_blocks_in_cache
fully associative - A memory block can be placed in any cache line.
set associative - A memory block can be placed in any cache line within a single set of lines.
    set = block_address MOD number_of_sets_in_cache
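To make the placement formulas concrete, here is a minimal C sketch; the cache geometry (16-byte lines, 128 lines, 32 sets) is an assumed example, not taken from the slides.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Assumed example geometry. */
        uint32_t line_size = 16;
        uint32_t num_lines = 128;  /* direct mapped: one line per set */
        uint32_t num_sets  = 32;   /* e.g. a 4-way set associative cache */

        uint32_t addr = 0x24c8;
        uint32_t block_address = addr / line_size;

        /* Direct mapped: the block can live in exactly one line.     */
        uint32_t dm_line = block_address % num_lines;
        /* Set associative: the block can live in any way of one set. */
        uint32_t sa_set  = block_address % num_sets;

        printf("block 0x%x -> line %u (direct mapped), set %u (set assoc.)\n",
               block_address, dm_line, sa_set);
        return 0;
    }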

Block Replacement

If there is more than one block in a cache set, then a block must be selected to be replaced on a cache miss.

random
- easy to implement in hardware
- may not be able to reproduce behavior
LRU
- least-recently accessed block is chosen
- reduces miss rates
- can be expensive to implement for high levels of associativity
FIFO
- block loaded first is chosen
- approximates LRU, but is less complex to implement

Write Policy

write through
- The data is written to both the current cache level and to the next level of the memory hierarchy.
- Simpler to implement.
- Can use write buffers to reduce stalls.
- Cache and next memory hierarchy level are consistent.
write back
- The data is written to only the current cache level.
- A modified cache block (dirty bit set) is written to the next memory hierarchy level when it is replaced.
- Reduces traffic at the next memory hierarchy level.

Write Miss Policy

write allocate
- Block is loaded into the cache on a write miss.
- Typically used with write back.
no-write allocate
- Block is not loaded into the cache on a write miss, but is updated in the next memory hierarchy level.
- Typically used with write through.

Direct Mapped Example

Assume a 16 byte line size, 128 cache sets, and an associativity of one (direct mapped). How is the physical address partitioned?

    tag | index | offset

Fill in the information for the following memory references (a simulation sketch follows below).

    Address   Tag   Index   Offset   Result
    0x12c
    0x4c0
    0x1134
    0x1138
    0x24c8
    0x128
    0x4c4
    0x8c8
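The fill-in table can be checked with a tiny direct-mapped simulator. Here is a minimal C sketch for the stated parameters (16-byte lines, 128 sets) that tracks only valid bits and tags:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define OFFSET_BITS 4    /* log2(16-byte line) */
    #define INDEX_BITS  7    /* log2(128 sets)     */
    #define NUM_SETS    128

    int main(void) {
        uint32_t trace[] = {0x12c, 0x4c0, 0x1134, 0x1138,
                            0x24c8, 0x128, 0x4c4, 0x8c8};
        bool valid[NUM_SETS] = {false};
        uint32_t tags[NUM_SETS] = {0};

        for (int i = 0; i < 8; i++) {
            uint32_t a = trace[i];
            uint32_t offset = a & ((1 << OFFSET_BITS) - 1);
            uint32_t index  = (a >> OFFSET_BITS) & (NUM_SETS - 1);
            uint32_t tag    = a >> (OFFSET_BITS + INDEX_BITS);
            bool hit = valid[index] && tags[index] == tag;
            if (!hit) { valid[index] = true; tags[index] = tag; }
            printf("0x%04x: tag=0x%x index=%3u offset=%2u %s\n",
                   a, tag, index, offset, hit ? "hit" : "miss");
        }
        return 0;
    }

Running it shows, for example, that 0x4c0 and 0x24c8 map to the same set with different tags, so the later reference 0x4c4 misses.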

Set Associative Example

Assume a 2-way set associative cache with a 16 byte line size, 64 cache sets, and an LRU replacement policy. How is the physical address partitioned?

    tag | index | offset

Fill in the information for the following memory references (a 2-way LRU simulation sketch follows below).

    Address   Tag   Index   Offset   Result
    0x12c
    0x4c0
    0x1134
    0x1138
    0x24c8
    0x128
    0x4c4
    0x8c8

Write-through, No-write Allocate

Fill in what happens for each type of access for a write-through, no-write allocate cache.

    Access Type   Update Cache   Access Next Mem Hier Level   Update Next Mem Hier Level
    read hit
    read miss
    write hit
    write miss

Write-through, No-write Allocate Example

Assume a 2-way set associative cache with a 16 byte line size, 64 cache sets, an LRU replacement policy, and a write-through, no-write allocate policy. Fill in the information for the following memory references.

    R/W   Address   Tag   Index   Offset   Result   Access Next Mem Hier Level   Update Cache
          0x12c
          0x1134
          0x1138
          0x2130
          0x2134

Write-back, Write Allocate

Fill in what happens for each type of access for a write-back, write allocate cache.

    Access Type   Update Cache   Access Next Mem Hier Level   Update Next Mem Hier Level
    read hit
    read miss
    write hit
    write miss
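For the set-associative example above, here is a minimal 2-way LRU simulator in C (16-byte lines, 64 sets, read trace only; the write-policy columns of the tables are left to the exercise):

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 4   /* 16-byte lines */
    #define INDEX_BITS  6   /* 64 sets       */
    #define NUM_SETS    64
    #define WAYS        2

    typedef struct { int valid; uint32_t tag; int last_used; } Line;

    int main(void) {
        uint32_t trace[] = {0x12c, 0x4c0, 0x1134, 0x1138,
                            0x24c8, 0x128, 0x4c4, 0x8c8};
        Line cache[NUM_SETS][WAYS] = {0};

        for (int i = 0; i < 8; i++) {
            uint32_t a = trace[i];
            uint32_t index = (a >> OFFSET_BITS) & (NUM_SETS - 1);
            uint32_t tag   = a >> (OFFSET_BITS + INDEX_BITS);
            Line *set = cache[index];
            int way = -1;
            for (int w = 0; w < WAYS; w++)
                if (set[w].valid && set[w].tag == tag) way = w;
            int hit = (way >= 0);
            if (!hit) {   /* miss: fill the least-recently-used way */
                way = (set[0].last_used <= set[1].last_used) ? 0 : 1;
                set[way].valid = 1;
                set[way].tag = tag;
            }
            set[way].last_used = i + 1;  /* larger = more recently used */
            printf("0x%04x: tag=0x%x index=%2u %s\n",
                   a, tag, index, hit ? "hit" : "miss");
        }
        return 0;
    }

Note the contrast with the direct-mapped sketch: here 0x4c0 and 0x24c8 can coexist in the two ways of one set, so 0x4c4 hits.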

Write-back, Write Allocate Example

Assume a 2-way set associative cache with a 16 byte line size, 64 cache sets, an LRU replacement policy, and a write-back, write allocate policy. Fill in the information for the following memory references.

    R/W   Address   Tag   Index   Offset   Result   Access Next Mem Hier Level   Update Cache
          0x12c
          0x1134
          0x1138
          0x2130
          0x2134

Improving Cache Performance

    avg_mem_access_time = hit_time + miss_rate * miss_penalty

- reducing cache miss rate
- reducing cache miss penalty
- reducing cache hit time

Reducing Cache Miss Rate

- larger block size
- larger cache size
- higher associativity

Cache Miss Categories

compulsory - The first access to a block always misses, since the block has to be loaded into the cache.
capacity - These blocks are replaced and later retrieved because the cache cannot contain all the blocks needed during execution.
conflict - Occurs when too many blocks map to the same cache set.
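As a rough illustration of the categories (a sketch under simplifying assumptions; real studies classify misses against an infinite cache), the C fragment below labels a miss as compulsory if its block has never been touched before, and as capacity/conflict otherwise. The miss flags reuse the output of the direct-mapped sketch above.

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE  16
    #define MAX_BLOCKS 64

    int main(void) {
        uint32_t trace[] = {0x12c, 0x4c0, 0x1134, 0x1138,
                            0x24c8, 0x128, 0x4c4, 0x8c8};
        /* 1 = miss, as reported by the direct-mapped sketch above. */
        int miss[] = {1, 1, 1, 0, 1, 0, 1, 1};

        uint32_t seen[MAX_BLOCKS];  /* blocks touched so far */
        int nseen = 0;
        for (int i = 0; i < 8; i++) {
            uint32_t block = trace[i] / LINE_SIZE;
            int first = 1;
            for (int j = 0; j < nseen; j++)
                if (seen[j] == block) first = 0;
            if (first) seen[nseen++] = block;
            if (miss[i])
                printf("0x%04x: %s miss\n", trace[i],
                       first ? "compulsory" : "capacity/conflict");
        }
        return 0;
    }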

Larger Block Size

advantages
- Exploits spatial locality.
- Reduces compulsory misses.
disadvantages
- Lengthens miss penalty.
- May waste bandwidth bringing in bytes that are unused.
(A numeric illustration of this tradeoff follows after these slides.)

Larger Cache Size

advantages
- Reduces capacity and conflict misses.
disadvantages
- Requires more space.
- May increase hit time.
- Requires more power for each access.

Higher Associativity

advantages
- Reduces miss rate.
disadvantages
- May increase hit time.
- Requires more space and power.

Reducing Cache Miss Penalty

- multi-level caches
- Give priority to read misses before write misses.
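For intuition, a hedged numeric example of the block-size tradeoff (all values invented): suppose doubling the block size cuts the miss rate from 4% to 3% but lengthens the miss penalty from 40 to 50 cycles, with a 1-cycle hit time.

    avg_mem_access_time(smaller blocks) = 1 + 0.04 * 40 = 2.6 cycles
    avg_mem_access_time(larger blocks)  = 1 + 0.03 * 50 = 2.5 cycles

Here the larger block wins, but a bigger penalty increase or a smaller miss-rate gain would tip the balance the other way.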

Multi-Level Caches

Became popular as the miss penalty for primary caches continued to increase. Most general purpose machines have 3 or more levels of cache, now all on the same chip. L2 (L3) caches are typically much bigger than L1 (L2) caches, with larger block sizes and higher associativity levels. L2 and L3 caches tend to be unified.

    avg_access_time = hit_time_L1 + miss_rate_L1 * miss_penalty_L1
    miss_penalty_L1 = hit_time_L2 + miss_rate_L2 * miss_penalty_L2
    local_miss_rate = misses_in_cache / accesses_to_cache
    global_miss_rate = misses_in_cache / accesses_to_L1_cache

(A worked two-level sketch follows after these slides.)

Giving Priority to Read Misses over Write Misses

The CPU can check the addresses in the write buffer on a read miss in case the desired item is in the buffer.

advantages
- The CPU is much more likely to use the value read from memory sooner than a value written to memory.
- The CPU can initiate the read miss to the next memory hierarchy level without waiting for the write buffer to drain.
- The read access to the next memory hierarchy level can be avoided if the value can be obtained from the write buffer.

Write buffers can be used to make write-back caches more effective. First, copy the dirty block to a write buffer. Second, load the block from the next memory hierarchy level to cache. Third, write the dirty block to the next memory hierarchy level. The disadvantage is the additional complexity.

Reducing Cache Hit Time

- virtual caches
- virtually indexed, physically tagged caches

Virtual Caches

Access the cache with a virtual instead of a physical address.

problems
- Have to purge the cache on a context switch unless the PID is included as part of the tag.
- Can have synonyms (aliases), which are multiple copies of the same physical data with different virtual addresses.
- Page protection information from the DTLB would have to be added to each virtual cache line.
- I/O and L2/L3 accesses are typically performed using physical addresses and would require mapping to virtual addresses to deal with the virtually addressed cache for I/O and L2/L3 block replacements.
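Referring back to the multi-level cache equations above, here is a minimal C sketch that expands the two-level average access time; all rates and latencies are invented for illustration.

    #include <stdio.h>

    int main(void) {
        /* Invented parameters for illustration only. */
        double hit_time_L1 = 1.0,  miss_rate_L1 = 0.05;
        double hit_time_L2 = 10.0, miss_rate_L2 = 0.20; /* local L2 rate */
        double miss_penalty_L2 = 100.0;                 /* main memory   */

        double miss_penalty_L1 = hit_time_L2
                               + miss_rate_L2 * miss_penalty_L2;
        double avg_access_time = hit_time_L1
                               + miss_rate_L1 * miss_penalty_L1;
        /* Global L2 miss rate = product of the local miss rates. */
        double global_miss_rate_L2 = miss_rate_L1 * miss_rate_L2;

        printf("L1 miss penalty    = %.1f cycles\n", miss_penalty_L1);  /* 30.0 */
        printf("avg access time    = %.2f cycles\n", avg_access_time); /* 2.50 */
        printf("global L2 miss rate = %.3f\n", global_miss_rate_L2);   /* 0.010 */
        return 0;
    }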

Virtually Indexed, Physically Tagged Caches

Index into the cache with the page offset. Do the tag comparison after the virtual to physical address translation. The advantage is that the access to the data in the cache can start sooner. The limitation is that one way of a VIPT cache can be no larger than the page size (a worked check appears after these slides).

    accessing virtual memory:   Virtual Page Number | Page Offset
    accessing the cache:        Tag | Index | Offset

Overlays

A programmer divides their program into pieces. The programmer determines which pieces never need to be used at the same time. A portion of the program loads from disk or stores to disk these pieces during execution. The programmer ensures that the maximum number of program pieces used at the same time fits into physical memory.

Virtual Memory

- Each process has its own separate virtual address space.
- Divides physical memory into blocks and allocates these blocks to different processes.
- Provides a mapping between blocks in physical memory and blocks on disk.
- Allows a process to execute with only portions of the process being in main memory. Also reduces program startup time.
- Provides protection to prevent processes from accessing blocks inappropriately.

Virtual Memory Terms

Page is the name used for a block. Page fault is the name for a miss. Virtual address is the address produced by the CPU. Physical address is the address used to access main memory, and typically cache as well. Page table is the data structure containing the mappings between virtual and physical addresses. Translation Lookaside Buffer is a cache that contains a portion of the page table.
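A small C sketch of the VIPT size limit mentioned above (the page size and associativity are assumed example values): since the index and offset bits must fit within the page offset, each way is limited to one page, and the cache can only grow by adding ways.

    #include <stdio.h>

    int main(void) {
        /* Assumed example values, not from the slides. */
        int page_size = 4096;   /* bytes */
        int ways = 4;

        /* One way of a VIPT cache can be no larger than a page. */
        int max_way_size  = page_size;
        int max_vipt_size = ways * max_way_size;

        printf("max VIPT cache size = %d KB (%d-way)\n",
               max_vipt_size / 1024, ways);   /* 16 KB */
        return 0;
    }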

Page Tables

The number of entries in a page table is equal to the number of virtual pages. The number of virtual pages is the size of the virtual address space divided by the page size. Each process has its own page table. A page table entry will typically contain a physical page number, resident bit, dirty bit, use bit, protection field (e.g. read only), and a disk address. (A sizing sketch follows after these slides.)

Virtual Memory Questions

- Where can a block be placed in main memory? It can be placed anywhere (fully associative) to reduce the miss rate.
- How is a block found if it is in main memory? The page table is indexed by the virtual page number and contains the physical page number.
- Which block should be replaced on a virtual memory miss? Use bits are used to approximate LRU.
- What happens on a write? A write-back, write-allocate policy is always used.

Translation Lookaside Buffers

Most machines use special caches called TLBs that contain a portion of the entries in a page table. Each entry in the TLB contains a tag (a portion of the virtual page number) and most of the information in a page table entry. TLBs are typically invalidated when a context switch is performed. TLBs are typically quite small to provide a very fast translation. There are typically separate TLBs for instructions and data to support simultaneous access due to pipelining. Many processors have multiple levels of TLBs, where the L1 TLBs are separate and the higher levels are unified.

Virtual Memory Supports Multiprogramming

Multiprogramming means that several processes can concurrently share a computer. A process is the code, data, and any state information used in the execution of a program. A context switch means transferring control of the machine from one process to another. Proper protection must be provided to prevent a process from inappropriately affecting another process or itself. Virtual memory systems keep bits in the page table and TLB entries to indicate the type and level of access that the process has to each of the pages.
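To make the page-table sizing rule concrete, a minimal C sketch; the address-space, page, and entry sizes are assumed example values.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Assumed example: 32-bit virtual addresses, 4 KB pages,
           4-byte page table entries. */
        uint64_t va_space   = 1ULL << 32;
        uint64_t page_size  = 4096;
        uint64_t entry_size = 4;

        uint64_t num_pages  = va_space / page_size;   /* 2^20 entries    */
        uint64_t table_size = num_pages * entry_size; /* 4 MB per process */

        printf("entries = %llu, table = %llu MB\n",
               (unsigned long long)num_pages,
               (unsigned long long)(table_size >> 20));
        return 0;
    }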

TLB and Page Table Example

Page size is 512 bytes. TLB is direct-mapped and has 64 sets.

    TLB:
    Index   Page #   Tag   V
    0       ???      ?     ?
    ...
    7       ??       0     ?
    8       2        0     1
    ...
    63      ???      ?     ?

    Page Table:
    Index   Page #   Resident   Dirty   Disk Addr
    0       ???      ?          ?       ...
    ...
    71      9        Yes        ?       ...
    ...

Given virtual address 0x8FDF, what is the physical address?
Given virtual address 0x10DF, what is the physical address?
(A worked translation sketch follows at the end of this section.)

Selecting a Page Size

Advantages of Using a Large Page Size
- The bigger the page size, the fewer the entries in the page table and the smaller the page table.
- Larger pages may allow the cache access to occur in parallel with the virtual to physical address translation.
- Transferring larger pages to/from disk is more efficient since fewer seeks are required.
- Will probably result in fewer TLB misses since there are fewer distinct pages referenced.

Advantages of Using a Smaller Page Size
- Will waste less space.
- The miss penalty for a page fault and program startup time will decrease.

Fallacies and Pitfalls

Pitfall: Too small an address space.
Pitfall: Ignoring the impact of the operating system on the performance of the memory hierarchy.
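Here is a hedged worked sketch of the TLB example in C, using the tables as reconstructed above (entries marked "?" in the slide are treated as unknown): 512-byte pages give a 9-bit offset, the TLB index is VPN MOD 64, and the tag is VPN / 64.

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_BITS 9     /* 512-byte pages */
    #define TLB_SETS  64

    int main(void) {
        uint32_t vaddrs[] = {0x8FDF, 0x10DF};

        for (int i = 0; i < 2; i++) {
            uint32_t va     = vaddrs[i];
            uint32_t offset = va & ((1 << PAGE_BITS) - 1);
            uint32_t vpn    = va >> PAGE_BITS;
            uint32_t index  = vpn % TLB_SETS;
            uint32_t tag    = vpn / TLB_SETS;
            printf("va=0x%04x: vpn=%u tlb_index=%u tlb_tag=%u offset=0x%x\n",
                   va, vpn, index, tag, offset);
        }
        /* 0x8FDF: vpn=71, index=7, tag=1 -> TLB entry 7 holds tag 0, so
           this is a TLB miss; page table entry 71 maps to physical page 9:
           pa = 9*512 + 0x1DF = 0x13DF.
           0x10DF: vpn=8, index=8, tag=0 -> matches TLB entry 8 (page 2,
           valid): pa = 2*512 + 0x0DF = 0x4DF. */
        return 0;
    }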