Lecture 20: Virtual Memory, Protection and Paging. Multi-Level Caches

18-447 Lecture 20: Virtual Memory, Protection and Paging
James C. Hoe, Dept of ECE, CMU, April 8, 2009

Announcements: Best class ever, next Monday
Handouts: H14 HW#4 (on Blackboard), due 4/22/09
Assigned Reading: "Virtual Memory in Contemporary Microprocessors." B. L. Jacob and T. N. Mudge. IEEE Micro, July/August 1998. (on Blackboard)

Multi-Level Caches

Average access time at level i: T_i = t_i + m_i * T_{i+1}
- L1-I and L1-D: a few pclk latency, many GB/sec (random access)
- L2-Unified: on-chip or off-chip?
- DRAM: hundreds of pclk latency, ~GB/sec (sequential)
An intermediate level of hierarchy is a cheaper way to reduce T_1 than using a faster or bigger L1.
Remember, the memory hierarchy is also about memory bandwidth.
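As a quick sanity check of the recursion (the latencies and miss ratios below are made up for illustration, not taken from the slides): with t_1 = 2 cycles, m_1 = 5%, t_2 = 12 cycles, m_2 = 20%, and T_3 = 200 cycles,

    T_1 = t_1 + m_1 * (t_2 + m_2 * T_3)
        = 2 + 0.05 * (12 + 0.2 * 200)
        = 2 + 0.05 * 52
        = 4.6 cycles

so even a modest L2 keeps the average access time at a small multiple of the L1 hit time despite the 200-cycle trip to DRAM.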

Multi-Level Cache Design

Upper hierarchies:
- small C: upper-bounded by SRAM access time
- smallish B: upper-bounded by C/B effects and the benefits of fine-grain spatial locality
- a: required to counter C/B effects
Lower hierarchies:
- large C: upper-bounded by chip area (or how much you are willing to pay off-chip)
- large B: to reduce tag storage overhead and to take advantage of coarse-grain spatial locality
- a: upper-bounded by complexity (off-chip implementations)
Very large off-chip caches are either direct-mapped or use on-chip tag RAM and hit logic; newer on-chip L2s can be highly associative.

Inclusion Principle

Traditionally, L_i's contents are always a subset of L_{i+1}'s:
- if a memory location is important enough to be in L_i, it must be important enough to be in L_{i+1}
- external agents (I/O, other processors) only have to check the lowest level to know if a memory location is cached---they do not need to consume L1 bandwidth
Nontrivial to maintain when L_{i+1} has lower associativity; e.g., a single L_i miss may trigger multiple L_i evictions:
- suppose L_i has a>1 and L_{i+1} has a=1
- suppose x, y, z have the same L_i index
- suppose y, z have the same L_{i+1} index, different from x's
- suppose initially x and y are cached in L_i (and hence L_{i+1})
- suppose a miss to z evicts x from L_i according to LRU
- z must evict y from L_{i+1} due to the collision
- y must also be evicted from L_i to maintain inclusion
New multicores tend not to maintain inclusion anymore. Why?
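To make the x/y/z scenario concrete, here is a small C sketch with hypothetical geometries (a 2-way 32 KB L1 and a direct-mapped 256 KB L2, both with 64-byte blocks; the addresses are made up purely to satisfy the index conditions above):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical geometries: 64 B blocks, 32 KB 2-way L1 (256 sets),
     * 256 KB direct-mapped L2 (4096 sets). */
    #define BLOCK_BITS   6
    #define L1_SET_MASK  0xFFu    /* 256 sets  */
    #define L2_SET_MASK  0xFFFu   /* 4096 sets */

    static unsigned l1_set(uint32_t pa) { return (pa >> BLOCK_BITS) & L1_SET_MASK; }
    static unsigned l2_set(uint32_t pa) { return (pa >> BLOCK_BITS) & L2_SET_MASK; }

    int main(void) {
        uint32_t x = 0x004000;  /* same L1 set as y,z but a different L2 set */
        uint32_t y = 0x040000;  /* same L1 set and same L2 set as z          */
        uint32_t z = 0x140000;  /* the access that triggers the violation    */

        /* x,y,z all map to L1 set 0; y,z also collide in L2 set 0, x does not. */
        printf("x: L1 set %u, L2 set %u\n", l1_set(x), l2_set(x));
        printf("y: L1 set %u, L2 set %u\n", l1_set(y), l2_set(y));
        printf("z: L1 set %u, L2 set %u\n", l1_set(z), l2_set(z));
        return 0;
    }

With these values the miss on z can displace x from the L1 set by LRU, while z's installation in the direct-mapped L2 must evict y, forcing y out of L1 as well if inclusion is to hold.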

Possible Inclusion Violation

[Figure: a 2-way set-associative L1 and a direct-mapped L2, initially holding x and y]
- step 1. L1 miss on z
- step 2. x displaced to L2
- step 3. y replaced by z
Conditions: x, y, z have the same L1 index bits; y, z have the same L2 index bits; x and {y, z} have different L2 index bits.

2 Parts to Modern VM

In a multi-tasking system, VM provides each process with the illusion of a large, private, and uniform memory.
Ingredient A: naming and protection
- each process sees a large, contiguous memory segment without holes (the virtual address space is much bigger than the available physical storage)
- each process's memory space is private, i.e., protected from access by other processes
Ingredient B: demand paging
- the capacity of secondary storage (swap space on disk) at the speed of primary storage (DRAM)

Mechanism: Address Translation

Protection, VM and demand paging are enabled by a common HW mechanism: address translation.
- the user process operates on effective addresses
- HW translates from EA to PA on every memory reference
- this controls which physical locations (DRAM and/or swap disk) can be named by a process
- it also allows dynamic relocation of the physical backing store (DRAM vs. swap disk)
Address translation HW and policies are controlled by the OS and protected from users, i.e., privileged.

Evolution of Memory Protection

Earliest machines did not have the concept of protection and address translation:
- no need---single process, single user; automatically private and uniform (but not very large)
- programs operated on physical addresses directly
- cannot support multitasking or protection
Multitasking 101:
- give each process a non-overlapping, contiguous physical memory region
- everything belonging to a process must fit in that region
- how do you keep one process from reading or trashing another process's data?

Base and Bound

A process's private memory region can be defined by
- base: starting address of the region
- bound: ending address of the region
The user process issues effective addresses (EA) between 0 and the size of its allocated region: private and uniform.
[Figure: privileged control registers base and bound delimit the active process's region within physical memory, alongside other processes' regions; EAs run from 0 to max]

Base and Bound Registers

When switching user processes, the OS sets the base and bound registers.
Translation and protection check in hardware on every user memory reference:
- PA = EA + base
- if (PA < bound) then okay else violation
User processes cannot be allowed to modify the base and bound registers themselves:
- requires 2 privilege levels such that the OS can see and modify certain state that the user cannot
- privileged instructions and state
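A minimal C sketch of the base-and-bound check, assuming the register names and return convention shown here are illustrative rather than from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    /* Privileged state, written only by the OS on a context switch. */
    static uint32_t base;   /* starting PA of the active process's region */
    static uint32_t bound;  /* ending PA of the active process's region   */

    /* Performed by hardware on every user memory reference. */
    bool bb_translate(uint32_t ea, uint32_t *pa)
    {
        uint32_t candidate = ea + base;  /* relocate: PA = EA + base */
        if (candidate < bound) {         /* protection check against the region end */
            *pa = candidate;
            return true;                 /* okay */
        }
        return false;                    /* violation: raise a protection fault */
    }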

Segmented Address Space

Limitations of the base and bound scheme:
- a large contiguous space is hard to come by after the system runs for a while---free space may be fragmented
- how do two processes share some memory regions but not others? a base&bound pair is the unit of protection
Give the user multiple memory segments:
- each segment is a contiguous region
- each segment is defined by a base and bound pair
Earliest use: separate code and data segments
- 2 sets of base-and-bound registers for instruction and data fetch
- allows processes to share read-only code segments
- became more elaborate later: code, data, stack, etc.

Segmented Address Translation

An EA comprises a segment number (SN) and a segment offset (SO):
- SN may be specified explicitly or implied (code vs. data)
- segment size is limited by the range of SO
- segments can have different sizes; not all SOs are meaningful
The segment translation table maps SN to the corresponding base and bound:
- separate mapping for each process; how to change the mapping when swapping processes?
- must be a privileged structure
[Figure: SN indexes the segment table to fetch base and bound; base + SO gives the PA, and the comparison against bound gives okay?]
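Extending the previous sketch to a per-process segment table (a hypothetical layout: 4-bit segment numbers in a 32-bit EA, chosen only for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define SN_BITS 4                     /* hypothetical: 16 segments per process */
    #define SO_BITS 28                    /* 32-bit EA = SN | SO */
    #define SO_MASK ((1u << SO_BITS) - 1)

    struct seg_entry {
        uint32_t base;    /* starting PA of the segment */
        uint32_t bound;   /* ending PA of the segment   */
        bool     valid;
    };

    /* One table per process; reloaded (privileged) when the OS swaps processes. */
    static struct seg_entry seg_table[1 << SN_BITS];

    bool seg_translate(uint32_t ea, uint32_t *pa)
    {
        uint32_t sn = ea >> SO_BITS;             /* segment number selects the entry */
        uint32_t so = ea & SO_MASK;              /* offset within the segment        */
        const struct seg_entry *e = &seg_table[sn];
        uint32_t candidate = e->base + so;       /* the "+" in the figure            */

        if (!e->valid || candidate >= e->bound)  /* the "<" check against the bound  */
            return false;                        /* violation */
        *pa = candidate;
        return true;                             /* okay */
    }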

Access Protections

In addition to naming, finer-grain access protection can be associated with each segment as extra bits in the segment table.
Generic options include readable? writeable? executable? plus misc. options such as cacheable? at which level?
- normal data pages: RW(!E)
- static shared data pages: R(!W)(!E)
- code pages: R(!W)E (what about self-modifying code?)
- illegal pages: (!R)(!W)(!E) (why would anyone want this?)

Another Use of Segments

How to extend an old ISA to support larger addresses for new applications while remaining compatible with old applications?
User-level segmented addressing:
- old applications use the identity mapping in the table
- new applications reload the segment table at run time with active segments to access different regions in memory
- complication of a non-linear address space: dereferencing some pointers (aka long pointers) requires more work if the target is not in an active segment
[Figure: a small EA's SN and SO select a large base from the segment table, yielding a large EA; this use of segments can be orthogonal to protection considerations]
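The protection bits can be pictured as a per-entry flags field checked alongside translation; a hedged sketch (the flag encoding is made up, not from any particular ISA):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-segment (or per-page) permission bits. */
    enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

    /* Checked by hardware together with the base/bound (or page table) lookup. */
    bool access_ok(uint8_t perms, uint8_t required)
    {
        return (perms & required) == required;
    }

    /* Example encodings from the slide:
     *   normal data pages:        PERM_R | PERM_W
     *   static shared data pages: PERM_R
     *   code pages:               PERM_R | PERM_X
     *   illegal pages:            0
     */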

Paged Address Space

Divide the PA and EA spaces into fixed-size segments known as page frames, historically 4 KByte.
- EAs and PAs are interpreted as a page number (PN) and a page offset (PO)
- the page table translates EPN to PPN
- the page offset is unchanged by translation: concatenate it to the PPN to get the PA
[Figure: EPN indexes the page table to fetch the PPN (and protection bits?); PPN concatenated with PO forms the PA. The actual table is much more complicated than this.]

Fragmentation

External fragmentation:
- a system may have plenty of unallocated DRAM, but it is useless in a segmented system if it does not form a contiguous region of sufficient size
- paged memory management eliminates external fragmentation
Internal fragmentation:
- with paged memory management, a process is allocated an entire page (4 KByte) even if it only needs 4 bytes
- a smaller page size reduces the likelihood of internal fragmentation
- yet modern ISAs are moving to larger page sizes (MBytes) in addition to 4 KBytes. Why?
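Continuing the earlier sketches, a flat single-level page table for 4 KByte pages (a real table is multi-level and holds much more per entry; this layout is only illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_BITS 12                         /* 4 KByte pages */
    #define PO_MASK   ((1u << PAGE_BITS) - 1)
    #define NUM_EPN   (1u << (32 - PAGE_BITS))   /* flat table over a 32-bit EA */

    struct pte {
        uint32_t ppn;     /* physical page number          */
        bool     valid;   /* is the page resident in DRAM? */
        /* protection bits, dirty/referenced bits, ... would live here too */
    };

    static struct pte page_table[NUM_EPN];       /* one entry per effective page */

    bool page_translate(uint32_t ea, uint32_t *pa)
    {
        uint32_t epn = ea >> PAGE_BITS;          /* effective page number  */
        uint32_t po  = ea & PO_MASK;             /* page offset, unchanged */

        if (!page_table[epn].valid)
            return false;                        /* page fault: OS brings the page in */
        *pa = (page_table[epn].ppn << PAGE_BITS) | po;   /* concatenate PPN and PO */
        return true;
    }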

Demand Paging

Use main memory and the swap disk as automatically managed levels in the memory hierarchy, analogous to cache vs. main memory.
Early attempts:
- von Neumann already described manual memory hierarchies
- Brookner's interpretive coding, 1960: a software interpreter that managed paging between a 40 KByte main memory and a 640 KByte drum
- Atlas, 1962: hardware demand paging between a 32-page (512 word/page) main memory and 192 pages on drums; the user program believes it has 192 pages

Demand Paging vs. Caching

Drastically different size and time scales, hence drastically different design considerations:

                   L1 Cache        L2 Cache         Demand Paging
    Capacity       10~100 KByte    MBytes           GBytes
    Block size     ~16 Byte        ~128 Byte        4K~4M Byte
    Hit time       a few cyc       a few 10s cyc    a few 100s cyc
    Miss penalty   a few 10s cyc   a few 100s cyc   10 msec
    Miss rate      0.1~10%         (?)              0.00001~0.001%
    Hit handling   HW              HW               HW
    Miss handling  HW              HW               SW

Hit time, miss penalty and miss rate are not really independent variables!!

Same Fundamentals

Potentially M = 2^m bytes of memory; how to keep the most frequently used ones in C bytes of fast storage (DRAM, in the case of demand paging), where C << M?
Basic issues:
(1) where to cache a virtual page in DRAM?
(2) how to find a virtual page in DRAM?
(3) granularity of management
(4) when to bring a page into DRAM?
(5) which virtual page to evict from DRAM to disk to free up DRAM for new pages?
(5) is much better covered in an OS course. BTW, architects should take OS and compiler courses.

Terminology and Usage

Physical Address: addresses that refer directly to specific locations in DRAM or on the swap disk.
Effective Address: addresses emitted by user applications for data and code accesses; usage is most often associated with protection.
Virtual Address: addresses in a large, linear memory seen by user applications; often larger than the DRAM and swap disk capacity; virtual in the sense that some addresses are not backed by physical storage (hopefully you don't try to use them); usage is most often associated with demand paging.
Modern memory systems always have both protection and demand paging; the usage of EA and VA is sometimes muddled.

EA, VA and PA (IBM's view)

[Figure: each EA space (EA_0, EA_1, ...) is divided into X fixed-size segments; the VA space is divided into Y segments (Y >> X) and also into Z pages (Z >> Y); the PA space is divided into W pages (Z >> W); the swap disk is divided into V pages (Z >> V, V >> W)]
- segmented EA: private, contiguous + sharing
- demand-paged VA: size of swap, speed of DRAM

EA, VA and PA (almost everyone else)

[Figure: each EA space (EA_0 with unique ASID=0, ..., EA_i with unique ASID=i) maps into a VA space divided into N address spaces indexed by ASID and also divided into Z pages; the PA space is divided into W pages (Z >> W); the swap disk is divided into V pages (Z >> V, V >> W)]
- EA and VA are almost synonymous
- how do processes share pages?