This Unit: Main Memory, Virtual Memory, and Other Uses of Virtual Memory


This Unit: Virtual Memory
- Memory hierarchy review
- DRAM technology: a few more transistors; organization: two-level addressing
- Building a memory system: bandwidth matching, error correction, organizing a memory system
- Virtual memory: address translation and page tables; a virtual memory hierarchy

Virtual Memory
- Idea of treating memory like a cache
  - Contents are a dynamic subset of the program's address space
  - Dynamic content management is transparent to the program
- Actually predates caches (by a little)
- Original motivation: compatibility. IBM System/370: a family of computers with one software suite
  + Same program could run on machines with different memory sizes
  - The caching mechanism made it appear as if memory was 2^N bytes, regardless of how much memory there actually was
  - Prior to this, programmers explicitly accounted for memory size
- "Virtual": in effect, but not in actuality (i.e., appears to be, but isn't)

Virtual Memory
- Programs use virtual addresses (VA): 0 to 2^N - 1
  - VA size is also referred to as machine size; e.g., Pentium 4 is 32-bit, Itanium is 64-bit
- Memory uses physical addresses (PA): 0 to 2^M - 1 (M < N, especially if N = 64)
  - 2^M is the most physical memory the machine supports
- VAs map to PAs at page granularity (VP -> PP), by the system
  - The mapping need not preserve contiguity
  - A VP need not be mapped to any PP; unmapped VPs live on disk (swap)

Other Uses of Virtual Memory
- Virtual memory is quite useful; automatic, transparent memory management is just one use
- Functionality problems are solved by adding levels of indirection
- Example: multiprogramming
  - Each process thinks it has 2^N bytes of address space; each thinks its stack starts at address 0xFFFFFFFF
  - The system maps VPs from different processes to different PPs
  + Prevents processes from reading/writing each other's memory

Still More Uses of Virtual Memory
- Inter-process communication: map VPs in different processes to the same PPs
- Direct memory access I/O: think of the I/O device as another process (more about I/O in a few lectures)
- Protection: piggy-back the mapping mechanism to implement page-level protection
  - Map each VP to a PP plus RWX protection bits
  - Attempt to execute data, or to write insn/read-only data? Exception; the OS terminates the program

Address Translation
- The VA -> PA mapping is called address translation
- Split the VA into a virtual page number (VPN) and a page offset (POFS)
- Translate the VPN into a physical page number (PPN); the POFS is not translated (why not?)
- VA -> PA = [VPN, POFS] -> [PPN, POFS]
- Example: 64KB pages -> 16-bit POFS
  - 32-bit machine -> 32-bit VA -> 16-bit VPN (16 = 32 - 16)
  - Maximum 256MB memory -> 28-bit PA -> 12-bit PPN
  - virtual address[31:0] = VPN[31:16], POFS[15:0]; VPN[31:16] translates to PPN[27:16]; POFS[15:0] is untouched; physical address[27:0] = PPN[27:16], POFS[15:0]

Mechanics of Address Translation
- How are VAs translated to PAs? In software (for now), but with hardware acceleration (a little later)
- Each process is allocated a page table (PT): maps VPs to PPs or to disk (swap) addresses
  - PT entries are empty if the page was never referenced
  - Translation is a table lookup:

    struct PTE {
        union { int ppn; int disk_block; };   /* anonymous union (C11) */
        int is_valid, is_dirty;
    };
    struct PTE pt[NUM_VIRTUAL_PAGES];

    int translate(int vpn) {
        if (pt[vpn].is_valid)
            return pt[vpn].ppn;
        /* otherwise: page is on disk (swap) */
    }

Page Table Size
- How big is a page table on the following machine? 4B page table entries (PTEs), 32-bit machine, 4KB pages
- Solution: 32-bit machine -> 32-bit VA -> 4GB virtual memory
  - 4GB virtual memory / 4KB page size -> 1M VPs
  - 1M VPs * 4B per PTE -> 4MB page table
- How big would the page table be with 64KB pages? How big would it be for a 64-bit machine?
- Page tables can get enormous; there are ways of making them smaller

Multi-Level Page Table
- One way to shrink page tables: a tree of page tables
  - Lowest-level tables hold PTEs; upper-level tables hold pointers to lower-level tables
  - Different parts of the VPN index different levels
- Example: two-level page table for the machine on the last slide
  - Pages needed for the lowest level (PTEs): 4KB pages / 4B PTEs -> 1K PTEs fit on a single page; 1M PTEs / (1K PTEs/page) -> 1K pages to hold PTEs
  - Pages needed for the upper level (pointers): 1K lowest-level pages -> 1K pointers; 1K pointers * 4B -> 4KB -> 1 upper-level page
- 20-bit VPN: the upper 10 bits (VPN[19:10]) index the 1st-level table, the lower 10 bits (VPN[9:0]) index the 2nd-level table

    struct PTE {
        union { int ppn; int disk_block; };   /* anonymous union (C11) */
        int is_valid, is_dirty;
    };
    struct PT { struct PTE ptes[1024]; };
    struct PT *pt_root[1024];

    int translate(int vpn) {
        struct PT *l2pt = pt_root[vpn >> 10];         /* VPN[19:10] */
        if (l2pt && l2pt->ptes[vpn & 1023].is_valid)  /* VPN[9:0]   */
            return l2pt->ptes[vpn & 1023].ppn;
        /* otherwise: page fault */
    }

Multi-Level Page Table
- Have we saved any space? Isn't the total size of the 2nd-level PTE pages the same as a single-level table (i.e., 4MB)?
- Yes, but large virtual address regions are unused
  - The corresponding 2nd-level pages need not exist; the corresponding 1st-level pointers are null
- Example: 2MB code, 64KB stack, 16MB heap
  - Each 2nd-level page maps 4MB of virtual addresses
  - 1 page for code, 1 for stack, 4 for heap (+1 1st-level page) -> 7 total pages for the PT = 28KB (<< 4MB)

Address Translation Mechanics
- The six questions: What? address translation. Why? compatibility, multiprogramming, protection. How? page table. Who performs it? When? Where does the page table reside?
- Option I: the process (program) translates its own addresses
  - Page table resides in the process's visible virtual address space
  - Bad idea: implies the program (and programmer) must know about physical addresses; isn't that what virtual memory is designed to avoid? The process can forge physical addresses and mess with other programs
  - Translation on miss or always? How would the program know?
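The 28KB figure above can be checked with a few lines of arithmetic; the helper names here are ours, not from the slides.

```c
#include <assert.h>

#define PAGE_SIZE 4096L                 /* 4KB pages, 4B PTEs */
#define L2_SPAN   (4L << 20)            /* each 2nd-level page maps 4MB */

/* 2nd-level pages needed to map a region of the given size (rounded up). */
long l2_pages_for(long bytes) {
    return (bytes + L2_SPAN - 1) / L2_SPAN;
}

/* Total page-table size in KB for the slide's example process. */
long pt_size_kb(void) {
    long l2 = l2_pages_for(2L << 20)    /* 2MB code   -> 1 page  */
            + l2_pages_for(64L << 10)   /* 64KB stack -> 1 page  */
            + l2_pages_for(16L << 20);  /* 16MB heap  -> 4 pages */
    return (l2 + 1) * PAGE_SIZE / 1024; /* +1 for the 1st-level page */
}
```

Rounding up per region is what makes the scheme work: unused 4MB spans of the address space simply contribute zero 2nd-level pages.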

Who? Where? When? Take II
- Option II: the operating system (OS) translates for the process
  - Page table resides in the OS's virtual address space
  + User-level processes cannot view/modify their own tables
  + User-level processes need not know about physical addresses
- Translation on miss (otherwise, an OS SYSCALL before every fetch, load, or store)
  - TB miss: an interrupt transfers control to an OS handler
  - The handler translates the VA by accessing the process's page table, accesses memory using the PA, and returns to the user process when the fill completes
  - Still slow: adds an interrupt handler and a PT lookup to the memory access
  - What if the PT lookup itself requires a memory access? Head spinning yet?

Translation Buffer
- Functionality problem? Add indirection! Performance problem? Add a cache!
- Address translation too slow? Cache translations in a translation buffer (TB)
  - A small cache: 16-64 entries, often fully associative; each entry is tagged by a VPN and holds a PPN
  + Exploits temporal locality in PT accesses
  + The OS handler runs only on a TB miss

TB Misses
- TB miss: the requested PTE is not in the TB, but is in the PT. Two ways of handling:
  1) OS routine: reads the PT and loads the entry into the TB (e.g., Alpha)
     - Privileged instructions in the ISA for accessing the TB directly
     - Latency: one or two memory accesses plus the OS call
  2) Hardware FSM: does the same thing (e.g., IA-32)
     - Store the PT root pointer in a hardware register
     - Make the PT root and 1st-level table pointers physical addresses, so the FSM doesn't have to translate them
     + Latency: saves the cost of the OS call

Nested TB Misses
- Nested TB miss: the OS TB-miss handler itself has a TB miss, either on the handler's instructions or on the page table's VAs
- Not a problem for a hardware FSM: no instructions, and the page table holds PAs
- Handling is tricky for a SW handler, but possible
  - First, save the current TB-miss info before accessing the page table, so the nested miss info doesn't overwrite it
  - Second, lock the nested-miss entries into the TB, preventing conflicts that result in an infinite loop
  - Another good reason to have a highly-associative TB
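A minimal sketch of the translation buffer above: a handful of (VPN, PPN) entries that hardware would search in parallel, searched linearly here. The entry layout and helper names are assumptions for illustration; replacement policy and the miss path are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define TB_ENTRIES 16   /* small, fully-associative */

struct tb_entry { uint32_t vpn, ppn; int valid; };
static struct tb_entry tb[TB_ENTRIES];

/* Install a translation into entry i (what the miss handler would do). */
void tb_fill(int i, uint32_t vpn, uint32_t ppn) {
    tb[i] = (struct tb_entry){ vpn, ppn, 1 };
}

/* Returns 1 on hit (PPN in *ppn), 0 on miss (OS handler or HW FSM walks PT). */
int tb_lookup(uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TB_ENTRIES; i++)
        if (tb[i].valid && tb[i].vpn == vpn) { *ppn = tb[i].ppn; return 1; }
    return 0;
}
```

On a hit the PT is never touched, which is the whole point: the common case costs a small associative search instead of one or more memory accesses.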

Page Faults
- Page fault: the PTE is not in the TB or in the PT; the page is simply not in memory
- Starts out as a TB miss, detected by the OS handler/hardware FSM, then handled by an OS routine
  - OS software chooses a physical page to replace: "working set", a more refined software version of LRU, tries to see which pages are actively being used and balances the needs of all currently running applications
  - If the victim is dirty, write it to disk (like a dirty cache block with a writeback cache)
  - Read the missing page from disk (done by the OS); this takes so long (~10ms) that the OS schedules another task
  - Treat like a normal TB miss from here

Virtual Caches
- The memory hierarchy so far: virtual caches
  - Indexed and tagged by VAs; translate VAs to PAs only to access memory
  + Fast: avoids translation latency in the common case
- What to do on process switches? Flush the caches? Slow. Add process IDs to the cache tags?
- Does inter-process communication work? Aliasing: multiple VAs map to the same PA
  - How are multiple cache copies kept in sync? (Also a problem for I/O, later in the course)
  - Disallow caching of shared memory? Slow

Physical Caches
- Alternatively: physical caches
  - Indexed and tagged by PAs; translate VA to PA at the outset
  + No need to flush caches on process switches: processes do not share PAs
  + Cached inter-process communication works: a single copy, indexed by PA
  - Slow: adds 1 cycle to t_hit

Virtual-Physical Caches
- Compromise: virtual-physical caches
  - Indexed by VAs, tagged by PAs; cache access and address translation proceed in parallel
  + No context-switching/aliasing problems
  + Fast: no additional t_hit cycles
- A TB that acts in parallel with a cache is a TLB: Translation Lookaside Buffer
  - The common organization in processors today
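The replacement step of the page-fault path above can be sketched as a toy simulation. Round-robin victim choice stands in for the working-set policy, and a counter stands in for the disk writeback; all names here are hypothetical, not real OS routines.

```c
#include <assert.h>

#define FRAMES 4

static int frame_vpn[FRAMES]   = { -1, -1, -1, -1 };  /* VP held (-1 = free) */
static int frame_dirty[FRAMES];
static int writebacks;         /* counts dirty evictions ("disk writes")     */
static int next_victim;        /* round-robin stand-in for working-set       */

/* Handle a fault on vpn: pick a victim frame, write back if dirty, install. */
int handle_fault(int vpn) {
    int f = next_victim;
    next_victim = (next_victim + 1) % FRAMES;
    if (frame_vpn[f] != -1 && frame_dirty[f])
        writebacks++;          /* would write the victim page to swap here   */
    frame_vpn[f]   = vpn;      /* would read vpn's page from disk here       */
    frame_dirty[f] = 0;
    return f;
}
```

Even in this toy, clean victims cost nothing to evict while dirty ones require a writeback first, mirroring the dirty-block behavior of a writeback cache.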

Cache/TLB Access
- Two ways to look at a VA: the cache's view, TAG+IDX+OFS, and the TLB's view, VPN+POFS
- Cache and TLB can be accessed in parallel if address translation doesn't change the IDX bits, i.e., if VPN and IDX don't overlap
  - TLB hit/miss is then independent of cache hit/miss
  - Example (64KB pages, 4KB direct-mapped cache of 4B blocks): VPN is VA[31:16], POFS is VA[15:0]; TAG is VA[31:12], IDX is VA[11:2], OFS is VA[1:0]

Cache Size and Page Size
- Relationship between page size and L1 I$ (D$) size, forced by the non-overlap between the VPN and IDX portions of the VA (which is required for parallel TLB access):
  I$ (D$) size / associativity <= page size
- So big caches must be set-associative
  - A bigger cache means more index bits (fewer tag bits); more set-associativity means fewer index bits (more tag bits)
- Systems are moving towards bigger (64KB) pages, to amortize disk latency and to accommodate bigger caches

TLB Organization
- Like caches, TLBs also have ABCs (associativity, block size, capacity)
- What does it mean for a TLB to have a block size of two? Two consecutive VPs share a single tag
- Rule of thumb: the TLB should cover the L2's contents; in other words, #PTEs * page size >= L2 size (why? think about it)

Flavors of Virtual Memory
- Virtual memory is almost ubiquitous today: certainly in general-purpose (in-a-computer) processors, but even some embedded (non-computer) processors support it
- Several forms of virtual memory
  - Paging (aka flat memory): equal-sized translation blocks; most systems do this
  - Segmentation: variable-sized (overlapping?) translation blocks; IA-32 uses this; makes life very difficult
  - Paged segments: don't ask
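The size constraint above reduces to one line of arithmetic: the index and offset bits must fit inside the untranslated page offset, so size / associativity <= page size, i.e., the largest cache accessible in parallel with the TLB is page size times associativity. The helper name is ours.

```c
#include <assert.h>

/* Largest L1 that can be indexed in parallel with the TLB:
   IDX+OFS must fit inside POFS, so size <= page_size * associativity. */
long max_parallel_cache_bytes(long page_size, int associativity) {
    return page_size * associativity;
}
```

With 4KB pages, a 32KB L1 therefore needs at least 8-way associativity, which is one reason commercial L1s cluster around exactly such configurations.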

Summary
- DRAM: two-level addressing; refresh, access time, cycle time
- Building a memory system: DRAM/bus bandwidth matching, memory organization
- Virtual memory: page tables and address translation; page faults and their handling; virtual, physical, and virtual-physical caches and TLBs
- Next part of the course: I/O