Computer Architecture Lecture 13: Virtual Memory II

18-447 Computer Architecture, Lecture 13: Virtual Memory II. Lecturer: Rachata Ausavarungnirun, Carnegie Mellon University, Spring 2014, 2/17/2014 (with material from Onur Mutlu, Justin Meza and Yoongu Kim)

Announcements: Lab 2 grades and feedback available tonight. Lab 3 due this Friday (21st Feb.). HW 2 grades and feedback available tonight. Midterm 1 in two weeks (5th Mar.). Paper summaries during this week's recitations.

Two Problems with the Page Table. Problem #1: the page table is too large. With a 32-bit virtual address and 4KB pages, the page table has 1M entries; each entry is 4B (enough for a 20-bit PPN plus flags), so the page table is 4MB (!!), very expensive in the 80s. Solution: a multi-level page table.

Two Problems with the Page Table. Problem #1: the page table is too large. The page table has 1M entries; each entry is 4B (enough for a 20-bit PPN plus flags), so the page table is 4MB (!!), very expensive in the 80s. Problem #2: the page table is in memory. Must we fetch the PTE from slow memory before every memory access? That would be a large performance penalty.
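
To make the two problems concrete, here is a minimal C sketch of a two-level walk, assuming an x86-style 32-bit split (10-bit directory index, 10-bit table index, 12-bit offset); the names (walk, PRESENT) are illustrative, not taken from any real kernel. Only the second-level tables for regions actually in use need to exist, which is how the multi-level layout avoids the flat 4MB table.

    #include <stdint.h>
    #include <stdbool.h>

    #define PRESENT 0x1u   /* low bit of a PDE/PTE: is the next level / page mapped? */

    /* Walk a hypothetical two-level page table for a 32-bit virtual address.
     * Treating the page-table address as directly dereferenceable is a
     * simplification; a real OS accesses it through its own mapping. */
    bool walk(const uint32_t *page_dir, uint32_t va, uint32_t *pa)
    {
        uint32_t pde = page_dir[va >> 22];              /* top 10 bits: page directory index */
        if (!(pde & PRESENT))
            return false;                               /* no page table for this 4MB region */

        const uint32_t *pt = (const uint32_t *)(uintptr_t)(pde & 0xFFFFF000u);
        uint32_t pte = pt[(va >> 12) & 0x3FFu];         /* middle 10 bits: page table index */
        if (!(pte & PRESENT))
            return false;                               /* page not mapped: page fault */

        *pa = (pte & 0xFFFFF000u) | (va & 0xFFFu);      /* 20-bit PPN + 12-bit offset */
        return true;
    }

Note that even when both levels hit, the walk costs two extra memory reads per translation, which is exactly Problem #2 and the motivation for the TLB on the next slide.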

Translation Lookaside Buffer (TLB). A hardware structure where PTEs are cached. (Q: How about PDEs? Should they be cached?) Whenever a virtual address needs to be translated, the TLB is searched first: hit vs. miss. Example: the 80386 has 32 entries in its TLB. A TLB entry is tag + data. Tag: 20-bit VPN + 4-bit flags (valid, dirty, R/W, U/S). Data: 20-bit PPN. Q: Why is the tag needed?
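
A sketch of the lookup, loosely modeled on the 32-entry 80386 TLB described above (the fully associative organization and the names are assumptions for illustration). It also answers the question on the slide: since the TLB holds only a handful of the ~1M possible translations, each entry must record which VPN it translates, and that is the tag.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 32          /* like the 80386 */

    struct tlb_entry {
        uint32_t vpn;               /* 20-bit tag: which virtual page this entry translates */
        uint32_t ppn;               /* 20-bit data: the physical page it maps to */
        bool     valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Fully associative lookup: compare the VPN against every entry's tag. */
    bool tlb_lookup(uint32_t vpn, uint32_t *ppn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *ppn = tlb[i].ppn;
                return true;        /* TLB hit */
            }
        }
        return false;               /* TLB miss: must walk the page table */
    }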

Context Switches. Assume that Process X is running. Process X's VPN 5 is mapped to PPN 100, and the TLB caches this mapping (VPN 5 -> PPN 100). Now assume a context switch to Process Y, whose VPN 5 is mapped to PPN 200. When Process Y tries to access VPN 5, it searches the TLB and finds an entry whose tag is 5. TLB hit! The PPN must be 100! Are you sure?

Context Switches (cont'd). Approach #1: flush the TLB. Whenever there is a context switch, flush the TLB; all TLB entries are invalidated. Example: on the 80386, updating the value of CR3 signals a context switch and automatically triggers a TLB flush. Approach #2: associate TLB entries with processes. All TLB entries have an extra field in the tag that identifies the process to which they belong; invalidate only the entries belonging to the old process. Example: modern x86, MIPS.
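
A sketch of Approach #2, with a hypothetical ASID (address-space identifier) field added to each entry; a lookup hits only if both the VPN and the current process's identifier match, so Process Y's VPN 5 can no longer pick up Process X's stale mapping.

    #include <stdint.h>
    #include <stdbool.h>

    struct tlb_entry {
        uint32_t vpn;      /* which virtual page this entry translates */
        uint32_t ppn;      /* the physical page it maps to */
        uint16_t asid;     /* which process (address space) owns this mapping */
        bool     valid;
    };

    /* Hit only if the entry is valid, the VPN matches, and the entry belongs
     * to the currently running process. On a context switch nothing needs to
     * be flushed; entries of the old process simply stop matching. */
    bool tlb_hit(const struct tlb_entry *e, uint32_t vpn, uint16_t cur_asid)
    {
        return e->valid && e->vpn == vpn && e->asid == cur_asid;
    }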

Handling TLB Misses. The TLB is small; it cannot hold all PTEs, so some translations will inevitably miss in the TLB. We must then access memory to find the appropriate PTE, called walking the page directory/table, which carries a large performance penalty. Who handles TLB misses? 1. A hardware-managed TLB. 2. A software-managed TLB.

Handling TLB Misses (cont'd). Approach #1: hardware-managed (e.g., x86). The hardware does the page walk, fetches the PTE, and inserts it into the TLB; if the TLB is full, the new entry replaces another entry. All of this is done transparently. Approach #2: software-managed (e.g., MIPS). The hardware raises an exception; the operating system does the page walk, fetches the PTE, and inserts/evicts entries in the TLB.
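
The shape of a software-managed miss handler, as a hedged sketch only: the helper names (walk_page_table, tlb_insert, raise_page_fault) are hypothetical stand-ins for whatever the OS provides, and on real MIPS hardware the refill is performed with dedicated TLB-write instructions rather than plain C.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical helpers assumed to be provided by the OS. */
    bool walk_page_table(uint32_t vpn, uint32_t *ppn);   /* the OS's own page walk */
    void tlb_insert(uint32_t vpn, uint32_t ppn);         /* write one entry into the TLB */
    void raise_page_fault(uint32_t vpn);                 /* the translation truly does not exist */

    /* Invoked by the TLB-miss exception: find the PTE in memory and refill the TLB. */
    void tlb_miss_handler(uint32_t faulting_va)
    {
        uint32_t vpn = faulting_va >> 12;
        uint32_t ppn;

        if (!walk_page_table(vpn, &ppn)) {
            raise_page_fault(vpn);       /* not just a TLB miss: the page is unmapped */
            return;
        }
        /* The OS picks the victim entry itself, so it can use any replacement
         * policy it likes (one of the pros listed on the next slide). */
        tlb_insert(vpn, ppn);
    }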

Handling TLB Misses (cont'd). Hardware-managed TLB: Pro: no exceptions; the instruction just stalls. Pro: independent instructions may continue. Pro: small footprint (no extra instructions/data). Con: the page directory/table organization is etched in stone. Software-managed TLB: Pro: the OS can design the page directory/table. Pro: more advanced TLB replacement policies. Con: flushes the pipeline. Con: performance overhead.

Protection with Virtual Memory. A normal user process should not be able to read/write another process's memory or write into shared library data. How does virtual memory help? Address space isolation, protection information in the page table, and efficient clearing of data on newly allocated pages.

Protection: Leaked Information. Example (with the virtual memory we've discussed so far): Process A writes "my password = ..." to virtual address 2; the OS maps virtual address 2 to physical page 4 in the page table. Process A no longer needs virtual address 2, so the OS unmaps virtual address 2 from physical page 4 in the page table. Attack vector: sneaky Process B continually allocates pages and searches for "my password = <string>".

Page-Level Access Control (Protection). Not every process is allowed to access every page; e.g., accessing system pages may require supervisor-level privilege. Idea: store access control information on a per-page basis in the process's page table and enforce access control at the same time as translation. The virtual memory system serves two functions today: address translation (for the illusion of a large physical memory) and access control (protection).

Page Table is Per Process. Each process has its own virtual address space: a full address space for each program. This simplifies memory allocation, sharing, linking and loading. [Figure: the virtual address spaces of Process 1 and Process 2 (each with virtual pages VP 1, VP 2, ...) are translated onto a shared physical address space (DRAM) containing physical pages PP 2, PP 7, PP 10; a page can be shared between processes, e.g., read-only library code.]

VM as a Tool for Memory Access Protection. Extend Page Table Entries (PTEs) with permission bits; the page fault handler checks these before remapping, and if they are violated it generates an exception (access protection exception). [Figure: page tables for Process i and Process j, where each PTE carries Read?/Write? permission bits and a physical address (PP 9, PP 4, PP 6, ...); entries marked XXXXXXX are unmapped.]
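
A minimal sketch of the per-access permission check (the flag layout and names are illustrative, not the exact x86 encoding): the translation only succeeds if the PTE's permission bits allow the kind of access being attempted; otherwise an exception is raised for the OS to handle.

    #include <stdint.h>
    #include <stdbool.h>

    #define PTE_PRESENT  0x1u
    #define PTE_WRITABLE 0x2u
    #define PTE_USER     0x4u   /* accessible from user mode, not only supervisor */

    /* Check one PTE against the attempted access. */
    bool access_allowed(uint32_t pte, bool is_write, bool is_user_mode)
    {
        if (!(pte & PTE_PRESENT))
            return false;                        /* unmapped: page fault */
        if (is_write && !(pte & PTE_WRITABLE))
            return false;                        /* e.g., a store to read-only library code */
        if (is_user_mode && !(pte & PTE_USER))
            return false;                        /* user process touching a system page */
        return true;                             /* translation may proceed */
    }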

Privilege Levels in x86

x86: Privilege Levels (Review). Four privilege levels in x86 (referred to as rings). Ring 0: highest privilege (operating system). Ring 1: not widely used. Ring 2: not widely used. Ring 3: lowest privilege (user applications). Rings 0-2 are supervisor; ring 3 is user. The Current Privilege Level (CPL) is determined by the address of the instruction that you are executing; specifically, by the Descriptor Privilege Level (DPL) of the code segment.

x86: A Closer Look at the PDE/PTE. PDE: Page Directory Entry (32 bits), holding the address of a page table (&PT) plus flags. PTE: Page Table Entry (32 bits), holding a PPN plus flags.

Protection: PDE's Flags. Protects all 1024 pages in a page table.

Protection: PTE's Flags. Protects one page at a time.

Protection: PDE + PTE = ??? The effective protection of a page combines the two levels: an access is permitted only if both the PDE's flags and the PTE's flags allow it.
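
A sketch of how the two levels combine (simplified; the exact x86 rules have more cases): for a user-mode access the effective permission is the most restrictive of the PDE's and the PTE's flags, i.e., the access must be allowed by both. The macro and function names are illustrative.

    #include <stdint.h>
    #include <stdbool.h>

    #define F_PRESENT 0x1u   /* present flag in both PDE and PTE */
    #define F_WRITE   0x2u   /* R/W flag */
    #define F_USER    0x4u   /* U/S flag */

    /* Effective protection of a page = PDE flags combined with PTE flags. */
    bool user_access_ok(uint32_t pde, uint32_t pte, bool is_write)
    {
        if (!(pde & F_PRESENT) || !(pte & F_PRESENT))
            return false;                        /* either level unmapped */

        uint32_t combined = pde & pte;           /* most restrictive of the two */
        if (!(combined & F_USER))
            return false;                        /* supervisor-only at one of the levels */
        if (is_write && !(combined & F_WRITE))
            return false;                        /* read-only at one of the levels */
        return true;
    }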

Protection: Segmentation + Paging. Paging provides protection: flags in the PDE/PTE (x86), i.e., Read/Write, User/Supervisor, and Executable (x86-64). Segmentation also provides protection: flags in the segment descriptor (x86), i.e., Read/Write, Descriptor Privilege Level, and Executable.

Aside: Protection w/o Virtual Memory. Question: do we need virtual memory for protection? Answer: no. There are other ways of providing memory protection: base and bound registers, segmentation. None of these are as elegant as page-based access control; they run into complexities as we need more protection capabilities, whereas virtual memory integrates translation and protection in one mechanism.
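
For contrast, here is essentially all of base-and-bound protection as a sketch (the register and function names are made up): a single relocation plus a single limit check per access, with no notion of per-page permissions, which is why it becomes awkward as protection needs grow.

    #include <stdint.h>
    #include <stdbool.h>

    /* One base/bound pair per process, loaded by the OS on a context switch. */
    bool base_bound_translate(uint32_t base, uint32_t bound, uint32_t va, uint32_t *pa)
    {
        if (va >= bound)
            return false;    /* outside the process's region: protection fault */
        *pa = base + va;     /* simple relocation; no per-page read/write/user bits */
        return true;
    }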

Overview of Segmentation. Divide the physical address space into segments; the segments may overlap. [Figure: a virtual address (0x2345) is added to a segment's base (0x8000) to form the physical address (0xA345); another segment has base 0x0000; physical memory spans 0x0000-0xFFFF.]

Segmentation in Intel 8086. The Intel 8086 (late 70s) is a 16-bit processor with 4 segment registers that store the base address.

Intel 8086: Specifying a Segment. There can be many segments (spread across the 1MB physical address space), but only 4 of them are addressable at once; which 4 depends on the 4 segment registers, whose values the programmer sets. Each segment is 64KB in size, because the 8086 is 16-bit.

Intel 8086: Translation. The 8086 is a 16-bit processor... how can it address up to 0xFFFFF (1MB)? The segment register and the 16-bit virtual address (offset) are combined to form a 20-bit physical address.
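
The combination, sketched in C (the helper name is made up): the 16-bit segment register is shifted left by 4 bits and added to the 16-bit offset, producing a 20-bit physical address, which is how a 16-bit machine reaches 0xFFFFF.

    #include <stdint.h>

    /* 8086-style translation: 20-bit PA = (segment << 4) + 16-bit offset. */
    uint32_t seg8086_translate(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;
    }

    /* Example: segment 0x0800, offset 0x2345 -> 0x08000 + 0x2345 = 0x0A345. */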

Intel 8086: Which Segment Register? Q: For a memory access, how does the machine know which of the 4 segment registers to use? A: It depends on the type of memory access. The default can be overridden in the x86 instruction, e.g., mov %AX,(%ES:0x1234).

Segmentation in Intel 80286. The Intel 80286 (early 80s) is still a 16-bit processor and still has 4 segment registers, but they now store an index into a table of base addresses, not the base address itself. [Figure: the 16-bit segment registers (CS, DS, SS, ES) hold segment selectors that index into a segment descriptor table of 64-bit segment descriptors (Segment Descriptor 0 .. N-1).]

Intel 80286: Segment Descriptor. A 64-bit segment descriptor describes a segment: 1. BASE: the base address. 2. LIMIT: the size of the segment. 3. DPL: the Descriptor Privilege Level (!!). 4. Etc.
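
A sketch of what the hardware conceptually checks on every segmented access (field and function names are illustrative, and the real x86 privilege rules have more cases): the offset must fall within LIMIT, the current privilege level must be sufficient for DPL, and only then is BASE added.

    #include <stdint.h>
    #include <stdbool.h>

    struct seg_descriptor {
        uint32_t base;    /* where the segment starts in physical memory */
        uint32_t limit;   /* size of the segment */
        uint8_t  dpl;     /* descriptor privilege level: 0 (kernel) .. 3 (user) */
    };

    /* Translate an offset through a descriptor, enforcing limit and privilege. */
    bool seg_translate(const struct seg_descriptor *d, uint32_t offset,
                       uint8_t cpl, uint32_t *pa)
    {
        if (offset >= d->limit)
            return false;            /* beyond the segment: fault */
        if (cpl > d->dpl)
            return false;            /* not privileged enough (simplified rule) */
        *pa = d->base + offset;
        return true;
    }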

Issues with Segmentation. Segmented addressing creates fragmentation problems: a system may have plenty of unallocated memory locations, but they are useless if they do not form a contiguous region of a sufficient size. Page-based virtual memory solves these issues by dividing the address space into fixed-size pages and making the virtual address space of each process contiguous. The key is the use of indirection to give each process the illusion of a contiguous address space.

Page-based Address Space. In a paged memory system, the PA space is divided into fixed-size segments (e.g., 4KB), more commonly known as page frames. A VA is interpreted as a page number and a page offset; the page table maps the page number to a frame number (and checks that the access is okay), and the frame number is combined with the offset to form the PA. Page tables must be (1) privileged data structures and (2) private/unique to each process.

Fast Forward to Today (2014): Modern x86 Machines. 32-bit x86: segmentation is similar to the 80286. 64-bit x86: segmentation is not supported per se; the hardware forces BASE = 0x0000000000000000 and LIMIT = 0xFFFFFFFFFFFFFFFF, but DPL is still supported. Side note: Linux & 32-bit x86. Linux does not use segmentation per se; for all segments, Linux sets BASE = 0x00000000 and LIMIT = 0xFFFFFFFF. Instead, Linux uses segments for privilege levels: for segments used by the kernel, Linux sets DPL = 0; for segments used by applications, Linux sets DPL = 3.

Other Issues. When do we do the address translation: before or after accessing the L1 cache? In other words, is the cache virtually addressed or physically addressed (virtual versus physical cache)? What are the issues with a virtually addressed cache? The synonym problem: two different virtual addresses can map to the same physical address, so the same physical address can be present in multiple locations in the cache, which can lead to inconsistency in the data.