Transcription:

Administrivia
- Lab 1 due Friday 12pm (noon)
- We will give short extensions to groups that run into trouble. But email us:
  - How much is done and left?
  - How much longer do you need?
- Attend section Friday at 12:30pm to learn about lab 2

Virtual memory
- Came out of work in late 1960s by Peter Denning
  - Established working set model
  - Led directly to virtual memory

Want processes to co-exist
[Figure: physical memory from 0x0000 to 0x9000 holding emacs, bochs/pintos, and the OS, with a boundary at 0x7000]
- Consider multiprogramming on physical memory
  - What happens if pintos needs to expand?
  - If emacs needs more memory than is on the machine?
  - If pintos has an error and writes to address 0x7100?
  - When does a program have to know the address at which it will run?
  - What if emacs isn't using its memory?

Issues in sharing physical memory
- Protection
  - A bug in one process can corrupt memory in another
  - Must somehow prevent process A from trashing B's memory
  - Also prevent A from even observing B's memory (ssh-agent)
- Transparency
  - A process shouldn't require particular physical memory bits
  - Yet processes often require large amounts of contiguous memory (for stack, large data structures, etc.)
- Resource exhaustion
  - Programmers typically assume machine has enough memory
  - Sum of sizes of all processes often greater than physical memory

Virtual memory goals
[Figure: a load of virtual address 0x30408 goes to the MMU, which checks whether the address is legal. Yes: physical address 0x92408. No: to fault handler.]
- Give each program its own virtual address space
  - At runtime, Memory-Management Unit relocates each load/store
  - Application doesn't see physical memory addresses
- Also enforce protection
  - Prevent one app from messing with another's memory
- And allow programs to see more memory than exists
  - Somehow relocate some memory accesses to disk

Virtual memory advantages
- Can re-locate program while running
  - Run partially in memory, partially on disk
- Most of a process's memory may be idle (80/20 rule)
  - Write idle parts to disk until needed
  - Let other processes use memory of idle part
  - Like CPU virtualization: when process not using CPU, switch (Not using a memory region? Switch it to another process)
- Challenge: VM = extra layer, could be slow

Idea 1: load-time linking
[Figure: static a.out with a patched instruction, call 0x5200]
- Linker patches addresses of symbols like printf
- Idea: link when process executed, not at compile time
  - Determine where process will reside in memory
  - Adjust all references within program (using addition)
- Problems?
  - How to enforce protection?
  - How to move once already in memory? (consider data pointers)
  - What if no contiguous free region fits program?

Idea 2: base + bound register
- Two special privileged registers: base and bound
- On each load/store/jump:
  - Physical address = virtual address + base
  - Check 0 <= virtual address < bound, else trap to kernel
- How to move process in memory?
  - Change base register
- What happens on context switch?
  - OS must re-load base and bound register
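The base + bound check is small enough to sketch directly. The following is a minimal illustration, not any real MMU: the names `translate` and `ProtectionFault` and the example register values are assumptions made up for this sketch.

```python
class ProtectionFault(Exception):
    """Stands in for the trap that real hardware would raise to the kernel."""


def translate(vaddr, base, bound):
    """Relocate one virtual address using a base and a bound register."""
    # Bound check: the virtual address must lie in [0, bound).
    if not (0 <= vaddr < bound):
        raise ProtectionFault(hex(vaddr))
    # Relocation: physical address = virtual address + base.
    return vaddr + base


# Hypothetical process loaded at physical 0x5000 with a 0x2000-byte address space.
paddr = translate(0x0200, base=0x5000, bound=0x2000)

# Moving the process in memory only changes the base register value.
moved = translate(0x0200, base=0x9000, bound=0x2000)
```

On a context switch the OS would simply supply a different `base` and `bound` pair, which is why the scheme needs so little hardware state.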

Definitions
- Programs load/store to virtual addresses
- Actual memory uses physical addresses
- VM Hardware is Memory Management Unit (MMU)
  [Figure: CPU sends virtual addresses to the MMU, which sends physical addresses to memory]
  - Usually part of CPU
  - Configured through privileged instructions (e.g., load bound reg)
  - Translates from virtual to physical addresses
  - Gives per-process view of memory called address space

Base+bound trade-offs
- Advantages
  - Cheap in terms of hardware: only two registers
  - Cheap in terms of cycles: do add and compare in parallel
  - Examples: Cray-1 used this scheme
- Disadvantages
  - Growing a process is expensive or impossible
  - No way to share code or data (E.g., two copies of bochs, both running pintos)
- One solution: Multiple segments
  - E.g., separate code, stack, data segments
  - Possibly multiple data segments

Segmentation
- Let processes have many base/bound regs
  - Address space built from many segments
  - Can share/protect memory at segment granularity
- Must specify segment as part of virtual address

Segmentation mechanics
- Each process has a segment table
- Each VA indicates a segment and offset:
  - Top bits of addr select segment, low bits select offset (PDP-10)
  - Or segment selected by instruction or operand (means you need wider "far" pointers to specify segment)
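The "top bits select segment, low bits select offset" scheme can be sketched as follows. This is an illustration under stated assumptions: a 2-bit segment number above a 12-bit offset, and a segment table whose bases, limits, and permissions are invented for the example. The names `Segment`, `SegmentFault`, `TABLE`, and `translate` are hypothetical.

```python
from typing import NamedTuple

OFFSET_BITS = 12   # assumed: low 12 bits are the offset
SEG_MASK = 0x3     # assumed: next 2 bits are the segment number


class Segment(NamedTuple):
    base: int       # physical address where the segment starts
    limit: int      # size in bytes; valid offsets are 0 .. limit - 1
    writable: bool


class SegmentFault(Exception):
    """Stands in for the trap to the kernel."""


# Made-up segment table; entry 3 is unused.
TABLE = [
    Segment(base=0x4000, limit=0x700, writable=False),
    Segment(base=0x0000, limit=0x500, writable=True),
    Segment(base=0x3000, limit=0x1000, writable=True),
    None,
]


def translate(table, vaddr, write=False):
    """Split the address, check the segment's limit and permission, relocate."""
    seg_num = (vaddr >> OFFSET_BITS) & SEG_MASK
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    seg = table[seg_num]
    if seg is None or offset >= seg.limit:
        raise SegmentFault(hex(vaddr))
    if write and not seg.writable:
        raise SegmentFault(hex(vaddr))
    return seg.base + offset
```

Each segment is just its own base + bound pair, so two processes can share code by pointing one table entry at the same physical base.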

Segmentation example
[Figure: segment table mapping regions of virtual memory to regions of physical memory]
- 2-bit segment number (1st digit), 12-bit offset (last 3)
  - Where is 0x0240? 0x1108? 0x265c? 0x3002? 0x1600?

Segmentation trade-offs
- Advantages
  - Multiple segments per process
  - Allows sharing! (how?)
  - Don't need entire process in memory
- Disadvantages
  - Requires translation hardware, which could limit performance
  - Segments not completely transparent to program (e.g., default segment faster or uses shorter instruction)
  - n byte segment needs n contiguous bytes of physical memory
  - Makes fragmentation a real problem

Fragmentation
- Fragmentation => Inability to use free memory
- Over time:
  - Variable-sized pieces = many small holes (external fragmentation)
  - Fixed-sized pieces = no external holes, but force internal waste (internal fragmentation)

Alternatives to hardware MMU
- Language-level protection (Java)
  - Single address space for different modules
  - Language enforces isolation
  - Singularity OS does this [Hunt]
- Software fault isolation
  - Instrument compiler output
  - Checks before every store operation prevent modules from trashing each other
  - Google Native Client does this

Paging
- Divide memory up into small pages
- Map virtual pages to physical pages
  - Each process has separate mapping
- Allow OS to gain control on certain operations
  - Read-only pages trap to OS on write
  - Invalid pages trap to OS on read or write
  - OS can change mapping and resume application
- Other features sometimes found:
  - Hardware can set "accessed" and "dirty" bits
  - Control page execute permission separately from read/write
  - Control caching or memory consistency of page

Paging trade-offs
- Eliminates external fragmentation
- Simplifies allocation, free, and backing storage (swap)
- Average internal fragmentation of 0.5 pages per "segment"
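The half-page figure for internal fragmentation can be checked numerically. The sketch below assumes 4096-byte pages and region sizes spread uniformly over whole multiples of the page size; `pages_needed` and `internal_waste` are hypothetical helper names.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Round a region size up to a whole number of pages."""
    return -(-nbytes // page_size)


def internal_waste(nbytes, page_size=PAGE_SIZE):
    """Bytes allocated but unused in the last page of the region."""
    return pages_needed(nbytes, page_size) * page_size - nbytes


# Average the waste over every region size from 1 byte up to 64 pages.
sizes = range(1, 64 * PAGE_SIZE + 1)
average = sum(internal_waste(n) for n in sizes) / len(sizes)
fraction_of_page = average / PAGE_SIZE
```

Under these assumptions `fraction_of_page` comes out just under one half, because the unused tail of the last page is equally likely to be any length from 0 to one byte short of a page.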

Simplified allocation
[Figure: pages of emacs and other processes placed in physical memory and on disk]
- Allocate any physical page to any process
- Can store idle virtual pages on disk

Paging data structures
- Pages are fixed size, e.g., 4K
  - Least significant 12 (log2 4K) bits of address are page offset
  - Most significant bits are page number
- Each process has a page table
  - Maps virtual page numbers (VPNs) to physical page numbers (PPNs)
  - Also includes bits for protection, validity, etc.
- On memory access: Translate VPN to PPN, then add offset

Example: Paging on PDP-11
- 64K virtual memory, 8K pages
  - Separate address space for instructions & data
  - I.e., can't read your own instructions with a load
- Entire page table stored in registers
  - 8 Instruction page translation registers
  - 8 Data page translations
- Swap 16 machine registers on each context switch

x86 Paging
- Paging enabled by bits in a control register (%cr0)
  - Only privileged OS code can manipulate control registers
- Normally 4KB pages
- %cr3: points to 4KB page directory
  - See pagedir_activate in Pintos
- Page directory: 1024 PDEs (page directory entries)
  - Each contains physical address of a page table
- Page table: 1024 PTEs (page table entries)
  - Each contains physical address of virtual 4K page
  - Page table covers 4 MB of virtual memory
- See old Intel manual for simplest explanation
  - Also volume 2 of AMD64 Architecture docs
  - Also volume 3A of latest Pentium Manual

x86 page translation
[Figure: 32-bit linear address split into Directory (bits 31-22), Table (bits 21-12), and Offset (bits 11-0). CR3 (PDBR) points to the page directory; the directory entry points to a page table; the page-table entry points to a 4-KByte page, yielding the physical address. 1024 PDE x 1024 PTE = 2^20 pages. The 32-bit CR3 value is aligned onto a 4-KByte boundary.]

x86 page directory entry
Page-Directory Entry (4-KByte Page Table)
  Bits 31-12  Page-table base address
  Bits 11-9   Available for system programmer's use (Avail)
  Bit 8       Global page (ignored) (G)
  Bit 7       Page size (0 indicates 4 KBytes) (PS)
  Bit 6       Reserved (set to 0)
  Bit 5       Accessed (A)
  Bit 4       Cache disabled (PCD)
  Bit 3       Write-through (PWT)
  Bit 2       User/Supervisor (U/S)
  Bit 1       Read/Write (R/W)
  Bit 0       Present (P)
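The two-level x86 translation can be sketched as a small page-table walk. This is a toy model, not real hardware behavior in every detail: physical memory is a Python dict from entry address to 32-bit entry value, only the Present bit is checked, and the names `split`, `walk`, `PageFault`, and the example addresses for CR3 and the page table are assumptions. The mapping reuses the 0x30408 to 0x92408 example from the goals slide.

```python
PTE_P = 0x1            # Present bit (bit 0) in both PDEs and PTEs
ADDR_MASK = 0xFFFFF000 # bits 31-12 hold a 4-KByte-aligned physical address


class PageFault(Exception):
    """Stands in for the page-fault trap to the OS."""


def split(linear):
    """Return (directory index, table index, offset) of a 32-bit linear address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def walk(phys_mem, cr3, linear):
    """Translate a linear address by reading one PDE and one PTE (4 bytes each)."""
    dir_idx, tbl_idx, offset = split(linear)
    pde = phys_mem.get(cr3 + 4 * dir_idx, 0)
    if not pde & PTE_P:
        raise PageFault(hex(linear))
    pte = phys_mem.get((pde & ADDR_MASK) + 4 * tbl_idx, 0)
    if not pte & PTE_P:
        raise PageFault(hex(linear))
    return (pte & ADDR_MASK) | offset


# Hypothetical layout: page directory at 0x1000, one page table at 0x2000.
CR3 = 0x1000
phys_mem = {
    0x1000 + 4 * 0x00: 0x2000 | 0x7,   # PDE 0 -> page table at 0x2000
    0x2000 + 4 * 0x30: 0x92000 | 0x7,  # PTE 0x30 -> physical page 0x92000
}
paddr = walk(phys_mem, CR3, 0x30408)
```

Counting the reads in `walk` plus the final data access gives the three memory references per load/store that make a translation cache worthwhile.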

x86 page table entry
Page-Table Entry (4-KByte Page)
  Bits 31-12  Page base address
  Bits 11-9   Available for system programmer's use (Avail)
  Bit 8       Global page (G)
  Bit 7       Page table attribute index (PAT)
  Bit 6       Dirty (D)
  Bit 5       Accessed (A)
  Bit 4       Cache disabled (PCD)
  Bit 3       Write-through (PWT)
  Bit 2       User/Supervisor (U/S)
  Bit 1       Read/Write (R/W)
  Bit 0       Present (P)

x86 hardware segmentation
- x86 architecture also supports segmentation
  - Segment register base + pointer val = linear address
  - Page translation happens on linear addresses
- Two levels of protection and translation check
  - Segmentation model has four privilege levels (CPL 0-3)
  - Paging only two, so 0-2 = kernel, 3 = user
- Why do you want both paging and segmentation?
- Short answer: You don't; it just adds overhead
  - Most OSes use "flat mode": set base = 0, bounds = 0xffffffff in all segment registers, then forget about it
  - x86-64 architecture removes much segmentation support
- Long answer: Has some fringe/incidental uses
  - VMware runs guest OS in CPL 1 to trap stack faults
  - OpenBSD used CS limit for W^X when no PTE NX bit

Making paging fast
- x86 PTs require 3 memory references per load/store
  - Look up page table address in page directory
  - Look up physical page number (PPN) in page table
  - Actually access physical page corresponding to virtual address
- For speed, CPU caches recently used translations
  - Called a translation lookaside buffer or TLB
  - Typical: 64-2K entries, 4-way to fully associative, 95% hit rate
  - Each TLB entry maps a VPN -> PPN + protection information
- On each memory reference
  - Check TLB, if entry present get physical address fast
  - If not, walk page tables, insert in TLB for next time (Must evict some entry)

TLB details
- TLB operates at CPU pipeline speed => small, fast
- Complication: what to do when switching address spaces?
  - Flush TLB on context switch (e.g., old x86)
  - Tag each entry with associated process's ID (e.g., MIPS)
- In general, OS must manually keep TLB valid
- E.g., x86 invlpg instruction
  - Invalidates a page translation in TLB
  - Note: very expensive instruction (100-200 cycles)
  - Must execute after changing a possibly used page table entry
  - Otherwise, hardware will miss page table change
- More complex on a multiprocessor (TLB shootdown)
  - Requires sending an interprocessor interrupt (IPI)
  - Remote processor must execute invlpg instruction

x86 Paging Extensions
- PSE: Page size extensions
  - Setting bit 7 in PDE makes a 4MB translation (no PT)
- PAE: Physical address extensions
  - Newer 64-bit PTE format allows 36 bits of physical address
  - Page tables, directories have only 512 entries
  - Use 4-entry Page-Directory-Pointer Table to regain 2 lost bits
  - PDE bit 7 allows 2MB translation
- Long mode PAE
  - In Long mode, pointers are 64-bits
  - Extends PAE to map 48 bits of virtual address (next slide)
  - Why aren't all 64 bits of VA usable?
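The TLB behavior described under "Making paging fast" and "TLB details" can be sketched in software. This is a toy model under stated assumptions: a fully associative cache with least-recently-used eviction (the slides do not say which entry gets evicted), 4K pages, and a `walk` callback standing in for the page-table walk. The class name `TLB` and its methods are hypothetical; `invalidate` plays the role that invlpg plays on x86.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU eviction (policy is an assumption)."""

    def __init__(self, capacity, walk):
        self.capacity = capacity
        self.walk = walk              # slow path: maps a VPN to a PPN
        self.entries = OrderedDict()  # VPN -> PPN, oldest entry first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> 12, vaddr & 0xFFF
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # must evict some entry
            self.entries[vpn] = self.walk(vpn)     # walk tables, then cache
        return (self.entries[vpn] << 12) | offset

    def invalidate(self, vpn):
        """Drop one translation, as the OS must after changing a PTE."""
        self.entries.pop(vpn, None)

    def flush(self):
        """Drop everything, as on a context switch without tagged entries."""
        self.entries.clear()


page_table = {0x30: 0x92, 0x31: 0x17}   # made-up VPN -> PPN mapping
tlb = TLB(capacity=2, walk=page_table.__getitem__)

first = tlb.translate(0x30408)    # miss: walks the page table
second = tlb.translate(0x30408)   # hit: served from the cache
page_table[0x30] = 0x55           # OS changes the mapping...
stale = tlb.translate(0x30408)    # ...but the cached entry is still used
tlb.invalidate(0x30)
fresh = tlb.translate(0x30408)    # miss again: picks up the new mapping
```

The `stale` result shows why the OS has to invalidate entries itself: the cache has no way to notice that the page table changed underneath it.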

x86 long mode paging
[Figure: 64-bit virtual address split into Sign Extend (bits 63-48), Page-Map Level-4 offset (PML4, bits 47-39), Page-Directory Pointer offset (bits 38-30), Page-Directory offset (bits 29-21), Page-Table offset (bits 20-12), and Physical Page offset (bits 11-0). CR3 holds the Page-Map L4 base address; the walk goes PML4E -> PDPE -> PDE -> PTE -> 4-KByte physical page, giving the physical address.]

Where does the OS live?
- In its own address space?
  - Can't do this on most hardware (e.g., syscall instruction won't switch address spaces)
  - Also would make it harder to parse syscall arguments passed as pointers
- So in the same address space as process
  - Use protection bits to prohibit user code from writing kernel
- Typically all kernel text, most data at same VA in every address space
  - On x86, must manually set up page tables for this
  - Usually just map kernel in contiguous virtual memory when boot loader puts kernel into contiguous physical memory
  - Some hardware puts physical memory (kernel-only) somewhere in virtual address space

Pintos memory layout
[Figure, from high to low addresses: 0xffffffff; Kernel/Pseudo-physical memory; 0xc0000000 (PHYS_BASE); User stack; BSS / Heap; Data segment; Code segment; 0x08048000; Invalid virtual addresses; 0x00000000]

Very different MMU: MIPS
- Hardware checks TLB on application load/store
  - References to addresses not in TLB trap to kernel
- Each TLB entry has the following fields: Virtual page, Pid, Page frame, NC, D, V, Global
- Kernel itself unpaged
  - All of physical memory contiguously mapped in high VM (hardwired in CPU, not just by convention as with Pintos)
  - Kernel uses these pseudo-physical addresses
- User TLB fault handler very efficient
  - Two hardware registers reserved for it
  - utlb miss handler can itself fault, which allows paged page tables
- OS is free to choose page table format!
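The field boundaries in the long mode paging figure can be turned into a small address splitter. This sketch treats addresses as unsigned 64-bit integers; the function names `is_canonical` and `split_long_mode` and the dictionary keys are made up for the illustration. It also shows one consequence of the sign-extension field: only addresses whose bits 63-48 copy bit 47 are accepted.

```python
def is_canonical(vaddr):
    """True if bits 63-48 are all copies of bit 47 (the sign-extension rule)."""
    top = vaddr >> 47          # bits 63-47 of an unsigned 64-bit value
    return top == 0 or top == 0x1FFFF


def split_long_mode(vaddr):
    """Split a canonical 64-bit virtual address into four 9-bit indices + offset."""
    if not is_canonical(vaddr):
        raise ValueError("non-canonical address: " + hex(vaddr))
    return {
        "pml4": (vaddr >> 39) & 0x1FF,    # bits 47-39
        "pdp": (vaddr >> 30) & 0x1FF,     # bits 38-30
        "pd": (vaddr >> 21) & 0x1FF,      # bits 29-21
        "pt": (vaddr >> 12) & 0x1FF,      # bits 20-12
        "offset": vaddr & 0xFFF,          # bits 11-0
    }


low_half = split_long_mode(0x00007FFFFFFFF123)
high_half = split_long_mode(0xFFFF800000000000)
```

Four 9-bit indices plus a 12-bit offset account for 48 bits, which is why each table in this format has 512 entries.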
DEC Alpha MMU
- Firmware managed TLB
  - Like MIPS, TLB misses handled by software
  - Unlike MIPS, TLB miss routines ship with machine in ROM (but copied to main memory on boot so can be overwritten)
  - Firmware known as "PAL code" (privileged architecture library)
- Hardware capabilities
  - 8KB, 64KB, 512KB, 4MB pages all available
  - TLB supports 128 instruction/128 data entries of any size
- Various other events vector directly to PAL code
  - call_pal instruction, TLB miss/fault, FP disabled
- PAL code runs in special privileged processor mode
  - Interrupts always disabled
  - Has access to special instructions and registers

PAL code interface details
- Examples of Digital Unix PALcode entry functions
  - callsys/retsys: make, return from system call
  - swpctx: change address spaces
  - wrvptptr: write virtual page table pointer
  - tbi: TLB invalidate
- Some fields in PALcode page table entries
  - GH: 2-bit granularity hint -> 2^N pages have same translation
  - ASM: address space match -> mapping applies in all processes

Example: Paging to disk
- Another process needs a new page of memory
- OS re-claims an idle page from emacs
- If page is clean (i.e., also stored on disk):
  - E.g., page of text from emacs binary on disk
  - Can always re-read same page from binary
  - So okay to discard contents now & give page to the requesting process
- If page is dirty (meaning memory is only copy)
  - Must write page to disk first before giving it to the requesting process
- Either way:
  - Mark page invalid in emacs
  - emacs will fault on next access to virtual page
  - On fault, OS reads page data back from disk into new page, maps new page into emacs, resumes executing

Paging in day-to-day use
- Demand paging
- Growing the stack
- BSS page allocation
- Shared text
- Shared libraries
- Shared memory
- Copy-on-write (fork, mmap, etc.)
- Q: Which pages should have global bit set on x86?
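The clean/dirty decision in the paging-to-disk example can be sketched as follows. Everything here is a simplified assumption: the page table is a dict from virtual page number to a `Page` object (or `None` when invalid), `swap` and `binary` are dicts standing in for disk, and the names `Page`, `reclaim`, and `access` are hypothetical.

```python
class Page:
    def __init__(self, data, dirty=False):
        self.data = data
        self.dirty = dirty   # True when memory holds the only copy


def reclaim(page_table, vpn, swap):
    """Take a page away from its owner; return True if a disk write was needed."""
    page = page_table[vpn]
    wrote = False
    if page.dirty:
        swap[vpn] = page.data    # dirty: must save the contents first
        wrote = True
    page_table[vpn] = None       # either way: mark the page invalid
    return wrote


def access(page_table, vpn, swap, binary):
    """Read a page, handling the fault if it was reclaimed earlier."""
    if page_table[vpn] is None:                      # page fault
        data = swap[vpn] if vpn in swap else binary[vpn]
        page_table[vpn] = Page(data)                 # clean: a disk copy exists
    return page_table[vpn].data


binary = {0: "text page"}                            # file the program was loaded from
swap = {}
emacs = {0: Page("text page"), 1: Page("heap page", dirty=True)}

wrote_clean = reclaim(emacs, 0, swap)   # clean page: just discard it
wrote_dirty = reclaim(emacs, 1, swap)   # dirty page: written to swap first
text = access(emacs, 0, swap, binary)   # fault, re-read from the binary
heap = access(emacs, 1, swap, binary)   # fault, re-read from swap
```

The sketch leaves out how the OS picks an idle page in the first place; it only covers what happens once a victim has been chosen.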