ECE 571 Advanced Microprocessor-Based Design Lecture 12


ECE 571 Advanced Microprocessor-Based Design, Lecture 12
Vince Weaver
http://web.eece.maine.edu/~vweaver
vincent.weaver@maine.edu
1 March 2018

Announcements

HW#6 will be posted.
The project will be coming up.

Virtual Memory

The original purpose was to give the illusion of more main memory than is actually available, with disk as backing store.
Gives each process its own linear view of memory.
Demand paging (no swapping out of whole processes): processes execute while only partly in memory, which effectively acts as a cache.
Memory protection.
Security.

Diagram: the virtual address spaces of two processes (each with Kernel, Stack, Heap, BSS, Data, and Text regions) mapped onto physical RAM.

Memory Management Unit

You can run without an MMU; there is even MMU-less Linux. How do you keep processes separate then? Very carefully...

Page Lookup

The simplest scheme is just a table, with the virtual page number as index and the physical page number as value.

Page Tables

A collection of Page Table Entries (PTEs). Some common components:
ID of owner
Virtual page number
Valid bit, location of page (memory, disk, etc.)
Protection info (read-only, etc.)
Dirty bit, age (how recently updated, for LRU)

Hierarchical Page Tables

With 4GB of memory and 4kB pages, you have 1 million pages per process. If each has a 4-byte PTE, that is 4MB of page tables per process. Too big.
Each process likely does not use all 4GB at once (the address space is sparse), so put the page tables themselves in swappable virtual memory!
A 4MB page table is 1024 pages, and the 1024 pointers to those pages fit in a single 4kB page.

Hierarchical Page Table Diagram

Diagram: a 32-bit virtual address split into a 10-bit top-level index, a 10-bit second-level index, and a 12-bit page offset. The page table base address (stored in a register) points to the 4kB top-level table; each of its entries points to one of the 4kB second-level page tables, which yields the physical page.

Hierarchical Page Table Diagram

32-bit x86 chips have hardware 2-level page tables; ARM uses 2-level page tables as well.
What do you do if you have more than 32 bits?
64-bit x86 has 4-level page tables (256TB virtual / 64TB physical; implementations may support less, e.g. 44/40 bits).
There is a push by Intel for 5-level tables (128PB virtual / 4PB physical, 57 bits of virtual address).
What happens when you have unused bits? People try to use them, which causes problems later: AMD64 requires canonical addresses.

Inverted Page Table

How to handle larger 64-bit address spaces? You can add more levels of page tables (4? 5?), but that becomes very slow.
Instead, use a hash to find the page. Better best-case performance, but it can perform poorly if the hash function has lots of aliasing.

Inverted Page Table Diagram

Diagram: the virtual address is hashed to index a page table in physical memory; on a hit the translation is used, while on an alias (collision) the lookup rehashes and tries again.

Walking the Page Table

The page table can be walked in hardware or software.
Hardware is more common; early RISC machines would do it in software.
Software walks can be slow and have complications: what if the page-walking code itself was swapped out?

TLB

Translation Lookaside Buffer ("lookaside buffer" is an obsolete term meaning cache).
Caches page table entries. Much faster than doing a page-table walk.
Historically fully associative; recently multi-level, multi-way.
TLB shootdown: when a setting on a mapping is changed, the corresponding TLB entries must be invalidated on all other processors.

Flushing the TLB

May need to flush on context switch if the TLB doesn't store an ASID, or if ASIDs run out (Intel only added ASID support relatively recently).
Sometimes called a TLB shootdown.
Hurts performance as the TLB gradually refills.
Avoiding this flush is why the top part of each address space is mapped to the kernel under Linux (a security issue exploited by the Meltdown bug!).

What happens on a memory access

Cache hit: generally not a problem (see later). To be in the cache, the access had to have gone through the whole VM process already, although some architectures do a lookup anyway in case permissions have changed.
Cache miss: the access is sent out to memory.
If the translation is in the TLB, no problem: the right page is fetched from physical memory and the TLB is updated.
If not in the TLB, the page tables are walked.
If there is no physical mapping in the page table, a page fault happens.

What happens on a page fault

Walk the page table and see if the page is valid and present:
Minor fault: the page is already in memory; just point a PTE at it. For example, shared memory, shared libraries, etc.
Major fault: the page needs to be created or brought in from disk (demand paging). This needs room in physical memory; if no free space is available, something must be kicked out: a disk-backed page that is not dirty is just discarded, a disk-backed dirty page is written back, and anonymous memory can be paged out to disk. Eventually the system can OOM. The memory is then loaded, or zeroed, and the PTE updated. Can it be shared? (zero page)
Invalid: segfault.

What happens on a fork?

Do you actually copy all of memory? Why would that be bad? (Slow, and the child often calls exec() right away.)
Instead, the page tables are marked read-only and the pages shared.
Only if a write happens does a page fault occur, and only then is a copy made: copy-on-write.

Virtual Memory Cache Concerns

Cache Issues

Page table entries are cached too.
What happens if more memory fits in the cache than can be covered by the TLB? With 128 TLB entries * 4kB pages, you can cover 512kB. If your cache is larger (say 1MB), a simple walk through the cache will run out of TLB entries, so page lookups will happen (bringing page table data into the cache) and you do not get maximal usefulness from the cache.
This has happened in various chips over the years.

Physical Caches

Diagram: the virtual address (page number + offset) goes through the TLB first; the resulting physical address supplies the tag, index, and offset for the cache lookup.

Physical Caches, PIPT

Location in the cache is based on the physical address.
Can be slower, as a TLB lookup is needed for each cache access.
No need to flush the cache on context switch (or ever, really).
No need to do a TLB lookup on writeback.

Virtual Caches

Diagram: the cache is indexed and tagged with the virtual address; the TLB is consulted only on writeback, to produce the physical address.

Virtual Caches

Location in the cache is based on the virtual address.
Faster, as there is no need to do a TLB lookup before access.
Will still have to use the TLB on a miss (for the fill) or when writing back dirty lines.
The cache might have extra bits to indicate permissions so the TLB doesn't have to be checked on a write.
Aliasing, homonyms: the same virtual address (in multiple processes) maps to different physical pages. Must flush the cache on context switch?

How to avoid flushing? Have a process ID (ASID). Sharing can also be implemented this way, by both processes mapping to the same virtual address. Keeping kernel addresses high also avoids aliasing.
Aliasing, synonyms: one physical address has two virtual mappings. The operating system might use page or cache coloring; either way, the operating system has to do more work.

VIPT

Diagram: the virtual offset bits index the cache while the TLB translates the page number in parallel; the physical tags read from the cache are then compared with the TLB output.

Cache lookup and TLB lookup proceed in parallel.
Cache size divided by associativity (the size of one way) must be no larger than the page size. If properly sized, the index bits are the same for the virtual and physical address, and there is no need to complete the TLB lookup before a cache hit.
If not sized this way, the extra index bits need to be stored in the cache so they can be passed along with the tag when doing a lookup.

Combinations

PIPT: older systems. Slow, as the address must be translated (go through the TLB) for every cache access (the index and tag aren't known until after the lookup).
VIVT: fast. No need to consult the TLB to find data in the cache.
VIPT: ARM L1/L2. Faster, as the cache line can be looked up in parallel with the TLB. Needs more tag bits.
PIVT: theoretically possible, but useless: as slow as PIPT but with aliasing like VIVT.

Dealing with Limitations?

Large Pages

Another way to avoid problems with a 64-bit address space.
Larger page size (64kB? 1MB? 2MB? 2GB?).
Less granularity; potentially wastes space.
Fewer TLB entries are needed to map large data structures.
Compromise: multiple page sizes. This complicates the OS and hardware, and the OS has to find free blocks of contiguous memory when allocating a large page.

Transparent usage? Transparent Huge Pages: an alternative to making people use special interfaces to allocate them.

Having Larger Physical than Virtual Address Space

32-bit processors cannot address more than 4GB. x86 hit this problem a while ago; ARM only more recently.
The real solution is to move to 64-bit.
As a hack, extra bits can be included in the page tables to address more memory (though each process is still limited to 4GB). Linus Torvalds hates this.
This approach hits an upper limit around 16-32GB, because the entire low 4GB of kernel-addressable memory fills up with page tables.

Haswell Virtual Memory

L1 TLB (4-way associative): 64 entries for 4kB pages, 32 for 2MB, 4 for 1GB.
L2 TLB: 1024 entries, 8-way associative, combined 4kB and 2MB.
DCache is 32kB/8-way, so VIPT is possible.

Cortex A9 MMU

Virtual Memory System Architecture version 7 (VMSAv7).
Page table entries support 4KB, 64KB, 1MB, and 16MB pages.
Global and address space IDs (no more TLB flush on context switch).
Instruction micro-TLB (32 or 64 entries, fully associative).
Data micro-TLB (32 entries, fully associative).
Unified main TLB: 2-way, 2x64 (128 total) on the Pandaboard, with 4 lockable entries (why would you want to do that?).
Supports hardware page table walks.

Cortex A9 MMU

Virtual Memory System Architecture version 7 (VMSAv7).
Addresses can be 32 bits virtual / 40 bits physical.
First check FCSE: a linear translation of the bottom 32MB to an arbitrary block in physical memory (optional with VMSAv7).

Cortex A9 TLB

Micro-TLB: 1-cycle access; needs to be flushed if the ASID changes; fully associative; 4 lockable elements.
Plus a larger 2-way main TLB with varying access latency.

Cortex A9 TLB Measurement

Plot: stall counts (log scale, roughly 10^4 to 10^10) versus matrix size (16 to 512) for Dcache stalls (r61), TLB stalls (r83), and micro-TLB stalls (r85), with the L1 cache size, micro-TLB (32-entry) coverage, main TLB (128-entry) coverage, and L2 cache size marked.