Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names


Transcription:

Fast access ===> use map to find object
HW == SW ===> map is in HW or SW or combo
Extend range ===> longer, hierarchical names
How is the map embodied:
--- L1?
--- Memory?

The Environment:
---- Long latency
---- Low bandwidth
Big blocks : spatial locality
Big cache : lower miss rate
Associative : lower miss rate
Write back : less bandwidth
Multiple levels : lower avg penalty

Separate the cache into two parts.
Expand tag storage: one entry for every possible tag.
The tag == block address (aka block number) is now redundant; eliminate it.
Block ==> Page
Tag storage ==> Page Table
Cache data ==> Memory pages
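
A minimal C sketch of the renaming above. The sizes (32-bit VAs, 4 KB pages) and the PTE field layout are assumptions for illustration, not from the slides or any real ISA:

#include <stdint.h>

#define PAGE_BITS 12                /* assume 4 KB pages              */
#define NUM_PAGES (1u << 20)        /* 2^32 / 2^12 possible "tags"    */

typedef struct {
    uint32_t frame    : 20;         /* physical frame number          */
    uint32_t valid    : 1;          /* page present in memory?        */
    uint32_t dirty    : 1;
    uint32_t accessed : 1;
} pte_t;

/* One entry per possible tag; the "tag" (the VPN) is the array index
   itself, so it is never stored: exactly the redundancy noted above. */
pte_t page_table[NUM_PAGES];

uint32_t translate(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);
    pte_t pte = page_table[vpn];
    /* if (!pte.valid) -> page fault, handled elsewhere */
    return (pte.frame << PAGE_BITS) | offset;
}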

I've got page table issues:
--- Where are the page tables, physically? ===> Memory? SRAM? Hardware?
--- If in memory, how many memory accesses to read one data item (ignoring the cache)?
--- If page tables are read/write ===> can my program rewrite your page table (or my own, accidentally)?
--- If page tables are not read/write, how do they get their pointer values?
===> Need protection bits per page: Kernel mode (0): R/W; User mode (1): no R/W.
===> Where do the protection bits go? How are they accessed?
--- It's nice to share memory, but why bother?
===> Principle of interleaving: long-latency task? Go find other work to do.
===> The OS has work to do, too.
--- What about I/O? ===> Is it done using virtual addresses? Memory-mapped I/O device registers?
--- Speaking of I/O, what about long, slow I/O for disk blocks (pages)?
DMA is a processor: it just does what the PROC would have done, while the PROC does other work (interleaving). The BUS-MASTER interleaves PROC and DMA requests.
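
A sketch of two of the issues above: the extra memory access per load, and the per-page kernel/user protection check. mem_read(), read_pte(), and protection_fault() are hypothetical names; 4 KB pages are assumed:

#include <stdint.h>

enum mode { KERNEL = 0, USER = 1 };

typedef struct { uint32_t frame, valid, kernel_only; } pte_t;

extern uint32_t mem_read(uint32_t pa);   /* physical memory access    */
extern pte_t    read_pte(uint32_t vpn);  /* the PT lives in memory    */
extern void     protection_fault(void);

uint32_t load(uint32_t va, enum mode m)
{
    pte_t pte = read_pte(va >> 12);      /* memory access #1: the PTE */
    if (pte.kernel_only && m == USER)    /* user code can't touch     */
        protection_fault();              /* (or rewrite!) the PT      */
    uint32_t pa = (pte.frame << 12) | (va & 0xFFF);
    return mem_read(pa);                 /* memory access #2: the data */
}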

After mapping, page tables can be anywhere. The PTBR is set by the OS for fast lookup of PTEs.
All users have the OS in the same virtual area; all virtual OS space is mapped identically for all users.
The OS can turn off virtual addressing to access physical memory.
The PTBR holds the physical address of the PT for fast access.
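
A sketch of how the PTBR makes page tables per-process: each process records the physical address of its own PT, and a context switch just reloads the register. All names are hypothetical:

#include <stdint.h>

struct process {
    uint32_t pt_base_pa;   /* physical address of this process's PT  */
    /* ... saved registers, etc. ... */
};

extern void set_ptbr(uint32_t pa);   /* privileged HW register write */

void context_switch(struct process *next)
{
    /* All translations after this use next's page table. */
    set_ptbr(next->pt_base_pa);
}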

Set the dirty bit on a write. Set the accessed bit on a read or write. Clear all accessed bits every k ticks.
Page miss: evict a page (ordered by preference):
---- 1. dirty == 0, accessed == 0
---- 2. dirty == 0, accessed == 1
---- 3. dirty == 1, accessed == 0
---- 4. dirty == 1, accessed == 1
(NOTE: operations in hardware, not instruction execution.)
Each access: get addr of PT, get PTE, get data.
Speed it up:
1. PTBR <== page-table-location pointer; do this once at program startup.
2. Cache PTEs!
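
A sketch of the eviction preference above (an NRU-style pick): the four classes collapse to cls = 2*dirty + accessed, and the victim comes from the lowest nonempty class. The frame-table layout is hypothetical:

#include <stdint.h>

#define NUM_FRAMES 1024

struct frame { uint8_t in_use, dirty, accessed; };
struct frame frames[NUM_FRAMES];

int pick_victim(void)
{
    int best = -1, best_cls = 4;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (!frames[f].in_use) continue;
        int cls = 2 * frames[f].dirty + frames[f].accessed;
        if (cls < best_cls) { best_cls = cls; best = f; }
        if (best_cls == 0) break;   /* can't beat clean and cold */
    }
    return best;   /* a dirty victim must be written back first */
}

/* Every k ticks, per the slide: for all f, frames[f].accessed = 0; */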

Case 0/1 (PT page on disk): Read the PDE, find the PT's disk address; read the PT page from disk; restart. (After restart: becomes Case 1/1.)

Case 0/0 (both on disk): Page fault for the PT as in Case 0/1; restart. (After restart: becomes Case 1/0.)
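
A sketch of the two-level walk behind the Case PDE/PTE notation above. Disk I/O, table formats, and the restart primitive are hypothetical; note each fault path ends by restarting the faulting instruction, so Case 0/0 resolves in two rounds:

#include <stdint.h>

typedef struct {
    uint32_t present;
    uint32_t frame_or_disk;   /* frame# if present, else disk address */
} entry_t;

extern entry_t  read_pde(uint32_t va);
extern entry_t  read_pte(uint32_t va);
extern uint32_t alloc_frame(void);
extern void     fetch_page_from_disk(uint32_t disk_addr, uint32_t frame);
extern void     restart_instruction(void);   /* does not return */

uint32_t walk(uint32_t va)
{
    entry_t pde = read_pde(va);
    if (!pde.present) {                       /* Case 0/x: PT page on disk  */
        fetch_page_from_disk(pde.frame_or_disk, alloc_frame());
        restart_instruction();                /* becomes Case 1/x           */
    }
    entry_t pte = read_pte(va);
    if (!pte.present) {                       /* Case 1/0: data page on disk */
        fetch_page_from_disk(pte.frame_or_disk, alloc_frame());
        restart_instruction();                /* becomes Case 1/1           */
    }
    return (pte.frame_or_disk << 12) | (va & 0xFFF);   /* Case 1/1 */
}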

E.g., a simple DM virtual cache:
--- Use part of the virtual address as the tag (Page No., plus or minus some bits)
--- Use other bits as the index into the cache (the remainder is the block offset)
--- Include PID, Accessed and Dirty bits, etc., in the cache
--- Only translate on misses
--- L2 is a physical cache
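
A sketch of that virtual-address split, with the PID stored alongside the tag so hits need no translation. The sizes (32-bit VA, 64 B blocks, 256 sets) are assumptions:

#include <stdint.h>

#define OFFSET_BITS 6      /* 64 B blocks */
#define INDEX_BITS  8      /* 256 sets    */

struct vline {
    uint32_t tag;          /* upper virtual-address bits             */
    uint16_t pid;          /* which address space owns this line     */
    uint8_t  valid, dirty, accessed;
    uint8_t  data[1 << OFFSET_BITS];
};
struct vline vcache[1 << INDEX_BITS];

int vhit(uint32_t va, uint16_t pid)
{
    uint32_t index = (va >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag   = va >> (OFFSET_BITS + INDEX_BITS);
    struct vline *l = &vcache[index];
    return l->valid && l->tag == tag && l->pid == pid;   /* miss ==> translate */
}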

Shared page:
PT1: mapped from V-Page x1234
PT2: mapped from V-Page xffff
Both map to frame x5678. (Cache data blocks are pages.)
Process 1 writes; Process 2 reads.
Goal invariant:
--- Never allow this to happen :-)
--- Shared data appears at most once in the cache.
(Could also be multiple mappings in the same page table.)

Use some page-offset bits as the index to a DM cache. Compare the Page# in the TLB in parallel with the cache access. Form the physical address using frame# + offset. Compare the tag portion of the CMAR. "Virtually Indexed, Physically Tagged."
INCREASE ASSOCIATIVITY:
1. Fixed cache size ---- fewer index bits, more tag bits
2. Increased cache size ---- same index bits, same tag bits
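
A worked example of the size constraint, assuming 4 KB pages and 64 B blocks (both numbers are assumptions, not from the slides): 4 KB pages leave 12 untranslated offset bits; 64 B blocks consume 6 of them, so at most 6 bits remain for the index. A DM VIPT cache is therefore limited to 2^6 x 64 B = 4 KB. Going 8-way set-associative with the same 6 index bits allows 8 x 4 KB = 32 KB, which is option 2 above: a bigger cache with the same index bits and the same tag bits.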

Originally:
---- No limit checking: can overrun a segment
---- No protection: can write segment registers
---- Segment registers are implicit:
----- instruction fetch: uses CS
----- data access: uses DS
----- stack operation: uses SS
Programmer's perspective:
---- Segments address from 0
---- Offset is the address
Too slow:
---- extend the Seg Regs
---- cache the Descriptor in the Seg Reg
---- check limit and R/W...
Also:
---- Special segments for calls
---- "conforming" ==> change mode
---- 8K segments @ 4 GB
Flat addressing:
---- set all Descriptors: BASE == x00000000, LIMIT == xffffffff
---- 1-to-1 w/ 32-bit MAR
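
A sketch of descriptor-based translation with the limit and R/W checks the original part lacked. Field names are illustrative, not the real IA-32 descriptor layout:

#include <stdint.h>

struct descriptor {
    uint32_t base;        /* segment start        */
    uint32_t limit;       /* highest legal offset */
    uint8_t  writable;
};

extern void protection_fault(void);

uint32_t linear_addr(struct descriptor *d, uint32_t offset, int is_write)
{
    if (offset > d->limit)        protection_fault();   /* no overrun  */
    if (is_write && !d->writable) protection_fault();   /* R/W check   */
    return d->base + offset;
}

/* Flat addressing is the degenerate case: base == 0, limit == max,
   so the offset IS the 32-bit address. */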

Segment selectors CS, DS, SS can still be written (change segments like the original); the descriptor table is OS-controlled.
Also available in IA-32 (x86):
---- Paging mode (2-level and 3-level)
---- "Real" mode (acts like the original)
---- Paged segments (paging + segmentation: the Segment Descriptor points to a Page Directory)

Simulation:
--- Simulated machine is arbitrary (HW, ISA)
--- Virtual Machine (VM) is defined by the simulation program
--- VM's resources are in the simulator program's data structures
--- Interaction with the host is through the simulator's actions
--- OS provides I/O services, isolation, protection
Monitor:
--- Small monitor provides mapping to partitioned resources
--- Monitor is small, simple, reliable
--- Each guest runs in its own VM; the VM is only virtual in the mapping
--- Guest instructions run w/o emulation: guest has the same ISA as the HW
--- Each VM has its own OS and manages its own resources

Some advantages:
--- Monitor-1 and Monitor-2 present identical virtual machines to guests
--- Guest migration is possible: uptime, bulk efficiencies
--- Multiple guests share a pool of computing resources
--- Isolation between guests (?)
--- HW architecture can differ between hosts (to what degree?)
--- Run legacy apps on a legacy VM
--- Guest OS configuration specific to the guest's apps
Handling problematic instructions:
OR binary translation (static or runtime): replace the problematic instructions
OR new hardware modes of execution

The x86 EFLAGS register, a larger version of our LC-3's CC codes:
--- N, Z, P
--- Overflow
--- Carry...
--- Interrupt Enable...
POPF: in kernel mode all 32 bits are written; in user mode the IE bit is not written (and no trap occurs, so the VMM cannot catch it).
vmkernel:
--- boot loader
--- x86 abstraction
--- IO stacks (storage, network)
--- memory scheduler
--- cpu scheduler
VMM (a vmkernel privileged process):
--- trapping, translation
--- one per VM

XEN: have to rewrite the OS
--- to use a new ISA ===> more runs w/ the VMM ===> faster
But:
--- can't run original OS binaries ===> keeping up w/ the Joneses?

Trap references to page tables ---- write-protect the guest PT.
Adjust the physical frame number ---- use a shadow page table.
OR add HW second-layer translation (the 1960s IBM 370 solution).
How much overhead in a page miss?
G-OS: page fault, context switches: 100s of cycles
VMM: examine G-PT (find G-PA): 100s of cycles
VMM: find H-Phys-Addr: 100s of cycles
VMM: allocate/fill shadow PT: 100s of cycles
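
A sketch of the shadow-PT fill path costed above: the VMM composes guest VA -> guest PA (from the guest's PT) with guest PA -> host PA (from its own map) and caches the result in the shadow PT the hardware actually walks. All names are hypothetical:

#include <stdint.h>

extern uint32_t guest_pt_lookup(uint32_t g_va);   /* G-VA -> G-PA */
extern uint32_t vmm_map_lookup(uint32_t g_pa);    /* G-PA -> H-PA */
extern void     shadow_pt_install(uint32_t g_va, uint32_t h_pa);

/* Called on a shadow-PT miss; each step is 100s of cycles. */
void shadow_fill(uint32_t g_va)
{
    uint32_t g_pa = guest_pt_lookup(g_va);   /* VMM examines G-PT       */
    uint32_t h_pa = vmm_map_lookup(g_pa);    /* VMM finds H-Phys-Addr   */
    shadow_pt_install(g_va, h_pa);           /* allocate/fill shadow PT */
}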

DMA I/O and the cache:
--- kicks needed data out of the cache
--- stalls the processor
--- cache data may be stale
Fix it:
---- Make buffer pages non-cacheable. Too restrictive?
---- Invalidate cache blocks belonging to the buffer just before the I/O?
---- Lock pages in memory temporarily.
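
A sketch of the "invalidate just before I/O" fix combined with temporary page locking, for a DMA transfer into a memory buffer. cache_invalidate(), pin_pages(), and dma_start() are hypothetical primitives; 64 B blocks are assumed:

#include <stddef.h>

#define BLOCK_SIZE 64

extern void cache_invalidate(void *addr);     /* one cache block       */
extern void pin_pages(void *buf, size_t n);   /* lock pages in memory  */
extern void unpin_pages(void *buf, size_t n);
extern void dma_start(void *buf, size_t n);

void dma_read(void *buf, size_t n)
{
    pin_pages(buf, n);                          /* temporary lock       */
    for (size_t i = 0; i < n; i += BLOCK_SIZE)  /* drop stale copies so */
        cache_invalidate((char *)buf + i);      /* later reads see DMA'd data */
    dma_start(buf, n);                          /* device fills memory  */
    /* on completion: unpin_pages(buf, n); */
}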

Program order on each CPU, with per-CPU caches (X starts at 2):

             CPU-1            CPU-2
Read X       cache <== 2
X <== X + 5  cache <== 7
Write X      X <== 7
Read X                        cache <== 7
X <== X + 3                   cache <== 10
Write X                       X <== 10