Last time in Lecture 7: 3 C's of cache misses: Compulsory, Capacity, Conflict


Transcription:

CS 152 Computer Architecture and Engineering, Lecture 8 - Address Translation. Krste Asanovic, Electrical Engineering and Computer Sciences, University of California at Berkeley. http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs152 (CS252 S05)

Last time in Lecture 7: the 3 C's of cache misses (Compulsory, Capacity, Conflict). Write policies: write-back, write-through, write-allocate, no-write-allocate. Multi-level cache hierarchies reduce miss penalty; 3 levels are common in modern systems (some have 4!), and knowing an L2 exists can change the design tradeoffs of the L1 cache. Prefetching: retrieve memory data before the CPU requests it; prefetching can waste bandwidth and cause cache pollution. Software vs. hardware prefetching. Software memory hierarchy optimizations: loop interchange, loop fusion, cache tiling.

Bare Machine: the pipeline (PC, instruction cache, decode, execute, memory, writeback) accesses the instruction cache and data cache, which reach main memory (DRAM) through the memory controller. In a bare machine, the only kind of address is a physical address.

Absolute Addresses (EDSAC, early 50s): only one program ran at a time, with unrestricted access to the entire machine (RAM + I/O devices). Addresses in a program depended upon where the program was to be loaded in memory, but it was more convenient for programmers to write location-independent subroutines. How could location independence be achieved? The linker and/or loader modify addresses of subroutines and callers when building a program memory image.

Dynamic Address Translation: Motivation. In early machines, I/O was slow and each I/O transfer involved the CPU (programmed I/O). Higher throughput is possible if the CPU and I/O of 2 or more programs are overlapped; how? Multiprogramming, with DMA I/O devices and interrupts. Location-independent programs: ease of programming and storage management creates the need for a base register. Protection: independent programs should not affect each other inadvertently, creating the need for a bound register. Multiprogramming drives the requirement for resident supervisor software to manage context switches between multiple programs.

Simple Base and Bound Translation: the bound register holds the segment length, and a logical address that exceeds it raises a bounds violation; otherwise the base register is added to the logical address to locate the word within the current segment in main memory. Base and bounds registers are visible/accessible only when the processor is running in supervisor mode.

Separate Areas for Program and Data (the scheme used on all Cray vector supercomputers prior to the X1, 2002): the program and data segments each get their own base and bound registers. Logical addresses from the program counter are checked against the program bound and relocated by the program base; load/store addresses are checked against the data bound and relocated by the data base. What is an advantage of this separation?

Base and Bound Machine: in the pipeline, the program bound and data bound registers check logical addresses for bounds violations, and the program base and data base are added before the instruction cache and data cache accesses. The addition of the base register can be folded into the (register + immediate) address calculation using a carry-save adder (which sums three numbers with only a few gate delays more than adding two numbers).
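The base-and-bound check described above can be sketched in a few lines of Python; the register values here are hypothetical, chosen only to illustrate the mechanism:

```python
class BoundsViolation(Exception):
    pass

def translate(logical_addr, base, bound):
    """Return the physical address, or raise on a bounds violation."""
    if logical_addr >= bound:           # compare against the segment length
        raise BoundsViolation(hex(logical_addr))
    return base + logical_addr          # relocate by the base register

# A program loaded at physical 0x4000 with a 16 KB (0x4000-byte) segment:
print(hex(translate(0x0123, base=0x4000, bound=0x4000)))  # prints 0x4123
```

Note that the same code runs unchanged wherever the segment is loaded; only the base register value differs, which is exactly the location independence the slides motivate.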

Memory Fragmentation: suppose users 1, 2, and 3 occupy 16K, 24K, and 32K regions of memory with free space between them. Users 4 & 5 arrive and are squeezed into the holes (16K, 8K, and 24K pieces); then users 2 & 5 leave, leaving scattered free regions. As users come and go, the storage becomes fragmented, so at some stage programs have to be moved around to compact the storage.

Paged Memory Systems: a processor-generated address can be split into a page number and an offset. A page table contains the physical address of the start of each page, so the pages of a program (e.g., pages 0-3 of user 1) can be scattered anywhere in physical memory. Page tables make it possible to store the pages of a program non-contiguously.
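The page-number/offset split can be sketched as follows. The 4 KB page size and the particular VPN-to-PPN mapping are illustrative assumptions (the slide's diagram uses abstract page numbers 0-3):

```python
PAGE_SIZE = 4096
OFFSET_BITS = 12   # log2(4096)

# Page table for one user: virtual page number -> physical page number.
# The pages sit non-contiguously in memory (VPN 0 -> PPN 1, VPN 1 -> PPN 0, ...).
page_table = {0: 1, 1: 0, 2: 3, 3: 2}

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS          # page number
    offset = vaddr & (PAGE_SIZE - 1)    # offset within the page
    ppn = page_table[vpn]               # physical page holding this virtual page
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))  # VPN 1 maps to PPN 0, so prints 0xabc
```

The offset passes through untouched; only the page number is remapped, which is why pages can live anywhere in physical memory.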

Private Address Space per User: each user has a page table; the page table contains an entry for each user page, and OS pages are mapped as well, with unused frames remaining free.

Where Should Page Tables Reside? The space required by the page tables (PT) is proportional to the address space, the number of users, and so on: too large to keep in registers. Idea: keep the PTs in main memory. But then a reference needs one memory access to retrieve the page base address and another to access the data word, doubling the number of memory references!

Page Tables in Memory: each user's page table (PT of user 1, PT of user 2) itself resides in physical memory, alongside the pages of the virtual address spaces it maps.

CS152 Administrivia.

A Problem in the Early Sixties: there were many applications whose data could not fit in main memory, e.g., payroll. Paged memory systems reduced fragmentation but still required the whole program to be resident in main memory.

Manual Overlays (Ferranti Mercury, 1956: 40k bits of main store, 640k bits of drum). Assume an instruction can address all the storage on the drum. Method 1: the programmer keeps track of addresses in main memory and initiates an I/O transfer when required; difficult and error-prone! Method 2: automatic initiation of I/O transfers by software address translation (Brooker's interpretive coding, 1960); inefficient! This is not just an ancient black art: e.g., the IBM Cell microprocessor used in the PlayStation 3 has an explicitly managed local store!

Demand Paging in Atlas (1962): "A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." (Tom Kilburn) Primary memory acts as a cache for secondary memory. The user sees 32 x 6 x 512 words of storage: the primary store is the central memory of 32 pages at 512 words/page; the secondary store is the drum with 32x6 pages.

Hardware Organization of Atlas: 48-bit words, 512-word pages; the effective address goes through an initial decode against one Page Address Register (PAR) per page frame, PARs 0-31, each holding <effective PN, status>. The store comprised 32 main pages (1.4 µsec), 16 ROM pages (0.4-1 µsec), 2 subsidiary pages (1.4 µsec), 4 drums of 192 pages holding system code and system data (not swapped), and 8 tape decks (88 µsec/word). The effective page address is compared against all 32 PARs: a match means normal access; no match means a page fault, and the state of the partially executed instruction is saved.

Atlas Demand Paging Scheme. On a page fault: an input transfer into a free page is initiated, and the Page Address Register (PAR) is updated. If no free page is left, a page is selected to be replaced (based on usage); the replaced page is written to the drum, and to minimize the drum latency effect, the first empty page on the drum is selected. The page table is updated to point to the new location of the page on the drum.

Linear Page Table. A Page Table Entry (PTE) contains: a bit to indicate if the page exists; the PPN (physical page number) for a memory-resident page; the DPN (disk page number) for a page on the disk; and status bits for protection and usage. The OS sets the Page Table Base (a supervisor-accessible control register inside the CPU) whenever the active user process changes. The virtual address from the CPU's execute stage splits into VPN and offset; the VPN indexes the page table, and the PTE's PPN combined with the offset addresses the data word in the data pages.
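A linear page-table lookup following the PTE fields on the slide (a resident bit, a PPN for memory-resident pages, a DPN for pages on disk) can be sketched as below; the concrete field layout and page numbers are invented for illustration:

```python
OFFSET_BITS = 12  # assuming 4 KB pages

# PTEs indexed directly by VPN: (resident?, PPN-or-DPN)
page_table = [
    (True,  7),   # VPN 0: resident in physical page 7
    (False, 42),  # VPN 1: on disk, at disk page 42
]

class PageFault(Exception):
    pass

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    resident, pn = page_table[vpn]
    if not resident:
        # pn is a DPN here; the OS would load the page and retry
        raise PageFault(f"VPN {vpn} is on disk page {pn}")
    return (pn << OFFSET_BITS) | offset   # pn is the PPN

print(hex(translate(0x0123)))  # prints 0x7123
```

The same table entry thus serves double duty, naming either a physical frame or a disk page depending on the resident bit, which is how the hardware knows when to trap to the OS.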

Size of Linear Page Table. With 32-bit addresses, 4-KB pages & 4-byte PTEs: 2^20 PTEs, i.e., a 4 MB page table per user, and 4 GB of swap needed to back up the full virtual address space. Larger pages? Internal fragmentation (not all memory in a page is used) and a larger page fault penalty (more time to read from disk). What about a 64-bit virtual address space??? Even 1 MB pages would require 2^44 8-byte PTEs (35 TB!). What is the saving grace?

Hierarchical Page Table. A 32-bit virtual address from the CPU is divided into a 10-bit L1 index p1 (bits 31-22), a 10-bit L2 index p2 (bits 21-12), and a 12-bit offset (bits 11-0). A processor register holds the root of the current page table; p1 indexes the level-1 page table to find a level-2 page table, and p2 indexes the level-2 table to find the data page. An entry can point to a page in primary memory, to a page in secondary memory, or be the PTE of a nonexistent page.
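The 10/10/12 split and the two-level walk above can be sketched as follows; the nested dicts stand in for the in-memory level-1 and level-2 tables, and all mapped values are examples:

```python
def split(vaddr):
    p1 = (vaddr >> 22) & 0x3FF   # 10-bit L1 index, bits 31-22
    p2 = (vaddr >> 12) & 0x3FF   # 10-bit L2 index, bits 21-12
    offset = vaddr & 0xFFF       # 12-bit offset, bits 11-0
    return p1, p2, offset

def walk(root, vaddr):
    """Walk a two-level page table: level-1 entry -> level-2 table -> PPN."""
    p1, p2, offset = split(vaddr)
    l2_table = root[p1]          # level-1 PTE points at a level-2 table
    ppn = l2_table[p2]           # level-2 PTE holds the physical page number
    return (ppn << 12) | offset

root = {1: {2: 0x80}}            # one mapped page: p1=1, p2=2 -> PPN 0x80
print(hex(walk(root, (1 << 22) | (2 << 12) | 0x345)))  # prints 0x80345
```

The saving grace is visible in the data structure: level-2 tables exist only for the regions actually in use, so a sparse address space costs far less than the full linear table.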

Two-Level Page Tables in Memory: the level-1 and level-2 page tables of each user themselves occupy physical memory, alongside the user pages (e.g., user 1's VA1 and user 2's VA1) that they map.

Address Translation & Protection. The virtual address (virtual page number VPN, offset) is subject to a protection check (kernel/user mode, read/write) and translation, either of which may raise an exception, yielding the physical address (physical page number PPN, offset). Every instruction and data access needs address translation and protection checks. A good VM design needs to be fast (~ one cycle) and space efficient.

Translation Lookaside Buffers (TLB). Address translation is very expensive! With a two-level page table, each reference becomes several memory accesses. Solution: cache translations in a TLB. TLB hit: single-cycle translation. TLB miss: page-table walk to refill. Each TLB entry holds valid (V), read/write (R/W), and dirty (D) bits, a tag holding the VPN (virtual page number), and the PPN (physical page number); on a hit, the PPN is concatenated with the offset of the virtual address to form the physical address.

TLB Designs. Typically 32-128 entries, usually fully associative. Each entry maps a large page, hence there is less spatial locality across pages, making it more likely that two entries conflict; sometimes larger TLBs (256-512 entries) are 4-8 way set-associative, and larger systems sometimes have multi-level (L1 and L2) TLBs. Random or FIFO replacement policy. No process information in the TLB? TLB Reach: the size of the largest virtual address space that can be simultaneously mapped by the TLB. Example: 64 TLB entries, 4 KB pages, one page per entry. TLB Reach = ? 64 entries * 4 KB = 256 KB (if contiguous).
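The TLB reach arithmetic from the slide's example works out as follows; the 2 MB superpage case is an added illustration, not from the slide:

```python
def tlb_reach(entries, page_size):
    """Largest virtual address space mappable at once (if contiguous)."""
    return entries * page_size

# The slide's example: 64 entries, 4 KB pages, one page per entry.
print(tlb_reach(64, 4 * 1024) // 1024, "KB")               # prints 256 KB

# Larger pages per entry grow the reach proportionally, one reason
# architectures offer superpages (here, assumed 2 MB pages):
print(tlb_reach(64, 2 * 1024 * 1024) // (1024 * 1024), "MB")  # prints 128 MB
```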

Handling a TLB Miss. Software (MIPS, Alpha): the TLB miss causes an exception, and the operating system walks the page tables and reloads the TLB; a privileged untranslated addressing mode is used for the walk. Hardware (SPARC v8, x86, PowerPC, RISC-V): a memory management unit (MMU) walks the page tables and reloads the TLB; if a missing (data or PT) page is encountered during the TLB reload, the MMU gives up and signals a page fault exception for the original instruction.

Hierarchical Page Table Walk: SPARC v8. The 32-bit virtual address is divided into Index 1 (bits 31-24), Index 2 (bits 23-18), Index 3 (bits 17-12), and Offset (bits 11-0). The context register selects an entry in the context table, which holds the root pointer of the L1 table; Index 1 selects a page table pointer (PTP) into the L2 table, Index 2 a PTP into the L3 table, and Index 3 the PTE, whose PPN joins the offset to form the physical address (bits 31-12 and 11-0). The MMU does this table walk in hardware on a TLB miss.

Page-Based Virtual-Memory Machine (hardware page-table walk). In the pipeline, instruction fetches go through a virtual-address instruction TLB before the instruction cache, and loads/stores go through a data TLB before the data cache; either lookup can raise a page fault or protection violation. On a TLB miss, a hardware page-table walker, starting from the page-table base register, refills the TLB through the memory controller from main memory (DRAM). This assumes the page tables are held in untranslated physical memory.

Address Translation: putting it all together. The virtual address first goes to the TLB lookup (hardware). On a hit, the protection check (hardware) follows: if access is denied, a protection fault (SEGFAULT) results; if permitted, the physical address goes to the cache. On a miss, a page-table walk (hardware or software) follows: if the page is not in memory, a page fault results and the OS loads the page; if the page is in memory, the TLB is updated and the translation retried.
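The "putting it all together" flow can be sketched end to end; the TLB and page table here are toy dictionaries, and the return strings simply name the slide's outcomes:

```python
tlb = {}                                     # VPN -> (PPN, writable)
page_table = {0: (5, True), 1: (6, False)}   # resident pages: VPN -> (PPN, writable)

def translate(vpn, write):
    if vpn not in tlb:                       # TLB miss -> page-table walk
        if vpn not in page_table:
            return "page fault (OS loads page)"
        tlb[vpn] = page_table[vpn]           # update TLB, then retry
    ppn, writable = tlb[vpn]                 # TLB hit
    if write and not writable:               # protection check
        return "protection fault (SEGFAULT)"
    return f"physical page {ppn} (to cache)"

print(translate(0, write=False))  # miss, refill, then hit: physical page 5 (to cache)
print(translate(1, write=True))   # resident but read-only: protection fault (SEGFAULT)
print(translate(9, write=False))  # unmapped: page fault (OS loads page)
```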

Acknowledgements. These slides contain material developed and copyright by: Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), David Patterson (UCB). MIT material derived from course 6.823; UCB material derived from course CS252.