Virtual Memory

Reading: Sections 5.4, 5.5, 5.6, 5.8, 5.10
Lecture notes from MKP and S. Yalamanchili

The Memory Hierarchy

[Figure: the hierarchy from ALU registers through cache and main memory to disk. Registers are managed by the compiler, caches by the hardware, and main memory and disk by the operating system. Levels nearer the ALU are faster; levels farther away are cheaper.]

Virtual to Physical Mapping

- Exploit program locality at page granularity: virtual pages are mapped to physical pages
- The program can be larger than physical memory
- At any point in time, the program resides partly in memory and partly on disk

Virtual Memory

- Use main memory as a cache for secondary (disk) storage
  - Managed jointly by CPU hardware and the operating system (OS)
- Programs share main memory
  - Each gets a private virtual address space holding its frequently used code and data
  - Protected from other programs
- CPU and OS translate virtual addresses to physical addresses
  - A VM block is called a page
  - A VM translation miss is called a page fault

Address Translation

- Fixed-size pages (e.g., 4 KB)
[Figure: examples of translating virtual addresses to physical addresses.]

Address Translation: Concepts

[Figure: a virtual address (VPN, offset) is mapped through an address translation data structure to a physical address (PPN, offset); virtual pages not resident in physical memory are located on disk.]

- Offsets within the virtual page and the corresponding physical page are the same
- We therefore only need to translate the virtual page number (VPN) to the corresponding physical page number (PPN), also called the page frame number, which is effectively a base address

Page Tables

- Store placement information (see the sketch below)
  - An array of page table entries (PTEs), indexed by virtual page number
  - A page table register in the CPU points to the page table in physical memory
- If the page is present in memory
  - The PTE stores the physical page number
  - Plus other status bits (referenced, dirty, ...)
- If the page is not present
  - The PTE can refer to a location in swap space on disk
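
To make the mechanism concrete, here is a minimal sketch in C of a single-level translation: the virtual address is split into VPN and offset, the VPN indexes the page table, and the PPN is concatenated with the unchanged offset. The 4 KB page size, the PTE layout, and all names are illustrative assumptions, not the lecture's definitions.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE   4096u          /* assumed 4 KB pages */
    #define OFFSET_BITS 12u            /* log2(PAGE_SIZE) */

    /* Hypothetical PTE: status bits plus the physical page number. */
    typedef struct {
        bool     valid;                /* page resident in physical memory */
        bool     dirty;                /* page written since it was loaded */
        bool     ref;                  /* page referenced recently */
        uint64_t ppn;                  /* physical page (frame) number */
    } pte_t;

    /* Translate vaddr through a flat page table. Returns false on a
     * page fault (page not present); the OS would then fetch the page. */
    bool translate(const pte_t *page_table, uint64_t vaddr, uint64_t *paddr)
    {
        uint64_t vpn    = vaddr >> OFFSET_BITS;     /* virtual page number */
        uint64_t offset = vaddr & (PAGE_SIZE - 1);  /* unchanged by translation */

        pte_t pte = page_table[vpn];
        if (!pte.valid)
            return false;                           /* page fault */

        *paddr = (pte.ppn << OFFSET_BITS) | offset; /* PPN is a base address */
        return true;
    }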

Translation Using a Page Table

[Figure: the page table register points to the page table in memory; the VPN indexes the table to obtain the PPN, which is combined with the page offset to form the physical address.]

Page Fault Penalty

- On a page fault, the page must be fetched from disk
  - Takes millions of clock cycles
  - Handled by OS code
- Try to minimize the page fault rate
  - Fully associative placement
  - Smart replacement algorithms

Mapping Pages to Storage

[Figure: PTEs for resident pages point to physical page frames in memory; PTEs for non-resident pages point to locations in swap space on disk.]

Replacement and Writes

- To reduce the page fault rate, prefer least-recently used (LRU) replacement (see the sketch below)
  - A reference bit (aka use bit) in the PTE is set to 1 on each access to the page
  - It is periodically cleared to 0 by the OS
  - A page with reference bit = 0 has not been used recently
- Disk writes take millions of cycles
  - Write-through is impractical, so use write-back
  - A dirty bit in the PTE is set when the page is written
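
One way the OS can use these reference bits: periodically sweep the page table, treat resident pages whose bit is still 0 as eviction candidates, and clear all bits for the next interval. A minimal sketch in C with hypothetical names; real systems use more refined policies such as the clock algorithm.

    #include <stddef.h>

    /* Minimal PTE with just the bits this sketch needs. */
    typedef struct { unsigned valid : 1, dirty : 1, ref : 1; } pte_t;

    /* One sweep: pick a victim among resident pages whose reference bit
     * is still 0, then clear every reference bit for the next interval.
     * Returns the victim VPN, or (size_t)-1 if all pages were referenced. */
    size_t sweep_and_pick_victim(pte_t *pt, size_t npages)
    {
        size_t victim = (size_t)-1;
        for (size_t vpn = 0; vpn < npages; vpn++) {
            if (!pt[vpn].valid)
                continue;
            if (victim == (size_t)-1 && pt[vpn].ref == 0)
                victim = vpn;      /* not used recently: candidate */
            pt[vpn].ref = 0;       /* clear for the next interval */
        }
        return victim;             /* write back first if its dirty bit is set */
    }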

Caching PTEs: The Translation Lookaside Buffer (TLB)

[Figure: a four-entry TLB; each entry holds a VPN, a PPN, a valid bit, and state bits.]

- Keep a cache of the most recently used PTEs (see the lookup sketch below)
  - Each PTE corresponds to a relatively large part of memory; for example, a 16 KB page may hold 4K instructions
  - A small set of PTEs can therefore cover a large code segment; for example, 8 PTEs with 16 KB pages cover a program of 32K instructions
- TLB access time is comparable to or better than cache access time
- The TLB typically operates as a fully associative cache, but can be implemented as a set-associative cache

Fast Translation Using a TLB

[Figure: the VPN is first looked up in the TLB; on a hit the PPN comes directly from the TLB, on a miss the page table supplies it.]
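
A minimal sketch in C of the fully associative lookup: the VPN is compared against every entry, in parallel in hardware, sequentially here in software. The four-entry size matches the figure; the structure and names are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 4   /* four-entry TLB, as in the figure */

    typedef struct {
        bool     valid;
        uint64_t vpn;       /* tag: the full VPN, since the TLB is fully associative */
        uint64_t ppn;       /* cached translation */
    } tlb_entry_t;

    /* Fully associative lookup: compare the VPN against every entry.
     * Returns true and fills *ppn on a hit; false means walk the page table. */
    bool tlb_lookup(const tlb_entry_t tlb[TLB_ENTRIES], uint64_t vpn, uint64_t *ppn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {   /* hit */
                *ppn = tlb[i].ppn;
                return true;
            }
        }
        return false;                                  /* miss: fetch PTE, refill TLB */
    }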

TLB Operation

[Figure: the virtual address from the processor goes to the TLB; on a hit the resulting physical address goes to the cache and memory, on a miss the translation is performed and the TLB is updated.]

- TLB size is typically a function of the target domain
  - High-end machines will have large, fully associative TLBs
- PTEs are replaced in the TLB on a demand-driven basis
- The TLB is in the critical path

TLB Misses

- If the page is in memory
  - Load the PTE from memory and retry
  - Could be handled in hardware, though this can get complex for more complicated page table structures
  - Or handled in software: raise a special exception, with an optimized handler
- If the page is not in memory (page fault)
  - The OS handles fetching the page and updating the page table
  - Then the faulting instruction is restarted
[Figure: pipeline stages IF ID EX MEM WB.]

TLB Miss Handler

- A TLB miss indicates one of two things
  - The page is present, but its PTE is not in the TLB
  - The page is not present
- Must recognize the TLB miss before the destination register is overwritten
  - Raise an exception
- The handler copies the PTE from memory into the TLB
  - Then restarts the instruction
  - If the page is not present, a page fault will occur

Page Fault Handler

- Use the faulting virtual address to find the PTE
- Locate the page on disk
- Choose a page to replace
  - If it is dirty, write it to disk first (what about copies in the cache?)
- Read the page into memory and update the page table (see the sketch below)
- Interaction with the operating system: make the process runnable again
  - Restart from the faulting instruction
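
The steps above, rendered as a minimal, self-contained C sketch with tiny pages and a trivial victim choice. Every name here is a placeholder; a real handler also involves the file system, the disk driver, and the scheduler.

    #include <stdint.h>
    #include <string.h>

    #define NPAGES  8             /* virtual pages (illustrative) */
    #define NFRAMES 4             /* physical page frames (illustrative) */
    #define PAGESZ  256           /* tiny pages to keep the demo small */

    typedef struct { int valid, dirty, frame; } pte_t;

    static uint8_t memory[NFRAMES][PAGESZ];           /* physical memory */
    static uint8_t swapsp[NPAGES][PAGESZ];            /* swap slot per virtual page */
    static pte_t   pt[NPAGES];                        /* the page table */
    static int     owner[NFRAMES] = {-1, -1, -1, -1}; /* VPN in each frame */

    /* Schematic page-fault handler following the steps above. */
    static void handle_page_fault(int vpn)
    {
        int frame = -1;
        for (int f = 0; f < NFRAMES; f++)       /* prefer a free frame */
            if (owner[f] < 0) { frame = f; break; }

        if (frame < 0) {                        /* none free: choose a victim */
            frame = 0;                          /* trivial policy; real OSes use LRU bits */
            int victim = owner[frame];
            if (pt[victim].dirty)               /* if dirty, write to disk first */
                memcpy(swapsp[victim], memory[frame], PAGESZ);
            pt[victim].valid = 0;
        }

        memcpy(memory[frame], swapsp[vpn], PAGESZ);         /* read page into memory */
        pt[vpn] = (pte_t){ .valid = 1, .dirty = 0, .frame = frame };
        owner[frame] = vpn;                                 /* then restart the instruction */
    }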

TLB and Cache Interaction

- If the cache tag uses the physical address
  - The translation must happen before the cache lookup
- Alternative: use virtual address tags
  - Complications due to aliasing: different virtual addresses for a shared physical address
[Figure: example problems.]

2-Level TLB Organization

Intel Nehalem:
- Virtual address 48 bits; physical address 44 bits
- Page sizes: 4 KB, 2/4 MB
- L1 TLB (per core): I-TLB 128 entries for small pages, 7 per thread (2x) for large pages; D-TLB 64 entries for small pages, 32 for large pages; both 4-way, LRU replacement
- L2 TLB (per core): single unified TLB, 512 entries, 4-way, LRU replacement
- TLB misses handled in hardware

AMD Opteron X4:
- Virtual address 48 bits; physical address 48 bits
- Page sizes: 4 KB, 2/4 MB
- L1 TLB (per core): I-TLB 48 entries; D-TLB 48 entries; both fully associative, LRU replacement
- L2 TLB (per core): I-TLB 512 entries; D-TLB 512 entries; both 4-way, round-robin LRU
- TLB misses handled in hardware

Memory Protection

- Different tasks can share parts of their virtual address spaces
  - But need to protect against errant access
  - Requires OS assistance
- Hardware support for OS protection
  - Privileged supervisor mode (aka kernel mode)
  - Privileged instructions
  - Page tables and other state information accessible only in supervisor mode
  - System call exception (e.g., syscall in MIPS)

Sharing

[Figure: process A's and process B's page tables both map a shared page in main memory.]

- Physical pages are shared through the mappings
- This raises issues with the cache
  - Synonym problem: we will not address that here

Disk Storage

- Nonvolatile, rotating magnetic storage

Disk Drive Terminology

[Figure: actuator, arm, head, and platters.]

- Data is recorded on concentric tracks on both sides of a platter
  - Tracks are organized as fixed-size sectors (in bytes)
- Corresponding tracks on all platters form a cylinder
- Data is addressed by three coordinates: cylinder, platter, and sector

Disk Sectors and Access

- Each sector records
  - A sector ID
  - Data (512 bytes; 4096 bytes has been proposed)
  - An error correcting code (ECC), used to hide defects and recording errors
  - Synchronization fields and gaps
- Access to a sector involves
  - Queuing delay if other accesses are pending
  - Seek: moving the heads
  - Rotational latency
  - Data transfer
  - Controller overhead

Disk Performance

[Figure: actuator, arm, head, and platters.]

- The actuator moves (seeks) the correct read/write head over the correct sector
  - Under the control of the controller
- Disk latency = controller overhead + seek time + rotational delay + transfer delay
  - Seek time and rotational delay are limited by the mechanical parts

Disk Performance

- Seek time is determined by the current position of the head (which track it is covering) and the new position of the head
  - On the order of milliseconds
- Average rotational delay is the time for 0.5 revolutions
- Transfer rate is a function of bit density

Disk Access Example

- Given: 512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk
- Average read time (reproduced in the sketch below):
    4 ms seek time
  + 0.5 / (15,000/60 rev/s) = 2 ms rotational latency
  + 512 B / (100 MB/s) = 0.005 ms transfer time
  + 0.2 ms controller delay
  = 6.2 ms
- If the actual average seek time is 1 ms, the average read time is 3.2 ms
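
The same arithmetic as a small, runnable C program; the numbers are the slide's, and the code is just a calculator (it prints 6.205 ms, which the slide rounds to 6.2 ms).

    #include <stdio.h>

    int main(void)
    {
        double seek_ms       = 4.0;                       /* average seek time */
        double rpm           = 15000.0;
        double rot_ms        = 0.5 / (rpm / 60.0) * 1e3;  /* half a revolution: 2 ms */
        double transfer_ms   = 512.0 / 100e6 * 1e3;       /* 512 B at 100 MB/s: 0.005 ms */
        double controller_ms = 0.2;

        printf("average read time = %.3f ms\n",
               seek_ms + rot_ms + transfer_ms + controller_ms);
        return 0;
    }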

Disk Performance Issues

- Manufacturers quote an average seek time
  - Based on all possible seeks
  - Locality and OS scheduling lead to smaller actual average seek times
- Smart disk controllers allocate physical sectors on the disk
  - They present a logical sector interface to the host
  - Standards: SCSI, ATA, SATA
- Disk drives include caches
  - Prefetch sectors in anticipation of access
  - Avoid seek and rotational delay
  - Caches are also maintained in host DRAM

Arrays of Inexpensive Disks: Throughput

[Figure: a CPU read request is served by blocks 0-3, one block from each disk.]

- Data is striped across all disks
- The visible performance overhead of the drive mechanics is amortized across multiple accesses
- Scientific workloads are well suited to such organizations

Arrays of Inexpensive Disks: Request Rate

[Figure: multiple CPU read requests directed at different disks.]

- Consider multiple read requests for small blocks of data
- Several I/O requests can be serviced concurrently

Reliability of Disk Arrays

- The reliability of an array of N disks is lower than the reliability of a single disk
  - Any single disk failure will cause the array to fail
  - The array is N times more likely to fail
- Use redundant disks to recover from failures
  - Similar to the use of error correcting codes
- Overhead
  - Bandwidth and cost of the redundant information

RAID

- Redundant Array of Inexpensive (Independent) Disks
  - Use multiple smaller disks (cf. one large disk)
  - Parallelism improves performance
  - Plus extra disk(s) for redundant data storage
- Provides a fault-tolerant storage system
  - Especially if failed disks can be hot-swapped

RAID Level 0

[Figure: blocks 0-7 striped round-robin across four disks.]

- RAID 0 is striping with no redundancy (see the mapping sketch below)
- Provides the highest performance
- Provides the lowest reliability
- Frequently used in scientific and supercomputing applications where data throughput is important
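
The round-robin placement in the figure reduces to two lines of arithmetic. A minimal C sketch, with the four-disk array size as an assumption:

    /* Map a logical block to (disk, block-within-disk) under RAID 0
     * striping across NDISKS drives. */
    #define NDISKS 4

    void raid0_map(unsigned block, unsigned *disk, unsigned *offset)
    {
        *disk   = block % NDISKS;   /* blocks 0,1,2,3 land on disks 0,1,2,3 */
        *offset = block / NDISKS;   /* block 4 wraps to disk 0, stripe 1 */
    }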

RAID Level 1

[Figure: the disk array and its mirror.]

- The disk array is mirrored or shadowed in its entirety
- Reads can be optimized
  - Pick the copy with the smaller queuing and seek times
- Writes sacrifice performance, since both arrays must be written

RAID 3: Bit-Interleaved Parity

[Figure: data bits 1, 0, 0, 1 on four disks and their parity bit 0 on the parity disk.]

- Bit-level parity with N + 1 disks
  - Data is striped across N disks at the byte level
  - The redundant disk stores parity
  - Read access: read all disks
  - Write access: generate the new parity and update all disks
  - On failure: use the parity to reconstruct the missing data
- Not widely used

RAID Level 4: N+1 Disks

[Figure: blocks 0-3 and their parity block, blocks 4-7 and theirs; all parity blocks reside on a dedicated parity disk.]

- Block-level parity
- Data is interleaved in blocks, referred to as the striping unit and striping width
- Small reads can access a subset of the disks
- A write to a single disk requires 4 accesses
  - Read the old block, write the new block, and read and write the parity disk
- The parity disk can become a bottleneck

The Small Write Problem

[Figure: writing B1-New into a stripe B0 B1 B2 B3 with parity P: (1) read the old B1, (2) read the old parity, XOR both with the new data, then (3) write B1-New and (4) write the new parity.]

- Two disk read operations followed by two disk write operations (see the sketch below)
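
The arithmetic behind those four accesses: the new parity can be computed from the old data and the old parity alone, new_parity = old_parity XOR old_data XOR new_data, so the other data disks need not be read. A minimal C sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* Update the parity block in place for a small write: flip exactly
     * the parity bits that the changed data bits flip. */
    void update_parity(const uint8_t *old_data, const uint8_t *new_data,
                       uint8_t *parity, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }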

RAID 5: Distributed Parity

- N + 1 disks
  - Like RAID 4, but the parity blocks are distributed across the disks
  - Avoids the parity disk becoming a bottleneck
- Widely used

RAID Summary

- RAID can improve performance and availability
  - High availability requires hot swapping
- Assumes independent disk failures
  - Too bad if the building burns down!
- See "Hard Disk Performance, Quality and Reliability"
  - http://www.pcguide.com/ref/hdd/perf/index.htm

Flash Storage

- Nonvolatile semiconductor storage
  - 100x to 1000x faster than disk
  - Smaller, lower power, more robust
  - But more $/GB (between disk and DRAM)

Flash Types

- NOR flash: bit cell like a NOR gate
  - Random read/write access
  - Used for instruction memory in embedded systems
- NAND flash: bit cell like a NAND gate
  - Denser (bits/area), but block-at-a-time access
  - Cheaper per GB
  - Used for USB keys, media storage, ...
- Flash bits wear out after 1000s of accesses
  - Not suitable for direct RAM or disk replacement
  - Wear leveling: remap data to less-used blocks (see the sketch below)
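
A minimal sketch of wear leveling in C: a flash translation layer redirects each logical write to the least-worn free physical block instead of rewriting in place. The structure and names are illustrative; real FTLs also handle garbage collection and out-of-place updates.

    #define NBLOCKS 64

    static unsigned erases[NBLOCKS];       /* erases per physical block */
    static int      in_use[NBLOCKS];       /* physical block currently mapped? */
    static int      map[NBLOCKS];          /* logical -> physical block */

    void ftl_init(void)
    {
        for (int i = 0; i < NBLOCKS; i++)
            map[i] = -1;                   /* -1 = logical block not yet mapped */
    }

    /* On a write, remap the logical block to the least-worn free block. */
    int remap_on_write(int logical)
    {
        int best = -1;
        for (int p = 0; p < NBLOCKS; p++)
            if (!in_use[p] && (best < 0 || erases[p] < erases[best]))
                best = p;
        if (best < 0)
            return -1;                     /* no free block: a real FTL reclaims one */

        int old = map[logical];
        if (old >= 0) {
            in_use[old] = 0;               /* retire the old copy */
            erases[old]++;                 /* erasing wears it further */
        }
        in_use[best] = 1;
        map[logical] = best;
        return best;                       /* caller writes the data here */
    }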

Solid State Disks

- Replace mechanical drives with solid state drives
  - Superior access performance
- Adds another level to the memory hierarchy
  - Disk is the new tape!
[Figure: a PCIe card combining DRAM and SSD with wear-leveling management (Fusion-IO); see Wikipedia.]

The Memory Hierarchy: The BIG Picture

- Common principles apply at all levels of the memory hierarchy
  - Based on notions of caching
- At each level in the hierarchy
  - Block placement
  - Finding a block
  - Replacement on a miss
  - Write policy

Concluding Remarks

- Fast memories are small; large memories are slow
  - We really want memories that are both fast and large
  - Caching gives this illusion
- Principle of locality
  - Programs use a small part of their memory space frequently
- Memory hierarchy
  - L1 cache -> L2 cache -> DRAM memory -> disk
- Memory system design is critical for multiprocessors

Study Guide

- Be able to trace through the page table and cache data structures on a memory reference (see the sample problems)
- Understand how to allocate virtual pages to page frames to minimize conflicts in the cache
- Know the relationships between address translation, page size, and cache size
  - For example, given a memory system design (page sizes, virtual and physical address spaces, cache parameters), understand the address breakdowns at different levels of the memory hierarchy
- Be able to map lines in a page to sets in the cache (identify the set from the address)

Study Guide

- Given a cache design, a virtual address space, and a page size, define the pages (by their addresses) that may conflict in the cache
- Distinguish between a TLB miss, a data cache miss, and a page fault

Glossary

- Page table
- Page table entry (PTE)
- Page fault
- Physical address
- Physical page
- Physically tagged cache
- Synonym problem
- Translation lookaside buffer (TLB)
- Virtual address
- Virtual page
- Virtually tagged cache