HY225 Lecture 12: DRAM and Virtual Memory



Dimitrios S. Nikolopoulos
University of Crete and FORTH-ICS
May 16, 2011

DRAM Fundamentals
- Random-access memory using one transistor-capacitor pair per bit
- Capacitors leak, so cells need periodic refresh
- Composed of one or more memory arrays
- Organized in rows and columns
- Needs sense amplifiers to compensate for the small voltage swing

[Figure: DRAM cell and memory array. Labels: row decoder (rows), column decoder (columns), sense amplifiers, data in/out buffers, memory array, bit line]

DRAM Fundamentals
- Each DRAM memory array outputs one bit
- DRAMs use multiple memory arrays to output multiple bits at a time
- xN indicates a DRAM with N memory arrays
- x16 and x32 DRAMs are typical today
- Each collection of N arrays forms a DRAM bank
- Banks can be read/written independently

Interleaved DRAM
[Figure: x4 DRAM with four memory arrays, each with its own row decoder, column decoder, sense amplifiers, data in/out buffers, and bit lines]

DRAM memory bandwidth
- Limited bandwidth from one DRAM bank
- Increase bandwidth by delivering data from multiple banks
- Processor-DRAM interconnect (e.g. bus) has a higher clock frequency than any one DRAM
- Bus control switches between multiple DRAM banks to achieve a high data rate

DIMMs and Ranks
[Figure: one DRAM with eight internal banks sharing an I/O link through a MUX; one bank is a x4 array of memory arrays; one DIMM with one DRAM rank]

Modern DRAM organization
- Hierarchy of DRAM memories
  - A system has multiple DIMMs
  - Each DIMM has multiple DRAM devices in one or more ranks
  - Each DRAM device has multiple banks
  - Each bank has multiple memory arrays
- Concurrency in ranks and banks increases memory bandwidth


DRAM Performance
- DRAM is typically connected to the processor via a fixed-width clocked bus
- Bus clocks tend to be slower than CPU clocks
- Cache block read with no optimization. Assume:
  - 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access to fetch a word
  - 1 bus arbitration cycle per word of data transferred
  - 4-word block, 1-word-wide DRAM
- Miss penalty = 1 + 4 × 15 + 4 × 1 = 65 bus cycles
- Bandwidth = 16 bytes / 65 bus cycles ≈ 0.25 bytes / bus cycle

Improving DRAM bandwidth
- Wide (4-word in example) DRAM
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4 × 1 = 20 bus cycles
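
The three miss penalties above follow from the same cycle counts. The sketch below recomputes them under the slide's assumptions (1 address cycle, 15 access cycles, 1 transfer cycle per word, 4-byte words). The function names and the organization labels "narrow", "wide", and "interleaved" are illustrative choices, not terms from the lecture.

```python
def miss_penalty(words, organization, addr=1, access=15, transfer=1):
    """Bus cycles needed to fetch a cache block of `words` words."""
    if organization == "narrow":
        # One access and one transfer per word, strictly one after another.
        return addr + words * access + words * transfer
    if organization == "wide":
        # The whole block is accessed and transferred in one step.
        return addr + access + transfer
    if organization == "interleaved":
        # Banks are accessed in parallel; words are transferred one by one.
        return addr + access + words * transfer
    raise ValueError(f"unknown organization: {organization}")


def bandwidth(words, penalty, bytes_per_word=4):
    """Bytes delivered per bus cycle for one block fetch."""
    return words * bytes_per_word / penalty


for org in ("narrow", "wide", "interleaved"):
    cycles = miss_penalty(4, org)
    print(f"{org}: {cycles} cycles, {bandwidth(4, cycles):.2f} bytes/cycle")
```

Interleaving recovers most of the benefit of a wide memory while keeping the bus one word wide.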


Virtual Memory
- Use main memory as a cache for secondary (disk, flash) storage
- Managed jointly by processor and operating system
- Programs share main memory
- With VM, each program gets a private virtual address space, with the entire range of addresses available on the processor
- Processor and OS try to keep the most frequently used parts of code and data in DRAM
- Primary motivation for VM is protection
  - VM forbids programs from accessing the physical code and data addresses of other programs unless sharing is requested by programs and approved by the OS
- Processor and OS translate virtual addresses to physical addresses
- A VM block is called a page
- A VM translation miss is called a page fault

Address Translation
[Figure: translation of virtual addresses to physical addresses with fixed-size pages (e.g. 4K)]
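
With fixed-size pages, translation only replaces the page number; the offset within the page is unchanged. A minimal sketch, assuming 4K pages (12 offset bits) as in the slide; the helper names are hypothetical.

```python
PAGE_SIZE = 4096          # 4K pages, as in the slide
OFFSET_BITS = 12          # log2(PAGE_SIZE)


def split_address(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def make_address(ppn, offset):
    """Combine a physical page number with the unchanged page offset."""
    return (ppn << OFFSET_BITS) | offset


vpn, offset = split_address(0x12345)
print(hex(vpn), hex(offset))          # page number and offset of 0x12345
print(hex(make_address(0x7, offset))) # same offset inside physical page 0x7
```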


Page Tables
- Table storing page placement information in physical memory
  - Array of page table entries, indexed by virtual page number
  - Page table is itself stored in physical memory
  - Processor register points to beginning of page table in physical memory
- If page is present in memory
  - Page table entry (PTE) stores physical page number
  - Plus other status bits (valid, referenced, dirty)
- If page is not present
  - PTE points to location on disk (or swap space, or other non-volatile storage medium) where page is stored

Translation using a Page Table
[Figure: translation using a page table]
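
The lookup described above can be sketched as an array of entries indexed by virtual page number. This is an illustration only: the `PTE` fields, the `PageFault` exception, and the 4K page size are assumptions made for the example, not a specific processor's format.

```python
from dataclasses import dataclass
from typing import Optional

OFFSET_BITS = 12  # assumed 4K pages


@dataclass
class PTE:
    valid: bool = False                  # page present in memory
    ppn: int = 0                         # physical page number (if valid)
    referenced: bool = False
    dirty: bool = False
    disk_location: Optional[int] = None  # where the page lives if not present


class PageFault(Exception):
    """Raised when the referenced page is not present in memory."""


def translate(page_table, vaddr, write=False):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    pte = page_table[vpn]                # index by virtual page number
    if not pte.valid:
        raise PageFault(f"page {vpn} is on disk at {pte.disk_location}")
    pte.referenced = True                # status bits updated on access
    if write:
        pte.dirty = True
    return (pte.ppn << OFFSET_BITS) | offset


page_table = [PTE(valid=True, ppn=5), PTE(valid=False, disk_location=42)]
print(hex(translate(page_table, 0x0ABC)))
```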


Mapping Pages to Storage
[Figure: mapping pages to storage]

Replacement and Writes
- To reduce page fault rate, VM systems prefer approximations of the least-recently used (LRU) replacement policy
  - Reference bit (aka use bit) per PTE set to 1 on access to the page
  - Periodically cleared to 0 by the OS
  - A page with reference bit = 0 has not been used recently
- Disk writes may take millions of clock cycles
  - VM needs to write entire block at once, not individual words or bytes
  - Write-through is impractical
  - Use write-back
  - Dirty bit in PTE set when page is written
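
One way to picture the reference and dirty bits at work is the small sketch below. It is a simplified illustration, not the policy of any particular operating system: the `Frame` class and the rule "evict the first frame whose reference bit is 0" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    vpn: int
    referenced: bool = False
    dirty: bool = False


def touch(frame, write=False):
    """Model the hardware setting the bits on an access."""
    frame.referenced = True
    if write:
        frame.dirty = True


def clear_reference_bits(frames):
    """Model the OS periodically clearing every reference bit."""
    for frame in frames:
        frame.referenced = False


def choose_victim(frames):
    """Pick a frame not used since the last clearing, if there is one."""
    for index, frame in enumerate(frames):
        if not frame.referenced:
            return index
    return 0  # every page was used recently; fall back to the first frame


frames = [Frame(0), Frame(1), Frame(2)]
clear_reference_bits(frames)
touch(frames[0])
touch(frames[2], write=True)
victim = choose_victim(frames)
print(victim, "needs write-back:", frames[victim].dirty)
```

Only a victim whose dirty bit is set has to be written back before its frame is reused, which is what makes write-back practical.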

Problems of Single-Level Page Tables
- Page table maps all pages in the address space regardless of whether they are used or not
- In practice many programs use only a small fraction of the address space
- Problem can be solved by keeping multiple page tables, each mapping a region (e.g. 4 MB) of virtual memory
  - Most of the time programs occupy the top part of the address space for code and data and the bottom part for stack
  - Smaller page tables linked together with a parent master table
  - Organization called multi-level page table

Two-Level Page Table
[Figure: two-level page table. The virtual page number is split into a page directory index and a 10-bit second-level page table index (1024 entries x 4K = 4M coverage), followed by a 12-bit page offset. Page directory entries hold second-level base addresses; second-level entries hold the physical page number.]
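
A two-level walk can be sketched as two dictionary lookups. The sketch assumes a 32-bit virtual address split 10/10/12 (directory index, table index, offset), which matches the 1024-entry tables and 4K pages in the figure; the 10-bit directory index and the use of Python dictionaries for sparse tables are assumptions.

```python
OFFSET_BITS = 12   # 4K pages
INDEX_BITS = 10    # 1024 entries per table


def split(vaddr):
    """Return (page directory index, second-level index, page offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    table_index = (vaddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    dir_index = (vaddr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return dir_index, table_index, offset


def translate(page_directory, vaddr):
    dir_index, table_index, offset = split(vaddr)
    second_level = page_directory.get(dir_index)   # first memory reference
    if second_level is None:
        raise KeyError("no second-level table for this 4 MB region")
    ppn = second_level.get(table_index)             # second memory reference
    if ppn is None:
        raise KeyError("page not mapped")
    return (ppn << OFFSET_BITS) | offset


# Only the 4 MB regions in use need a second-level table.
page_directory = {1: {2: 0x99}}
vaddr = (1 << 22) | (2 << 12) | 0x34
print(hex(translate(page_directory, vaddr)))
```

Unused regions cost a single missing directory entry instead of 1024 empty page table entries.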

Three-Level Page Table
[Figure: three-level page table with 64 KB pages. The virtual page number indexes the page directory, whose entries hold second-level base addresses; second-level page table entries hold third-level base addresses; the third level yields the physical page number, which is combined with the page offset.]

Inverted Page Table
- Physical memory size typically much smaller than virtual memory size
- Use one entry per physical page frame instead of per virtual page
- Problem: given a virtual page number, how does the system find the mapping?
- Solutions: linear search, hash search

Linear Inverted Page Table
[Figure: linear inverted page table. Labels: offset]

Hashed Inverted Page Table
[Figure: hashed inverted page table. Labels: offset, hash function, hash table, PPN]
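
The two search strategies for an inverted page table can be compared in a few lines. This is a sketch under assumptions: entries are tagged with a (process id, virtual page number) pair, the hash table uses chained buckets, and the class and method names are hypothetical.

```python
class InvertedPageTable:
    def __init__(self, num_frames):
        # One entry per physical frame: frames[ppn] = (pid, vpn) or None.
        self.frames = [None] * num_frames
        # Hash buckets hold candidate frame numbers for a (pid, vpn) pair.
        self.buckets = [[] for _ in range(num_frames)]

    def _bucket(self, pid, vpn):
        return self.buckets[hash((pid, vpn)) % len(self.buckets)]

    def map(self, pid, vpn, ppn):
        self.frames[ppn] = (pid, vpn)
        self._bucket(pid, vpn).append(ppn)

    def lookup_linear(self, pid, vpn):
        """Scan every frame: cost grows with physical memory size."""
        for ppn, owner in enumerate(self.frames):
            if owner == (pid, vpn):
                return ppn
        return None

    def lookup_hashed(self, pid, vpn):
        """Check only the frames in one hash bucket."""
        for ppn in self._bucket(pid, vpn):
            if self.frames[ppn] == (pid, vpn):  # also filters stale entries
                return ppn
        return None


ipt = InvertedPageTable(8)
ipt.map(pid=1, vpn=0x10, ppn=3)
ipt.map(pid=2, vpn=0x10, ppn=5)
print(ipt.lookup_linear(1, 0x10), ipt.lookup_hashed(2, 0x10))
```

In both cases the index of the matching entry is the physical page number, so the table size depends on physical memory rather than on the virtual address space.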

Fast Translation using a TLB
- Address translation with a page table in memory requires additional memory references
  - One to access the PTE
  - One for the actual memory access
- Access to page table often has good locality
  - Use a fast cache of PTEs within the processor
  - Fast cache called a Translation Look-aside Buffer (TLB)
  - Typical size: 16-512 entries, 0.5-1 cycle hit, 10-100 cycles for miss, 0.01%-1% miss rate
- Misses can be handled by hardware or software

Fast Translation using a TLB
[Figure: address translation using a TLB]
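
A TLB behaves like a small cache keyed by virtual page number. The sketch below models one with LRU replacement; the replacement policy, the class name, and the dictionary standing in for the in-memory page table are assumptions for illustration.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                  # extra memory reference for the PTE
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used entry
        self.entries[vpn] = ppn
        return ppn


page_table = {0: 10, 1: 11, 2: 12}
tlb = TLB(capacity=2)
for vpn in (0, 1, 0, 2, 1):
    tlb.translate(vpn, page_table)
print(tlb.hits, tlb.misses)
```

With good locality most translations hit in the TLB and skip the page table reference entirely.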

TLB Misses
- If page is in memory
  - Load PTE from memory and retry translation
  - Operation can be handled in hardware
    - Hardware complexity increases with multi-level page tables, inverted page tables
  - Operation can be handled in software
    - Translation miss raises exception, operating system handler restores translation
- If page is not in memory, page fault
  - Operating system handles fetching of the page from secondary storage (e.g. disk) and updating the page table
  - Execution of program restarts at faulting instruction

TLB Miss Handler
- TLB miss indicates either
  - Page present, but PTE not in TLB
  - Page not present
- Must recognize TLB miss before destination register is overwritten
  - Raise exception
- Handler copies PTE from memory to TLB
  - Then restarts instruction
  - If page not present, page fault will occur


Page Fault Handler
- Use faulting virtual address to find PTE
- Locate page on secondary storage (e.g. disk)
- Choose page to replace
  - If dirty, write to disk first
- Read page into memory and update page table
- Make process runnable again
  - Restart from faulting instruction

TLB and Cache Interaction
- If cache tag uses physical address
  - Need to translate before cache lookup
- Alternative: use virtual address
  - Complications due to aliasing
  - Different virtual addresses may be used for the same shared physical address
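
The page fault handler steps listed earlier on this page can be sketched as a single function. Everything here is a simplified model: memory and disk are plain Python containers, the victim frame is chosen by the caller, and the `PTE` fields are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTE:
    valid: bool = False
    dirty: bool = False
    ppn: Optional[int] = None


def handle_page_fault(vpn, page_table, memory, disk, victim_ppn):
    """Bring page `vpn` into frame `victim_ppn`, evicting its current page."""
    pte = page_table[vpn]                       # find PTE of the faulting page
    for old_vpn, old in page_table.items():     # find the page being replaced
        if old.valid and old.ppn == victim_ppn:
            if old.dirty:
                disk[old_vpn] = memory[victim_ppn]   # write back first
            old.valid, old.dirty, old.ppn = False, False, None
            break
    memory[victim_ppn] = disk[vpn]              # read page into memory
    pte.valid, pte.dirty, pte.ppn = True, False, victim_ppn
    return victim_ppn                           # caller restarts the instruction


page_table = {0: PTE(valid=True, dirty=True, ppn=0), 1: PTE()}
memory = ["page 0, modified"]
disk = {0: "page 0, stale", 1: "page 1"}
handle_page_fault(1, page_table, memory, disk, victim_ppn=0)
print(memory[0], "|", disk[0])
```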

Kernel and User Mode
- Hardware needs to have a way to locate the page table of each program
  - Hardware uses a special register, not visible to programs, for this purpose
  - The register for page table indexing must only be accessible to the OS, not user programs
- How are certain hardware structures protected from user access?
  - System operates in two modes: user mode, where user programs run, and kernel mode, where the operating system runs
  - Protected hardware structures modified only in kernel mode
- Mode switching
  - Processor switches from user to kernel mode at: system calls, exceptions, interrupts
  - Operating system calls exception handler for each case

Protecting Access to Memory
- Operating system can control access rights to memory for each page
  - Each page can be designated as read-only, writeable, or executable (rwx)
  - Code pages are typically read-only and executable
  - Data pages are read-only or read-write
- Each user program may request to change access permissions to its own pages
  - Any request to change permissions must be made to the OS; the OS grants or rejects the request
- Control bits stored in PTE
  - Permission bits (rwx)
  - Valid bit (page present in memory)
  - Dirty bit (page written since it was brought into memory, used to write back on replacement)
  - Reference bit(s) (used to implement replacement policy, as an approximation to LRU)
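
The control bits in a PTE can be modelled as a set of flags checked on every access. The bit positions and the names below are assumptions for illustration; real architectures define their own PTE layouts.

```python
from enum import IntFlag


class PTEBits(IntFlag):
    READ = 1
    WRITE = 2
    EXEC = 4
    VALID = 8
    DIRTY = 16
    REFERENCED = 32


def check_access(pte_bits, access):
    """Return the outcome of an access that needs the `access` permissions."""
    if not pte_bits & PTEBits.VALID:
        return "page fault"            # page not present in memory
    if (pte_bits & access) != access:
        return "protection fault"      # page present, permission missing
    return "ok"


code_page = PTEBits.VALID | PTEBits.READ | PTEBits.EXEC
print(check_access(code_page, PTEBits.EXEC))
print(check_access(code_page, PTEBits.WRITE))
```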

TLB and Multiprogramming
- On a multiprogramming system the operating system performs process switching
  - Each user program executes in a process with a private address space
  - Operating system may provide the illusion that programs execute concurrently on the same processor by time-multiplexing the processor between processes
- What happens to the contents of the TLB when the operating system switches processes?
  - First solution is TLB flushing: the operating system clears the contents of the TLB so that the new process starts with an empty TLB
  - Second solution is TLB sharing between processes: TLB hardware has a process identifier attached to each TLB entry. The process identifier enables differentiation between page translations of different processes.

Sharing and Protection
- In several cases it is beneficial for processes to share physical memory
- Example: two instances of a web browser belonging to different users (different address spaces)
  - Both instances execute the same code, therefore replicating the code in physical memory is not economical
  - Operating system implements sharing by mapping (potentially different) virtual pages of different processes to the same physical page
- Second example: two processes of the same user need to share data (e.g. a software pipeline)
  - User may request memory from the operating system and designate it as shared with other processes
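
The two TLB solutions for process switching can be contrasted in a short sketch. The class names are hypothetical, and dictionaries stand in for the TLB hardware.

```python
class FlushingTLB:
    """Entries carry no process identifier, so a switch must clear them."""

    def __init__(self):
        self.entries = {}              # vpn -> ppn

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def context_switch(self):
        self.entries.clear()           # new process starts with an empty TLB


class TaggedTLB:
    """Each entry is tagged with a process identifier, so entries can coexist."""

    def __init__(self):
        self.entries = {}              # (pid, vpn) -> ppn

    def insert(self, pid, vpn, ppn):
        self.entries[(pid, vpn)] = ppn

    def lookup(self, pid, vpn):
        return self.entries.get((pid, vpn))


tagged = TaggedTLB()
tagged.insert(pid=1, vpn=0x10, ppn=3)
tagged.insert(pid=2, vpn=0x10, ppn=7)   # same virtual page, different process
print(tagged.lookup(1, 0x10), tagged.lookup(2, 0x10))
```

Tagging avoids the burst of TLB misses that follows every flush, at the cost of extra bits per entry.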

Virtual Machines
- Host computer emulates multiple guest operating systems and machine resources
  - Host can run multiple guest operating systems simultaneously and achieve isolation between guests
  - Avoids security and reliability problems
  - Aids sharing of hardware resources between users, each with a custom software environment
- Virtualization has a performance impact
  - Privileged operations of guest operating systems are emulated by a virtual machine monitor rather than executed natively
- Examples
  - VMware
  - Xen

Virtual Machine Monitor
- VMM maps virtual hardware resources accessed by guest virtual machines to physical hardware resources
  - Memory, I/O devices, CPUs
- Guest code running at user level runs natively on the processor
  - Privileged guest instructions trap to VMM to access protected resources
- Guest OS may be different from host OS
- VMM handles I/O devices
  - Emulates I/O devices as generic virtual files accessible from guests

Example: Timer Virtualization
- In native machine, on timer interrupt
  - OS suspends current process, handles interrupt, selects and resumes next process (context switching)
- With VMM
  - VMM suspends current VM, handles interrupt, selects and resumes next VM
- If a VM requires timer interrupts
  - VMM emulates a virtual timer
  - Emulates interrupt for VM when physical timer interrupt occurs

Instruction Set Support
- User and System Modes
- Privileged instructions only available in system mode
  - Trap to system if executed in user mode
- All physical resources only accessible through privileged instructions
  - Includes access to page table registers, interrupt control, I/O registers
- Virtualization support begins to appear in common RISC and CISC ISAs
