Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Transcription:

COMPUTER ORGANIZATION AND DESIGN: The Hardware/Software Interface, 5th Edition
Chapter 5: Large and Fast: Exploiting Memory Hierarchy
Virtual Memory
Morgan Kaufmann Publishers, 26 February 2018

Review: The Memory Hierarchy

Take advantage of the principle of locality to present the user with as much memory as possible at the fastest speed and cheapest price.

Levels, in order of increasing distance from the processor in access time, with the unit of transfer between adjacent levels:
- Processor to L1$: 4-8 bytes (word)
- L1$ to L2$: 8-32 bytes (block)
- L2$ to Main Memory: 1 to 4 blocks
- Main Memory to Secondary Memory: 1,024+ bytes (disk sector = page)

Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM.

[Figure: (relative) size of the memory at each level.]

How is the Hierarchy Managed?
- Registers ↔ cache: by compiler or programmer.
- Cache ↔ main memory: by the cache controller hardware.
- Main memory ↔ disks: by the operating system (virtual memory), with virtual-to-physical address mapping assisted by the hardware.

Virtual Memory
Use main memory as a "cache" for secondary memory:
- Allows efficient and safe sharing of memory among multiple programs.
- Provides the ability to run programs larger than the size of physical memory.
- Simplifies loading a program for execution by providing for code relocation (i.e., the code can be loaded anywhere in main memory).
Each program is compiled into its own address space, a "virtual" address space:
- During run-time, each virtual address must be translated to a physical address (an address in main memory).
- A virtual memory block is called a page.
- A virtual memory translation miss is called a page fault.

Two Programs Sharing Physical Memory
A program's address space is divided into pages (fixed size) or segments (variable sizes):
- The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table.

Address Translation
- A virtual address is translated to a physical address by a combination of hardware and software.
- Each memory request first requires an address translation from the virtual space to the physical space.

Virtual Address (VA):   bits 31 ... 12 = virtual page number  | bits 11 ... 0 = page offset
                        Translation (page number only; the page offset is unchanged)
Physical Address (PA):  bits 29 ... 12 = physical page number | bits 11 ... 0 = page offset
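The bit split used in address translation can be sketched in a few lines of Python. This is only an illustration, assuming a 12-bit page offset (4 KB pages) and a page table modelled as a plain dictionary; the names `split_virtual_address`, `translate`, and `toy_page_table`, and the mapping values, are hypothetical and not from the slides.

```python
PAGE_OFFSET_BITS = 12               # assumed: bits 11..0 are the page offset (4 KB pages)
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


def split_virtual_address(va):
    """Return (virtual page number, page offset) for a virtual address."""
    return va >> PAGE_OFFSET_BITS, va & (PAGE_SIZE - 1)


def translate(va, page_table):
    """Map a virtual address to a physical address.

    page_table is a dict from virtual page number to physical page number;
    a missing key (KeyError) stands in for a page fault in this sketch.
    """
    vpn, offset = split_virtual_address(va)
    ppn = page_table[vpn]
    # The offset passes through unchanged; only the page number is replaced.
    return (ppn << PAGE_OFFSET_BITS) | offset


toy_page_table = {0x00004: 0x00A2B}     # hypothetical mapping
pa = translate(0x00004ABC, toy_page_table)
print(hex(pa))
```

Only the page number goes through the table; the low 12 bits are copied straight into the physical address.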

Page Fault Penalty
- On a page fault, the entire page must be fetched from disk:
  - Takes millions of clock cycles.
  - Handled by the operating system.
- Try to minimize the page fault rate:
  - Fully associative placement of pages in main memory.
  - Smarter replacement algorithms.

Page Tables
- Store placement information:
  - A page table is an array of page table entries (PTEs), indexed by virtual page number.
  - The page table register points to the page table in physical memory.
- If the page is present in memory:
  - The PTE stores the physical page number.
  - Plus other status bits (referenced, dirty, ...).
- If the page is not present:
  - Page fault: the OS gets involved.
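A page table lookup along these lines might look like the sketch below. It assumes a page table held as a Python list indexed by virtual page number; the class and field names (`PageTableEntry`, `valid`, `ppn`, `referenced`, `dirty`), the `PageFault` exception, and the `lookup` helper are illustrative choices, not definitions from the slides.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    valid: bool = False        # is the page present in main memory?
    ppn: int = 0               # physical page number (meaningful only if valid)
    referenced: bool = False   # status bit: set on any access
    dirty: bool = False        # status bit: set on a write


class PageFault(Exception):
    """Raised when the page is not present; the OS would take over here."""


def lookup(page_table, vpn, is_write=False):
    """Index the page table by virtual page number and return the PPN."""
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)
    pte.referenced = True
    if is_write:
        pte.dirty = True
    return pte.ppn


table = [PageTableEntry() for _ in range(4)]   # hypothetical 4-page address space
table[2] = PageTableEntry(valid=True, ppn=7)
print(lookup(table, 2, is_write=True))
```

A lookup of any other page in `table` raises `PageFault`, which is the point where a real system would trap to the operating system.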

Translation Using a Page Table
[Figure: translation using a page table.]

Replacement and Writes
- To reduce the page fault rate, prefer least-recently used (LRU) replacement:
  - The reference bit in the page table entry is set to 1 on access to the page.
  - Periodically cleared to 0 by the OS.
  - A page with reference bit = 0 has not been used recently.
- Disk writes take millions of cycles:
  - Write-through is impractical, so write-back is used.
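The reference-bit scheme can be illustrated with a small sketch. It assumes the reference bits are kept in a dictionary keyed by page name; `clear_reference_bits` and `choose_victim` are hypothetical helpers, and the fallback used when every bit is 1 is an arbitrary choice made for this example rather than anything the slides specify.

```python
def clear_reference_bits(ref_bits):
    """What the OS does periodically: reset every reference bit to 0."""
    for page in ref_bits:
        ref_bits[page] = 0


def choose_victim(ref_bits):
    """Pick a page to replace, preferring one with reference bit 0.

    If every page has been referenced since the last clearing, fall back
    to the first page (an arbitrary tie-break assumed for this sketch).
    """
    for page, bit in ref_bits.items():
        if bit == 0:
            return page
    return next(iter(ref_bits))


ref_bits = {"A": 1, "B": 1, "C": 1}
clear_reference_bits(ref_bits)   # periodic clearing by the OS
ref_bits["A"] = 1                # page A accessed afterwards
ref_bits["C"] = 1                # page C accessed afterwards
victim = choose_victim(ref_bits)
print(victim)
```

Page B is the only one not touched since the last clearing, so it is the one selected. This only approximates LRU: it separates recently used pages from the rest but does not order them.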

Address Translation Summary
[Figure labels: virtual page #, offset, page table register, valid bit (V), physical page base address, page table (in main memory), physical page #, offset, main memory, disk storage.]

Virtual Addressing with a Cache
- It takes an extra memory access to translate a virtual address (VA) to a physical address (PA) via the page table.
[Figure: CPU issues a VA; translation produces a PA; the PA goes to the cache; a hit returns data to the CPU; a miss goes to main memory.]
- This would make cache accesses very expensive, since every access would really be two accesses.
- The hardware fix is to use a Translation Lookaside Buffer (TLB): a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup.

Fast Translation Using a TLB
- TLBs work well because access to page tables has good locality:
  - Use a fast cache of page table entries within the CPU.
  - Typical: 16 to 512 PTEs, 0.5 to 1 cycle for a hit, 10 to 100 cycles for a miss, 0.01% to 1% miss rate.
  - Misses can be handled by hardware or software.
- Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.

[Figure: fast translation using a TLB.]
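To make the "cache of page table entries" idea concrete, here is a minimal sketch of a fully associative TLB with LRU replacement. It assumes the page table is a plain dictionary from virtual to physical page numbers; the `TLB` class, its 2-entry capacity, and the access trace are invented for illustration and do not model any real processor.

```python
from collections import OrderedDict


class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table      # dict: VPN -> PPN
        self.entries = OrderedDict()      # VPN -> PPN, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = self.page_table[vpn]                # the slow page table lookup
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the least recently used
        self.entries[vpn] = ppn
        return ppn


page_table = {vpn: vpn + 100 for vpn in range(8)}   # hypothetical mappings
tlb = TLB(capacity=2, page_table=page_table)
for vpn in [0, 1, 0, 2, 1]:
    tlb.translate(vpn)
print(tlb.hits, tlb.misses)
```

With only two entries, this trace thrashes: the second access to page 0 hits, but page 1 has been evicted by the time it is used again. Real TLBs hold far more entries, which is why their miss rates are so low.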

A TLB in the Memory Hierarchy
- A TLB miss: is it a page fault or merely a TLB miss?
  - If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB. This takes 10's of cycles to find and load the translation info into the TLB.
  - If the page is not in main memory, then it's a true page fault. This takes 1,000,000's of cycles to service.
- TLB misses are much more frequent than true page faults.

TLB Event Combinations

TLB  | Page Table | Cache    | Possible? Under what circumstances?
Hit  | Hit        | Hit      | Yes: this is what we want!
Hit  | Hit        | Miss     | Yes: although the page table is not checked if the TLB hits.
Miss | Hit        | Hit      | Yes: TLB miss, PA in page table.
Miss | Hit        | Miss     | Yes: TLB miss, PA in page table, but data not in cache.
Miss | Miss       | Miss     | Yes: page fault (OS allocates new PT entry).
Hit  | Miss       | Miss/Hit | Impossible: TLB cannot hit if the page table misses.
Miss | Miss       | Hit      | Impossible: data not allowed in cache if there is no page table entry.
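The event combinations can be checked mechanically. The sketch below encodes the two rules behind the impossible rows (a TLB hit requires a page table hit, and cached data requires a page table entry) and enumerates all eight outcomes; the `classify` function and its note strings are hypothetical wording for this example.

```python
from itertools import product


def classify(tlb_hit, page_table_hit, cache_hit):
    """Return (possible, note) for one TLB / page table / cache outcome."""
    if not page_table_hit:
        if tlb_hit:
            return False, "TLB cannot hit if the page table misses"
        if cache_hit:
            return False, "data not allowed in cache without a page table entry"
        return True, "page fault"
    note = "TLB hit" if tlb_hit else "TLB miss, PA in page table"
    note += ", data in cache" if cache_hit else ", data not in cache"
    return True, note


# Enumerate every (TLB, page table, cache) combination; True means "hit".
outcomes = {combo: classify(*combo) for combo in product([True, False], repeat=3)}
for combo, (possible, note) in outcomes.items():
    print(combo, "possible" if possible else "impossible", "-", note)
```

Five of the eight combinations are possible. The "Miss/Hit" row of the table corresponds to two of the three impossible ones.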

Memory Protection
- Different tasks can share parts of their virtual address spaces:
  - But need to protect against errant access.
  - Requires OS assistance.
- Hardware support for OS protection:
  - Privileged supervisor mode (aka kernel mode).
  - Privileged instructions.
  - Page tables and other state information only accessible in supervisor mode.

Some Virtual Memory Design Parameters

Parameter             | Paged VM                  | TLBs
Total size            | 16,000 to 250,000 words   | 16 to 512 entries
Total size (KB)       | 250,000 to 1,000,000,000  | 0.25 to 16
Block size (B)        | 4000 to 64,000            | 4 to 8
Hit time              | (not given)               | 0.5 to 1 clock cycle
Miss penalty (clocks) | 10,000,000 to 100,000,000 | 10 to 100
Miss rates            | 0.00001% to 0.0001%       | 0.01% to 1%
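The TLB figures can be combined into a rough average translation cost. The sketch below applies the usual average-access-time formula (hit time + miss rate × miss penalty), which the slides do not state explicitly here; `average_translation_cycles` is a hypothetical helper, and the two calls simply plug in the low and high ends of the TLB ranges.

```python
def average_translation_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per translation: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Low end of the quoted TLB ranges: 0.5-cycle hit, 0.01% misses, 10-cycle penalty.
low = average_translation_cycles(hit_time=0.5, miss_rate=0.0001, miss_penalty=10)

# High end of the quoted TLB ranges: 1-cycle hit, 1% misses, 100-cycle penalty.
high = average_translation_cycles(hit_time=1.0, miss_rate=0.01, miss_penalty=100)

print(f"{low:.3f} to {high:.3f} cycles per translation")
```

Under these assumptions the average stays between roughly half a cycle and two cycles, so a TLB keeps translation cost close to its hit time even with a 100-cycle miss penalty.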

2-Level TLB Organization: Two Machines' TLB Parameters

Parameter     | Intel Nehalem                | AMD Barcelona
Address sizes | 48 bits (vir); 44 bits (phy) | 48 bits (vir); 48 bits (phy)
Page size     | 4KB                          | 4KB

TLB organization, Intel Nehalem:
- L1 TLB for instructions and L1 TLB for data per core; both are 4-way set assoc.; LRU.
- L1 ITLB has 128 entries, L1 DTLB has 64 entries.
- L2 TLB (unified) is 4-way set assoc.; LRU.
- L2 TLB has 512 entries.
- TLB misses handled in hardware.

TLB organization, AMD Barcelona:
- L1 TLB for instructions and L1 TLB for data per core; both are fully assoc.; LRU.
- L1 ITLB and DTLB each have 48 entries.
- L2 TLB for instructions and L2 TLB for data per core; each is 4-way set assoc.; round-robin LRU.
- Both L2 TLBs have 512 entries.
- TLB misses handled in hardware.

Two Machines' TLB Parameters

TLB organization, Intel P4:
- 1 TLB for instructions and 1 TLB for data.
- Both 4-way set associative.
- Both use ~LRU replacement.
- Both have 128 entries.
- TLB misses handled in hardware.

TLB organization, AMD Opteron:
- 2 TLBs for instructions and 2 TLBs for data.
- Both L1 TLBs fully associative with ~LRU replacement.
- Both L2 TLBs are 4-way set associative with round-robin LRU.
- Both L1 TLBs have 40 entries.
- Both L2 TLBs have 512 entries.
- TLB misses handled in hardware.

The Hardware/Software Boundary
What parts of the virtual-to-physical address translation are done by or assisted by the hardware?
- Translation Lookaside Buffer (TLB) that caches the recent translations:
  - TLB access time is part of the cache hit time.
  - May allot an extra stage in the pipeline for TLB access.
- Page table storage, fault detection, and updating:
  - Page faults result in precise interrupts that are then handled by the OS.
  - Hardware must support dirty and reference bits in the page tables.

Summary: Questions for the Memory Hierarchy
- Q1: Where can an entry be placed in the cache? (Entry placement)
- Q2: How is an entry found if it is in the cache? (Entry identification)
- Q3: Which entry should be replaced on a miss? (Entry replacement)
- Q4: What happens on a write? (Write strategy)

Q1 & Q2: Where can an entry be placed/found?

Organization      | # of sets                      | Entries per set
Direct mapped     | # of entries                   | 1
Set associative   | (# of entries) / associativity | Associativity (typically 2 to 16)
Fully associative | 1                              | # of entries

Organization      | Location method                   | # of comparisons
Direct mapped     | Index                             | 1
Set associative   | Index the set; compare set's tags | Degree of associativity
Fully associative | Compare all entries' tags         | # of entries
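The two tables reduce to a single calculation once direct mapped is treated as associativity 1 and fully associative as associativity equal to the number of entries. The sketch below shows that; the `organization` function and the 512-entry example are assumptions made for illustration.

```python
def organization(num_entries, associativity):
    """Return the number of sets, entries per set, and tag comparisons per lookup."""
    if num_entries % associativity != 0:
        raise ValueError("associativity must divide the number of entries")
    return {
        "sets": num_entries // associativity,
        "entries_per_set": associativity,
        "comparisons": associativity,   # every tag in the indexed set is compared
    }


entries = 512                                # hypothetical structure size
direct_mapped = organization(entries, 1)
four_way = organization(entries, 4)
fully_associative = organization(entries, entries)
print(direct_mapped, four_way, fully_associative, sep="\n")
```

For 512 entries this gives 512 sets of 1 entry, 128 sets of 4, and 1 set of 512. The comparison count grows with associativity, which is the hardware cost of a more flexible placement.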

Q3: Which entry should be replaced on a miss?
- Easy for direct mapped: only one choice.
- Set associative or fully associative:
  - Random.
  - LRU (Least Recently Used).
- For a 2-way set associative cache, random replacement has a miss rate about 1.1 times higher than LRU.
- LRU is too costly to implement for high levels of associativity (> 4-way), since tracking the usage information is costly.

Q4: What happens on a write?
- Write-through: the information is written to the entry in the current memory level and to the entry in the next level of the memory hierarchy.
  - Always combined with a write buffer so write waits to the next level of memory can be eliminated (as long as the write buffer doesn't fill).
- Write-back: the information is written only to the entry in the current memory level. The modified entry is written to the next level of memory only when it is replaced.
  - Need a dirty bit to keep track of whether the entry is clean or dirty.
- Virtual memory systems always use write-back.
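The difference between the two write strategies shows up in how many writes reach the next level. The sketch below counts them for a tiny fully associative LRU cache, assuming write-allocate in both cases and ignoring the write buffer; `next_level_writes`, the capacity of 2, and the write trace are all hypothetical.

```python
from collections import OrderedDict


def next_level_writes(writes, policy, capacity=2):
    """Count writes that reach the next memory level for a sequence of writes.

    policy is "write-through" or "write-back". Dirty entries still resident
    at the end are not flushed, so they are not counted.
    """
    cache = OrderedDict()   # entry -> dirty bit, least recently used first
    count = 0
    for entry in writes:
        if entry in cache:
            cache.move_to_end(entry)
        else:
            if len(cache) >= capacity:
                _, dirty = cache.popitem(last=False)   # evict the LRU entry
                if dirty:
                    count += 1                         # write back on replacement
            cache[entry] = False
        if policy == "write-through":
            count += 1                                 # every write goes through
        else:
            cache[entry] = True                        # just mark the entry dirty
    return count


trace = ["A", "A", "A", "B", "C"]
through = next_level_writes(trace, "write-through")
back = next_level_writes(trace, "write-back")
print(through, back)
```

On this trace write-through sends all five writes to the next level, while write-back sends one: the repeated writes to A are absorbed and reach the next level only when A is replaced. That saving is why write-back is the practical choice when the next level is a disk.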

Multilevel On-Chip Caches
- Intel Nehalem 4-core processor.
- Per core: 32KB L1 I-cache, 32KB L1 D-cache, 256KB L2 cache.

3-Level Cache Organization

L1 caches (per core)
- Intel Nehalem:
  - L1 I-cache: 32KB, 64-byte blocks, 4-way, approx LRU replacement, hit time n/a.
  - L1 D-cache: 32KB, 64-byte blocks, 8-way, approx LRU replacement, write-back/allocate, hit time n/a.
- AMD Opteron X4:
  - L1 I-cache: 32KB, 64-byte blocks, 2-way, LRU replacement, hit time 3 cycles.
  - L1 D-cache: 32KB, 64-byte blocks, 2-way, LRU replacement, write-back/allocate, hit time 9 cycles.

L2 unified cache (per core)
- Intel Nehalem: 256KB, 64-byte blocks, 8-way, approx LRU replacement, write-back/allocate, hit time n/a.
- AMD Opteron X4: 512KB, 64-byte blocks, 16-way, approx LRU replacement, write-back/allocate, hit time n/a.

L3 unified cache (shared)
- Intel Nehalem: 8MB, 64-byte blocks, 16-way, replacement n/a, write-back/allocate, hit time n/a.
- AMD Opteron X4: 2MB, 64-byte blocks, 32-way, replace block shared by fewest cores, write-back/allocate, hit time 32 cycles.

n/a: data not available

Summary
- The Principle of Locality: a program is likely to access a relatively small portion of the address space at any instant of time.
  - Temporal locality: locality in time.
  - Spatial locality: locality in space.
- Caches, TLBs, and virtual memory are all understood by examining how they deal with the four questions:
  1. Where can an entry be placed?
  2. How is an entry found?
  3. What entry is replaced on a miss?
  4. How are writes handled?
- Page tables map virtual addresses to physical addresses:
  - TLBs are important for fast translation.