Cache Architectures. Design of Digital Circuits 2017, Srdjan Capkun, Onur Mutlu. http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris, © 2007 Elsevier

What Will We Learn? A brief review of how data can be stored, memory system performance analysis, and caches.

Review: What Were Memory Elements? Memories are large blocks: a significant portion of a modern circuit is memory. Memories are practical tools for system design; programmability and reconfigurability all require memory. Memory lets you store data and work on stored data: not all algorithms are designed to process data as it arrives, and some require data to be stored. The data type determines the required storage. SMS: 160 bytes. 1 second of normal audio: 64 kbytes. 1 HD picture: 7.32 Mbytes.

How Can We Store Data? Flip-flops (or latches): very fast, parallel access, but expensive (one bit costs 20+ transistors). Static RAM: relatively fast, only one data word at a time, less expensive (one bit costs 6 transistors). Dynamic RAM: slower, reading destroys the content (requiring refresh), one data word at a time, needs a special process; cheaper (one bit is only a transistor). Other storage technology (hard disk, flash): much slower, access takes a long time, non-volatile; the per-bit cost is lower (no transistors directly involved).

Array Organization of Memories. Arrays efficiently store large amounts of data. A memory consists of an array that stores the data; the address selects one row of the array, and the data in that row is read out. [Figure: a 10-bit address selects one row of a 1024-word x 32-bit array, and 32 data bits are read out.] An M-bit value can be read or written at each unique N-bit address. All values in the array can be accessed, but only M bits at a time.

Introduction. Computer performance depends on processor performance and memory system performance. [Figure: a processor connected to memory by Address, WriteData, MemWrite/WE, and ReadData signals, both clocked by CLK.]

Introduction. Up until now, we assumed memory could be accessed in 1 clock cycle. But that hasn't been true since the 1980s.

Memory System Challenge. Make the memory system appear as fast as the processor by using a hierarchy of memories. The ideal memory is fast, cheap (inexpensive), and large (capacity), but we can only choose two!

Memory Hierarchy. Cache: SRAM, ~$10,000 per GB, ~1 ns access time. Main memory: DRAM, ~$10 per GB, ~100 ns access time. Virtual memory: hard disk, ~$1 per GB, ~10,000,000 ns access time. Speed increases and size decreases toward the top of the hierarchy.

Locality. Exploit locality to make memory accesses fast. Temporal locality: locality in time (e.g., if you looked at a Web page recently, you are likely to look at it again soon). If data was used recently, it is likely to be used again soon; exploit this by keeping recently accessed data in the higher levels of the memory hierarchy. Spatial locality: locality in space (e.g., if you read one page of a book recently, you are likely to read nearby pages soon). If data was used recently, nearby data is likely to be used soon; exploit this by bringing nearby data into the higher levels of the memory hierarchy when data is accessed.
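To make the two kinds of locality concrete, here is a minimal Python sketch (an illustration, not from the slides): the in-order sweep over the list exhibits spatial locality, and the running sum, touched on every iteration, exhibits temporal locality.

data = list(range(1000))

total = 0
for x in data:   # spatial locality: neighboring elements are accessed in order
    total += x   # temporal locality: the same variable is reused every iteration
print(total)     # 499500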

Memory Performance. Hit: the data is found in that level of the memory hierarchy. Miss: the data is not found (must go to the next level). Hit Rate = # hits / # memory accesses = 1 - Miss Rate. Miss Rate = # misses / # memory accesses = 1 - Hit Rate. Average memory access time (AMAT): the average time it takes for the processor to access data, AMAT = t_cache + MR_cache (t_MM + MR_MM t_VM).
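A minimal Python sketch of the AMAT formula above; the function name amat and its argument names are illustrative stand-ins for the slide's symbols (miss rates are fractions, times are in cycles).

def amat(t_cache, mr_cache, t_mm, mr_mm=0.0, t_vm=0.0):
    # AMAT = t_cache + MR_cache * (t_MM + MR_MM * t_VM)
    return t_cache + mr_cache * (t_mm + mr_mm * t_vm)

# With the numbers of Example 2 below (t_cache = 1 cycle, t_MM = 100 cycles,
# cache miss rate = 0.375, no misses past main memory):
print(amat(1, 0.375, 100))  # 38.5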

Memory Performance Example 1. A program has 2,000 load and store instructions; 1,250 of these data values are found in the cache, and the rest are supplied by other levels of the memory hierarchy. What are the hit and miss rates for the cache? Hit Rate = ? Miss Rate = ?

Memory Performance Example 1. Hit Rate = 1250/2000 = 0.625. Miss Rate = 750/2000 = 0.375 = 1 - Hit Rate.

Memory Performance Example 2. Suppose a processor has two levels of hierarchy, cache and main memory, with t_cache = 1 cycle and t_MM = 100 cycles. What is the AMAT of the program from Example 1? AMAT = ?

Memory Performance Example 2. AMAT = t_cache + MR_cache (t_MM) = [1 + 0.375 (100)] cycles = 38.5 cycles.

Cache. A safe place to hide things. The highest level in the memory hierarchy. Fast (typically ~1 cycle access time). Ideally supplies most of the data to the processor. Usually holds the most recently accessed data.

Cache Design Questions. What data is held in the cache? How is data found? What data is replaced? We'll focus on data loads, but stores follow the same principles.

What data is held in the cache? Ideally, the cache anticipates the data needed by the processor and holds it in advance, but it is impossible to predict the future. So we use the past to predict the future: temporal and spatial locality. Temporal locality: copy newly accessed data into the cache; the next time it is accessed, it is available there. Spatial locality: copy neighboring data into the cache too. The block size is the number of bytes copied into the cache at once.

Cache Terminology. Capacity (C): the number of data bytes the cache stores. Block size (b): the number of bytes of data brought into the cache at once. Number of blocks (B = C/b). Degree of associativity (N): the number of blocks in a set. Number of sets (S = B/N): each memory address maps to exactly one cache set. The sketch below evaluates these relationships.
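A minimal Python sketch (illustrative, not from the slides) of the parameter relationships above, evaluated for the example cache used on the following slides (C = 8 words, b = 1 word) at three associativities.

def cache_geometry(C, b, N):
    B = C // b   # number of blocks
    S = B // N   # number of sets
    return B, S

print(cache_geometry(C=8, b=1, N=1))  # direct mapped:     (8, 8)
print(cache_geometry(C=8, b=1, N=2))  # 2-way associative: (8, 4)
print(cache_geometry(C=8, b=1, N=8))  # fully associative: (8, 1)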

How is data found? The cache is organized into S sets, and each memory address maps to exactly one set. Caches are categorized by the number of blocks in a set. Direct mapped: 1 block per set. N-way set associative: N blocks per set. Fully associative: all cache blocks are in a single set. We examine each organization for a cache with capacity C = 8 words and block size b = 1 word, so the number of blocks is B = 8.

Direct Mapped Cache. [Figure: a 2^30-word main memory mapped onto a 2^3-word cache. Each memory address maps to exactly one of the 8 cache sets, numbered 7 (111) down to 0 (000); for example, addresses 0x00000004 and 0x00000024 both map to set 1.]

Direct Mapped Cache Hardware. [Figure: the memory address is split into a 27-bit tag, a 3-bit set index, and a 2-bit byte offset. The cache is an 8-entry x (1+27+32)-bit SRAM holding a valid bit, a tag, and a data word per entry. A comparator checks the stored tag against the address tag; combined with the valid bit, this produces the hit signal, and the 32-bit data word is read out.]
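Here is a minimal Python sketch of the address split this hardware performs; the helper name split_address is hypothetical, but the bit fields match the slide (2 byte-offset bits, 3 set bits, 27 tag bits).

def split_address(addr):
    byte_offset = addr & 0x3          # bits 1:0
    set_index   = (addr >> 2) & 0x7   # bits 4:2 (8 sets)
    tag         = addr >> 5           # bits 31:5
    return tag, set_index, byte_offset

# 0x4, 0x24, and 0x44 all map to set 1 but have different tags,
# which is the source of the conflict shown a few slides later.
for addr in (0x4, 0x24, 0x44):
    tag, s, off = split_address(addr)
    print(f"{addr:#04x}: tag={tag} set={s} offset={off}")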

Direct Mapped Cache Performance. [Figure: the address 0x00000004 has tag 0...00, set 001, and byte offset 00; the cache holds mem[0x...0C] in set 3, mem[0x...08] in set 2, and mem[0x...04] in set 1, with all other sets invalid.]

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Miss Rate = ?

Direct Mapped Cache Performance. For the same loop, Miss Rate = 3/15 = 20%. The three misses are compulsory misses on the first iteration; temporal locality makes all later accesses hit.

Direct Mapped Cache: Conflict. [Figure: the address 0x00000024 has tag 0...01 and set 001; the cache holds mem[0x...04] in set 1, which is about to be overwritten by mem[0x...24].]

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

Miss Rate = ?

Direct Mapped Cache: Conflict. Miss Rate = 10/10 = 100%. These are conflict misses: 0x4 and 0x24 map to the same set, so each load evicts the other's block. The simulator sketch below reproduces both loop examples.
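A minimal direct mapped cache simulator in Python (a sketch, not the slides' hardware) with 8 sets and one-word (4-byte) blocks; it reproduces the miss rates of both loops above.

def simulate(addresses, num_sets=8):
    tags = [None] * num_sets          # one tag per set, all initially invalid
    misses = 0
    for addr in addresses:
        s   = (addr >> 2) % num_sets  # set index
        tag = addr >> 5
        if tags[s] != tag:            # miss: fetch the block into the set
            tags[s] = tag
            misses += 1
    return misses, len(addresses)

print(simulate([0x4, 0xC, 0x8] * 5))  # (3, 15): 20% miss rate
print(simulate([0x4, 0x24] * 5))      # (10, 10): 100% miss rate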

N-Way Set Associative Cache. [Figure: hardware of a 2-way set associative cache. The memory address is split into a 28-bit tag, a 2-bit set index, and a 2-bit byte offset. Each set holds two ways (way 1 and way 0), each with a valid bit, a 28-bit tag, and 32-bit data. Two comparators produce Hit1 and Hit0; a multiplexer selects the data from the hitting way, and the overall hit signal is the OR of the way hits.]

N-Way Set Associative Performance.

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

Miss Rate = ? [Figure: a 2-way cache with 4 sets; mem[0x...24] sits in way 1 of set 1 and mem[0x...04] in way 0 of set 1.]

N-Way Set Associative Performance. Miss Rate = 2/10 = 20%. Associativity reduces conflict misses: 0x4 and 0x24 now share set 1 in different ways, so only the two compulsory misses remain.

Fully Associative Cache. No conflict misses, but expensive to build. [Figure: a single set of eight entries, each with a valid bit, tag, and data.]

Spatial Locality? Increase the block size: block size b = 4 words, capacity C = 8 words, direct mapped (1 block per set), so the number of blocks is B = C/b = 8/4 = 2. [Figure: the memory address is split into a 27-bit tag, a 1-bit set index, a 2-bit block offset, and a 2-bit byte offset. Each of the 2 sets holds a valid bit, a tag, and four 32-bit words; the block offset selects the word within the block.]
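A minimal Python sketch of the new address split (the helper name is hypothetical): 2 byte-offset bits, 2 block-offset bits, 1 set bit, and a 27-bit tag.

def split_address_b4(addr):
    byte_offset  = addr & 0x3          # bits 1:0
    block_offset = (addr >> 2) & 0x3   # bits 3:2, the word within the block
    set_index    = (addr >> 4) & 0x1   # bit 4 (2 sets)
    tag          = addr >> 5           # bits 31:5
    return tag, set_index, block_offset, byte_offset

# 0x0, 0x4, 0x8, and 0xC differ only in the block offset, so one miss
# brings all four words into the cache.
for addr in (0x0, 0x4, 0x8, 0xC):
    print(hex(addr), split_address_b4(addr))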

Direct Mapped Cache Performance (4-word blocks).

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Miss Rate = ? [Figure: set 0 holds one block containing mem[0x...00], mem[0x...04], mem[0x...08], and mem[0x...0C].]

Direct Mapped Cache Performance (4-word blocks). Miss Rate = 1/15 = 6.67%. Larger blocks reduce compulsory misses through spatial locality: the single miss on 0x4 brings the whole 4-word block into the cache, so the loads from 0xC and 0x8 hit.

Cache Organization Recap. Main parameters: capacity C, block size b, number of blocks in the cache B = C/b, number of blocks in a set N, number of sets S = B/N.

Organization: number of ways (N) / number of sets (S = B/N).
Direct mapped: 1 / B.
N-way set associative: 1 < N < B / B/N.
Fully associative: B / 1.

Capacity Misses. The cache is too small to hold all the data of interest at one time. If the cache is full and the program tries to access data X that is not in the cache, the cache must evict data Y to make room for X; a capacity miss occurs if the program then tries to access Y again. X will be placed in a particular set based on its address: in a direct mapped cache there is only one place to put X, while in an associative cache there are multiple ways within the set where X could go. How do we choose Y to minimize the chance of needing it again? Least recently used (LRU) replacement: the least recently used block in a set is evicted when the cache is full.

Types of Misses. Compulsory: the first time the data is accessed. Capacity: the cache is too small to hold all the data of interest. Conflict: data of interest maps to the same location in the cache. Miss penalty: the time it takes to retrieve a block from the lower level of the hierarchy.

LRU Replacement.

# MIPS assembly
lw $t0, 0x4($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)

[Figure: a 2-way set associative cache with 4 sets, each way holding a valid bit V, a use bit U, a tag, and data; states (a) and (b) to be filled in.]

LRU Replacement. [Figure: (a) after the first two loads, set 1 holds mem[0x...04] in way 0 and mem[0x...24] in way 1. (b) The load from 0x54 also maps to set 1, so the least recently used way (way 0, holding mem[0x...04]) is evicted; set 1 then holds mem[0x...54] in way 0 and mem[0x...24] in way 1.] The sketch below replays this sequence.
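A minimal Python sketch of LRU replacement in one set of this 2-way cache, replaying the three loads above. Using an OrderedDict to track recency is an implementation choice for illustration, not the slides' U-bit hardware.

from collections import OrderedDict

def access(set_state, tag, ways=2):
    if tag in set_state:
        set_state.move_to_end(tag)     # hit: mark as most recently used
        return "hit"
    if len(set_state) >= ways:
        set_state.popitem(last=False)  # evict the least recently used tag
    set_state[tag] = "data"
    return "miss"

set1 = OrderedDict()                   # one set; tags ordered oldest to newest
for addr in (0x4, 0x24, 0x54):
    tag = addr >> 4                    # 4 sets: set bits 3:2, tag bits 31:4
    print(hex(addr), access(set1, tag), [hex(t) for t in set1])
# The third access evicts tag 0x0 (the block holding mem[0x...04]).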

Caching Summary. What data is held in the cache? Recently used data (temporal locality) and nearby data (spatial locality, with larger block sizes). How is data found? The set is determined by the address of the data, and the word within the block is also determined by the address; in associative caches, the data could be in one of several ways. What data is replaced? The least recently used way in the set.

Miss Rate Data. Bigger caches reduce capacity misses; greater associativity reduces conflict misses. Adapted from Patterson & Hennessy, Computer Architecture: A Quantitative Approach.

Miss Rate Data. A bigger block size reduces compulsory misses, but a bigger block size also increases conflict misses.

Multilevel Caches. Larger caches have lower miss rates but longer access times, so we expand the memory hierarchy to multiple levels of caches. Level 1: small and fast (e.g., 16 KB, 1 cycle). Level 2: larger and slower (e.g., 256 KB, 2-6 cycles). Even more levels are possible. The sketch below extends AMAT to two cache levels.
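A minimal sketch of AMAT with a second-level cache; the miss rates and cycle counts in the example call are illustrative, not from the slides.

def amat_two_level(t_l1, mr_l1, t_l2, mr_l2, t_mm):
    # An L1 miss pays the L2 access; an L2 miss additionally pays main memory.
    return t_l1 + mr_l1 * (t_l2 + mr_l2 * t_mm)

# e.g., 1-cycle L1 with 10% misses, 10-cycle L2 with 20% misses, 100-cycle DRAM:
print(amat_two_level(1, 0.1, 10, 0.2, 100))  # 4.0 cycles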

Intel Pentium III Die. [Figure: annotated die photograph.]

Virtual Memory. Gives the illusion of a bigger memory without the high cost of DRAM. Main memory (DRAM) acts as a cache for the hard disk.

The Memory Hierarchy. Cache: SRAM, ~$10,000 per GB, ~1 ns access time. Main memory: DRAM, ~$10 per GB, ~100 ns access time. Virtual memory: hard disk, ~$1 per GB, ~10,000,000 ns access time. Physical memory (DRAM, the main memory) is faster, not so large, and expensive; virtual memory (hard disk) is slow, large, and cheap.

The Hard Disk. [Figure: magnetic disks with a read/write head.]

Virtual Memory. Each program uses virtual addresses. The entire virtual address space is stored on a hard disk, and a subset of the virtual address data is in DRAM; the CPU translates virtual addresses into physical addresses, and data not in DRAM is fetched from the hard disk. Each program has its own virtual-to-physical mapping: two programs can use the same virtual address for different data, and programs don't need to be aware that others are running. One program (or virus) can't corrupt the memory used by another. This is called memory protection.

Cache/Virtual Memory Analogues.
Block = page.
Block size = page size.
Block offset = page offset.
Miss = page fault.
Tag = virtual page number.

Virtual Memory Definitions. Page size: the amount of memory transferred from the hard disk to DRAM at once. Address translation: determining the physical address from the virtual address. Page table: the lookup table used to translate virtual addresses to physical addresses.

Virtual and Physical Addresses. Most accesses hit in physical memory, but programs see the large capacity of virtual memory.

Address Translation

Virtual Memory Example. System: virtual memory size 2 GB = 2^31 bytes, physical memory size 128 MB = 2^27 bytes, page size 4 KB = 2^12 bytes.

Virtual Memory Example. System: virtual memory size 2 GB = 2^31 bytes, physical memory size 128 MB = 2^27 bytes, page size 4 KB = 2^12 bytes. Organization: virtual address 31 bits, physical address 27 bits, page offset 12 bits. # virtual pages = 2^31 / 2^12 = 2^19 (VPN = 19 bits). # physical pages = 2^27 / 2^12 = 2^15 (PPN = 15 bits). The sketch below checks this arithmetic.
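A minimal Python check of the address-space arithmetic on this slide (variable names are illustrative).

PAGE_SIZE = 2**12                      # 4 KB pages
virtual_pages  = 2**31 // PAGE_SIZE    # 2 GB of virtual memory
physical_pages = 2**27 // PAGE_SIZE    # 128 MB of physical memory

print(virtual_pages == 2**19)   # True: the VPN is 19 bits
print(physical_pages == 2**15)  # True: the PPN is 15 bits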

Virtual Memory Example. What is the physical address of virtual address 0x247C? VPN = 0x2, and VPN 0x2 maps to PPN 0x7FFF. The lower 12 bits (the page offset) are the same for virtual and physical addresses (0x47C). Physical address = 0x7FFF47C.

How do we translate addresses? With a page table, which has an entry for each virtual page. Each entry has a valid bit (whether the virtual page is located in physical memory; if not, it must be fetched from the hard disk) and a physical page number (where the page is located).

Page Table Example. [Figure: virtual address 0x0000247C is split into virtual page number 0x2 (19 bits) and page offset 0x47C (12 bits). The page table entry for VPN 0x2 is valid with physical page number 0x7FFF (15 bits), so the translation hits and the physical address is 0x7FFF47C. Other valid entries in the table map to PPNs 0x0000, 0x7FFE, and 0x0001.]

Page Table Example 1. What is the physical address of virtual address 0x5F20? [Figure: the same page table as before.]

Page Table Example 1. VPN = 0x5. Entry 5 in the page table indicates that VPN 5 is in physical page 1, so the physical address is 0x1F20. [Figure: VPN 0x5 (19 bits) with page offset 0xF20 (12 bits) translates to PPN 0x1 (15 bits) with the same offset.]

Page Table Example 2. What is the physical address of virtual address 0x73E0? [Figure: the same page table.]

Page Table Example 2. VPN = 0x7. Entry 7 in the page table is invalid, so the page is not in physical memory; the virtual page must be swapped into physical memory from disk. The sketch below performs these translations.
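A minimal Python sketch of the translation performed in these examples. Only the two mappings the slides confirm are in the toy page table; any VPN missing from it is treated as invalid, i.e., a page fault.

PAGE_OFFSET_BITS = 12
PAGE_TABLE = {0x2: 0x7FFF, 0x5: 0x1}   # VPN -> PPN, from the examples above

def translate(vaddr):
    vpn    = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in PAGE_TABLE:
        raise LookupError(f"page fault on VPN {vpn:#x}")
    return (PAGE_TABLE[vpn] << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x247C)))   # 0x7fff47c
print(hex(translate(0x5F20)))   # 0x1f20
# translate(0x73E0) would raise: VPN 0x7 is invalid (page fault)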

Page Table Challenges. The page table is large and is usually located in physical memory. Each load/store therefore requires two main memory accesses: one for translation (the page table read) and one to access the data (after translation). This cuts memory performance in half, unless we get clever.

Translation Lookaside Buffer (TLB). Use a translation lookaside buffer: a small cache of the most recent translations. It reduces the number of memory accesses required for most loads/stores from two to one.

Translation Lookaside Buffer (TLB). Page table accesses have a lot of temporal locality: data accesses have temporal and spatial locality, and the page size is large, so consecutive loads/stores are likely to access the same page. The TLB is small (accessed in < 1 cycle, typically 16-512 entries) and fully associative, and hit rates above 99% are typical. It reduces the number of memory accesses for most loads and stores from 2 to 1. A minimal sketch follows.
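A minimal Python sketch of a fully associative TLB in front of the page table: a TLB hit avoids the page-table read in main memory. Real TLBs have a fixed number of entries and a replacement policy, which this sketch omits.

TLB = {}                               # VPN -> PPN

def translate_with_tlb(vaddr, page_table, offset_bits=12):
    vpn    = vaddr >> offset_bits
    offset = vaddr & ((1 << offset_bits) - 1)
    if vpn in TLB:                     # TLB hit: no extra memory access
        ppn = TLB[vpn]
    else:                              # TLB miss: read the page table
        ppn = page_table[vpn]
        TLB[vpn] = ppn                 # cache the translation
    return (ppn << offset_bits) | offset

pt = {0x2: 0x7FFF}
print(hex(translate_with_tlb(0x247C, pt)))  # miss: fills the TLB
print(hex(translate_with_tlb(0x2ABC, pt)))  # hit: same virtual page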

Example Two-Entry TLB. [Figure: the virtual address 0x0000247C is split into VPN 0x2 (19 bits) and page offset 0x47C (12 bits). Each TLB entry holds a valid bit, a 19-bit virtual page number, and a 15-bit physical page number. Entry 1 holds VPN 0x7FFFD mapping to PPN 0x0000; entry 0 holds VPN 0x2 mapping to PPN 0x7FFF. Both entries are compared against the VPN in parallel; entry 0 matches, so the TLB hits and the physical address is 0x7FFF47C.]

Memory Protection. Multiple programs (processes) run at once, and each process has its own page table. Each process can use its entire virtual address space without worrying about where other programs are. A process can only access the physical pages mapped in its page table, so it can't overwrite memory belonging to another process.

Virtual Memory Summary. Virtual memory increases capacity. A subset of the virtual pages is located in physical memory at any time. A page table maps virtual pages to physical pages; this is called address translation. A TLB speeds up address translation, and using different page tables for different programs provides memory protection.