Computer Structure. X86 Virtual Memory and TLB


Computer Structure: X86 Virtual Memory and TLB
Franck Sala. Slides from Lihu and Adi's lecture.

Virtual Memory
- Provides the illusion of a large memory
  - Different machines have different amounts of physical memory
  - Allows programs to run regardless of the actual physical memory size
- The amount of memory consumed by each process is dynamic
  - Allows adding memory as needed
- Many processes can run on a single machine
  - Provides each process its own memory space
  - Prevents a process from accessing the memory of other processes running on the same machine
  - Allows the sum of the memory spaces of all processes to be larger than physical memory
- Basic terminology
  - Virtual address space: the address space used by the programmer
  - Physical address space: the actual physical memory address space

Virtual Memory: Basic Idea
- Divide memory (virtual and physical) into fixed-size blocks
  - Pages in virtual space, frames in physical space
  - Page size = frame size
  - Page size is a power of 2: page size = 2^k
- All pages in the virtual address space are contiguous
- Pages can be mapped into physical frames in any order
- Some of the pages are in main memory (DRAM), some of the pages are on disk
- All programs are written using the virtual memory address space
- The hardware does on-the-fly translation between the virtual and physical address spaces
  - It uses a page table to translate between virtual and physical addresses

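Virtual to Physical Address Translation
- Virtual address: virtual page number in bits [47:12], offset in bits [11:0]
- The page table base register points to the page table, which is indexed by the virtual page number
- Each page table entry holds:
  - V: valid bit
  - D: dirty bit
  - AC: access control (memory type WB, WT, UC, WP; user / supervisor)
  - Physical page number
- Physical address: physical page number in bits [38:12], offset in bits [11:0]
- Page size: 2^12 bytes = 4K bytes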
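The translation step above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the page table is modeled as a dictionary keyed by virtual page number, and the names `translate`, `page_table`, `valid` and `pfn` are hypothetical, not taken from the slides.

```python
PAGE_SHIFT = 12                       # 4 KB pages: 2**12 bytes
OFFSET_MASK = (1 << PAGE_SHIFT) - 1   # low 12 bits are the page offset


def translate(vaddr, page_table):
    """Translate a virtual address using a dict-based page table (hypothetical model)."""
    vpn = vaddr >> PAGE_SHIFT          # virtual page number
    offset = vaddr & OFFSET_MASK       # the offset is copied unchanged
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise LookupError(f"page fault on VPN {vpn:#x}")
    return (entry["pfn"] << PAGE_SHIFT) | offset


# One mapped page: VPN 0x12345 -> physical frame 0xABC
page_table = {0x12345: {"valid": True, "pfn": 0xABC}}
paddr = translate(0x12345678, page_table)
print(hex(paddr))  # 0xabc678
```

Only the page number is replaced; the 12 offset bits pass through untouched, which is what later allows a cache lookup to start before translation completes.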

Translation Lookaside Buffer (TLB)
- The page table resides in memory, so each translation requires an extra memory access
- The TLB caches recently used PTEs to speed up translation
  - Typically 128 to 256 entries, 4- to 8-way associative
- TLB indexing: the virtual page number is split into a tag and a set index; the offset is not used
- Flow: the virtual address accesses the TLB. On a TLB hit, the TLB supplies the physical address. On a miss, the page table in memory is accessed.
- On a TLB miss, the Page Miss Handler (HW PMH) gets the PTE from memory
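A minimal sketch of a set-associative TLB, to make the tag/set split concrete. The class name `TLB`, the FIFO replacement policy and the 128-entry, 4-way default are assumptions chosen for illustration; the slides do not specify a replacement policy.

```python
class TLB:
    """Toy set-associative TLB; replacement policy (FIFO) is an assumption."""

    def __init__(self, entries=128, ways=4):
        self.ways = ways
        self.num_sets = entries // ways
        # Each set holds up to `ways` (tag, pfn) pairs.
        self.sets = [[] for _ in range(self.num_sets)]

    def _split(self, vpn):
        # Low bits of the VPN select the set, the remaining bits form the tag.
        return vpn // self.num_sets, vpn % self.num_sets

    def lookup(self, vpn):
        tag, index = self._split(vpn)
        for stored_tag, pfn in self.sets[index]:
            if stored_tag == tag:
                return pfn      # TLB hit
        return None             # TLB miss: a page walk is needed

    def fill(self, vpn, pfn):
        tag, index = self._split(vpn)
        entries = self.sets[index]
        if len(entries) == self.ways:
            entries.pop(0)      # evict the oldest entry of the set
        entries.append((tag, pfn))


tlb = TLB()
first = tlb.lookup(0x12345)     # miss on a cold TLB
tlb.fill(0x12345, 0xABC)        # the page walk result is cached
second = tlb.lookup(0x12345)    # hit
```

With 128 entries and 4 ways there are 32 sets, so 5 bits of the virtual page number act as the set index.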

Virtual Memory and Cache
- Flow: the virtual address accesses the TLB. On a TLB miss, the STLB is accessed. On an STLB miss, a page walk gets the PTE from the memory hierarchy.
- With the physical address: access the L1 cache; on a miss, the L2 cache; on a miss, memory. The data is then returned.
- TLB access is serial with cache access
- Page table entries are cached in the L1 D$, L2$ and L3$ as data

Glossary
- Page fault: the page is not in main memory
- Demand load policy
- Access control / protection fault
- Context switch
- Aliasing: DLLs, shared code, large malloc
- Page swap-out
  - Snoop-invalidate for each byte of the page in the cache (both L1 and L2): all data in the caches that belongs to that page is invalidated
  - The page on the disk is up to date
  - If the TLB hits for the swapped-out page, the TLB entry is invalidated
  - In the page table, the valid bit in the PTE of the swapped-out page is set to 0
  - All the rest of the bits in the PTE may be used by the operating system to keep the location of the page on the disk

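32-bit Mode: 4KB / 4MB Page Mapping
- 2-level hierarchical mapping: page directories and page tables, 4KB aligned
- PDE fields: present (0 = page fault), page size (4KB or 4MB)
- CR4.PSE=1: both 4MB and 4KB pages are supported, with separate TLBs
- 4KB pages: linear address = DIR [31:22] (10 bits), TABLE [21:12] (10 bits), OFFSET [11:0] (12 bits)
  - CR3 (PDBR): 20+12 = 32 bits (4K aligned), points to the 1K-entry page directory
  - PDE: 20-bit pointer, 20+12 = 32 bits (4K aligned), points to a 1K-entry page table
  - PTE: 20-bit pointer to the 4KB page
- 4MB pages: linear address = DIR [31:22] (10 bits), OFFSET [21:0] (22 bits)
  - CR3 (PDBR): 20+12 = 32 bits (4K aligned), points to the page directory
  - PDE: 10-bit pointer to the 4MB page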
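As a small illustration of the field boundaries above, the following sketch splits a 32-bit linear address for both page sizes. The function name `split_linear_32` and the dictionary layout are hypothetical.

```python
def split_linear_32(addr, large_page=False):
    """Split a 32-bit linear address into its paging fields (non-PAE mode)."""
    directory = (addr >> 22) & 0x3FF            # bits [31:22], 10 bits
    if large_page:
        # 4 MB page: the PDE maps the page directly, 22 offset bits remain.
        return {"dir": directory, "offset": addr & 0x3FFFFF}
    return {
        "dir": directory,
        "table": (addr >> 12) & 0x3FF,          # bits [21:12], 10 bits
        "offset": addr & 0xFFF,                 # bits [11:0], 12 bits
    }


small = split_linear_32(0x12345678)
large = split_linear_32(0x12345678, large_page=True)
print(small)  # {'dir': 72, 'table': 837, 'offset': 1656}
print(large)  # {'dir': 72, 'offset': 3430008}
```

The same directory index is used in both cases; only the interpretation of the lower 22 bits changes with the page size bit in the PDE.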

32-bit Mode: PDE and PTE Format
- 20-bit pointer to a 4K-aligned address
- Virtual memory: Present, Accessed, Dirty (in PTE only), Page size (in PDE only), Global
- Protection: Writable (R#/W), User / Supervisor#
- Caching (2 levels/type only): Page WT, Page Cache Disabled, PAT (PT Attribute Index)
- 3 bits available for OS usage

Page Directory Entry (4KB page table)
Bits    Field
31:12   Page frame address 31:12
11:9    AVAIL (available for OS use)
8       G (Global)
7       Page size (0: 4 Kbyte)
6       0
5       A (Accessed)
4       PCD (Cache Disable)
3       PWT (Write-Through)
2       U (User / Supervisor)
1       W (Writable)
0       P (Present)

Page Table Entry
Bits    Field
31:12   Page frame address 31:12
11:9    AVAIL (available for OS use)
8       G (Global)
7       PAT
6       D (Dirty)
5       A (Accessed)
4       PCD (Cache Disable)
3       PWT (Write-Through)
2       U (User / Supervisor)
1       W (Writable)
0       P (Present)
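The PTE layout can be exercised with a short decoder. This is a sketch for the 32-bit, 4KB-page PTE format only; the names `PTE_FLAGS` and `decode_pte` and the sample value are made up for illustration.

```python
# Bit positions of the single-bit fields in a 32-bit PTE (4 KB page).
PTE_FLAGS = {
    "P": 0,    # Present
    "W": 1,    # Writable
    "U": 2,    # User / Supervisor
    "PWT": 3,  # Write-Through
    "PCD": 4,  # Cache Disable
    "A": 5,    # Accessed
    "D": 6,    # Dirty
    "PAT": 7,  # Page Attribute Table index bit
    "G": 8,    # Global
}


def decode_pte(pte):
    """Return the fields of a 32-bit page table entry as a dict."""
    fields = {name: (pte >> bit) & 1 for name, bit in PTE_FLAGS.items()}
    fields["avail"] = (pte >> 9) & 0x7       # bits [11:9], free for the OS
    fields["frame"] = (pte >> 12) & 0xFFFFF  # bits [31:12], page frame number
    return fields


# Hypothetical entry: frame 0xABC, present, writable, user, accessed, dirty.
decoded = decode_pte(0x00ABC067)
```

For the sample value the low byte 0x67 sets P, W, U, A and D, and the frame field is 0xABC.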

4KB Page Mapping with PAE (1995: Pentium Pro)
- The physical address is extended to M bits; the linear address remains 32 bits
- Without PAE: DIR [31:22] (10 bits), TABLE [21:12] (10 bits), OFFSET [11:0] (12 bits)
  - CR3 (PDBR), 20+12 = 32 bits (4K aligned) -> 1K-entry page directory (32-bit PDE, 20-bit pointer) -> 1K-entry page table (32-bit PTE, 20-bit pointer) -> 4KB page
- With PAE: Dir ptr [31:30] (2 bits), DIR [29:21] (9 bits), TABLE [20:12] (9 bits), OFFSET [11:0] (12 bits)
  - CR3 (PDPTR), 32 bits (32B aligned) -> 4-entry page directory pointer table (Dir ptr entry, M-12 bit pointer) -> 512-entry page directory (64-bit PDE, M-12 bit pointer) -> 512-entry page table (64-bit PTE, M-12 bit pointer) -> 4KB page

2MB Page Mapping with PAE
- The physical address is extended to M bits; the linear address remains 32 bits
- Without PAE (4MB pages): DIR [31:22] (10 bits), OFFSET [21:0] (22 bits)
  - CR3 (PDBR), 20+12 = 32 bits (4K aligned) -> page directory (32-bit PDE, 10-bit pointer) -> 4MB page
- With PAE (2MB pages): Dir ptr [31:30] (2 bits), DIR [29:21] (9 bits), OFFSET [20:0] (21 bits)
  - CR3 (PDPTR), 32 bits (32B aligned) -> page directory pointer table (Dir ptr entry, M-12 bit pointer) -> page directory (64-bit PDE, M-21 bit pointer) -> 2MB page

4KB Page Mapping in 64-bit Mode (2003: AMD Opteron)
- Linear address: sign ext. [63:48], PML4 [47:39] (9 bits), PDP [38:30] (9 bits), DIR [29:21] (9 bits), TABLE [20:12] (9 bits), OFFSET [11:0] (12 bits)
  - PML4: Page Map Level 4
  - PDP: Page Directory Pointer
- CR3 (PDPTR), 40 bits (4KB aligned) -> 512-entry PML4 table -> 512-entry page directory pointer table -> 512-entry page directory -> 512-entry page table -> 4KB page
- Every entry (PML4 entry, PDP entry, PDE, PTE) is 64 bits wide and holds an M-12 bit pointer
- 256 TB of virtual memory (2^48)
- 1 TB of physical memory (2^40)

2MB Page Mapping in 64-bit Mode
- Linear address: sign ext. [63:48], PML4 [47:39] (9 bits), PDP [38:30] (9 bits), DIR [29:21] (9 bits), OFFSET [20:0] (21 bits)
- CR3 (PDPTR), 40 bits (4KB aligned) -> 512-entry PML4 table (PML4 entry, M-12 bit pointer) -> 512-entry page directory pointer table (PDP entry, M-12 bit pointer) -> 512-entry page directory (PDE, M-21 bit pointer) -> 2MB page

1GB Page Mapping in 64-bit Mode
- Linear address: sign ext. [63:48], PML4 [47:39] (9 bits), PDP [38:30] (9 bits), OFFSET [29:0] (30 bits)
- CR3 (PDPTR), 40 bits (4KB aligned) -> 512-entry PML4 table (PML4 entry, M-12 bit pointer) -> 512-entry page directory pointer table (PDP entry, M-30 bit pointer) -> 1GB page
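The three 64-bit mappings differ only in how many index fields are consumed before the offset starts. A possible sketch, where `split_linear_64` and the page-size labels "4K", "2M" and "1G" are names introduced here for illustration:

```python
OFFSET_BITS = {"4K": 12, "2M": 21, "1G": 30}


def split_linear_64(addr, page="4K"):
    """Split a 64-bit linear address (48 implemented bits) into paging fields."""
    # Canonical form: bits [63:48] must all equal bit 47 (sign extension).
    upper = addr >> 47
    fields = {"canonical": upper == 0 or upper == 0x1FFFF}
    fields["pml4"] = (addr >> 39) & 0x1FF        # bits [47:39]
    fields["pdp"] = (addr >> 30) & 0x1FF         # bits [38:30]
    if page in ("4K", "2M"):
        fields["dir"] = (addr >> 21) & 0x1FF     # bits [29:21]
    if page == "4K":
        fields["table"] = (addr >> 12) & 0x1FF   # bits [20:12]
    fields["offset"] = addr & ((1 << OFFSET_BITS[page]) - 1)
    return fields


# Address built from known indices: PML4=1, PDP=2, DIR=3, TABLE=4, offset=5.
addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
small = split_linear_64(addr, "4K")
medium = split_linear_64(addr, "2M")
huge = split_linear_64(addr, "1G")
```

For a 2MB page the TABLE bits fold into the offset, and for a 1GB page the DIR bits do as well.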

TLBs
- The processor saves the most recently used PDEs and PTEs in TLBs
- Separate TLBs for the data and instruction caches
- Separate TLBs for 4KB and 2/4MB page sizes

Question 1
We have a core similar to X86 in 64-bit mode:
- It supports small pages (pointed to by a PTE) and large pages (pointed to by DIR)
- The page table size in each hierarchy level is the size of a small page
- The entry size in the page table is 16 bytes, in all the hierarchy levels
- Linear address: sign ext. [63:N4+1], PML4 [N4:N3+1], PDP [N3:N2+1], DIR [N2:N1+1], TABLE [N1:12], OFFSET [11:0]

What is the size of a small page?
- 12 bits in the offset field: 2^12 B = 4KB

How many entries are in each page table?
- Page table size = page size = 4KB, PTE = 16B
- 4KB / 16B = 2^12 / 2^4 = 2^8 = 256 entries in each page table

Question 1 (continued)
Given: 64 bit (large & small pages); PT size = page size = 4KB; PTE = 16B; page table: 256 entries.

What are the values of N1, N2, N3 and N4?
- Since we have 256 entries in each table, we need 8 bits to address them
- Table [19:12]: N1 = 19
- DIR [27:20]: N2 = 27
- PDP [35:28]: N3 = 35
- PML4 [43:36]: N4 = 43

What is the size of a large page?
- Large pages are pointed to by DIR, so the large page offset is 20 bits [19:0]
- Large page size: 2^20 B = 1MB
- We can also say: DIR can point to 256 pages of 4KB = 1MB
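The derivation of N1 to N4 follows mechanically from the page size and entry size, which the following sketch reproduces. The function name `derive_layout` is hypothetical, and the code assumes all sizes are powers of two.

```python
def derive_layout(page_bytes, entry_bytes, levels=4):
    """Derive field boundaries when every table occupies exactly one small page."""
    offset_bits = page_bytes.bit_length() - 1       # log2 of the page size
    entries = page_bytes // entry_bytes             # entries per table
    index_bits = entries.bit_length() - 1           # log2 of the entry count
    top_bits = []                                   # N1, N2, N3, N4
    low = offset_bits
    for _ in range(levels):
        top_bits.append(low + index_bits - 1)
        low += index_bits
    large_page_bytes = 1 << (offset_bits + index_bits)  # page pointed to by DIR
    return entries, top_bits, large_page_bytes


entries, top_bits, large_page_bytes = derive_layout(4096, 16)
print(entries, top_bits, large_page_bytes)  # 256 [19, 27, 35, 43] 1048576
```

Calling `derive_layout(4096, 8)` instead gives 512 entries and top bits 20, 29, 38 and 47, which matches the real x86-64 layout shown earlier.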

Question 1 (continued)
Given: 64 bit (large & small pages); PT size = page size = 4KB; large page size = 1MB; PTE = 16B; page table: 256 entries.
Linear address: sign ext. [63:44], PML4 [43:36], PDP [35:28], DIR [27:20], TABLE [19:12], OFFSET [11:0]

We access a sequence of virtual addresses. For each address, what is the minimal number of page tables that were added in all the hierarchy levels?

Question 1: sequence of allocations
Linear address: sign ext. [63:44], PML4 [43:36], PDP [35:28], DIR [27:20], TABLE [19:12], OFFSET [11:0] (8 index bits per level instead of 9, for example purposes only)

sign ext.  PML4  PDP  DIR  PTE  offset  Added
00000      D3    82   FF   E2   B25     3 new tables + 1 page
00000      D3    82   FF   E2   349     0 new tables
00000      D3    82   FF   68   937     0 new tables + 1 page
00000      D3    82   28   49   C00     1 new table + 1 page
00000      71    46   B5   54   622     3 new tables + 1 page

The PML4 table pointed to by CR3 always exists. Its entries D3 and 71 lead to PDP tables (entries 82 and 46), which lead to page directories (entries FF, 28 and B5), which lead to page tables (entries E2, 68, 49 and 54) that map the 4KB pages.
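The counts in the table can be checked with a small simulation. It is a sketch under one assumption stated in the slide: the PML4 table already exists, so only PDP tables, page directories and page tables are counted. The function name `count_allocations` is hypothetical.

```python
def count_allocations(accesses):
    """Return (new tables, new pages) for each (pml4, pdp, dir, pte) access."""
    tables = set()   # a table is identified by the index path that reaches it
    pages = set()
    results = []
    for pml4, pdp, directory, pte in accesses:
        new_tables = 0
        # PDP table, page directory and page table reached by this address.
        for path in ((pml4,), (pml4, pdp), (pml4, pdp, directory)):
            if path not in tables:
                tables.add(path)
                new_tables += 1
        page = (pml4, pdp, directory, pte)
        new_page = page not in pages
        pages.add(page)
        results.append((new_tables, int(new_page)))
    return results


sequence = [
    (0xD3, 0x82, 0xFF, 0xE2),
    (0xD3, 0x82, 0xFF, 0xE2),
    (0xD3, 0x82, 0xFF, 0x68),
    (0xD3, 0x82, 0x28, 0x49),
    (0x71, 0x46, 0xB5, 0x54),
]
allocations = count_allocations(sequence)
print(allocations)  # [(3, 1), (0, 0), (0, 1), (1, 1), (3, 1)]
```

The offset field is left out of the tuples because it never affects which tables or pages must exist.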

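Caches and Translation Structures
- Core (on-die): L1 instruction cache, L1 data cache, L2, L3. Memory is on the platform.
- Translation: instruction TLB and data TLB (holding PTEs), backed by the STLB, which returns the PTE and is looked up with VA[47:12]
- PMH (page walk logic), which loads page table entries, with three caches:
  - PDE cache: returns the PDE entry, looked up with VA[47:21]
  - PDP cache: returns the PDP entry, looked up with VA[47:30]
  - PML4 cache: returns the PML4 entry, looked up with VA[47:39]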

Linear address: sign ext. [63:48], PML4 [47:39], PDP [38:30], DIR [29:21], TABLE [20:12], OFFSET [11:0]

Cache        Accessed with virtual address bits   If hits, returns
PDE cache    [47:21]                              PDE
PDP cache    [47:30]                              PDP entry
PML4 cache   [47:39]                              PML4 entry

Question 2
Processor similar to X86 in 64-bit mode:
- Pages of 4KB
- The processor has a TLB
  - TLB hit: we get the translation with no need to access the translation tables
  - TLB miss: the processor has to do a page walk
- The hardware that does the page walk (PMH) contains a cache for each of the translation tables
- All caches and the TLB are empty on reset
- Linear address: sign ext. [63:44], PML4 [43:36], PDP [35:28], DIR [27:20], TABLE [19:12], OFFSET [11:0]

For the sequence of memory accesses below, how many memory accesses are needed for the translations?

Address             Memory accesses   Explanation
0000122334455666H   4                 We need to access the memory for each of the 4 translation tables
0000122334455777H   0                 Same page as above: TLB hit, no memory access
0000122884455777H   3                 We hit in the PML4 cache, then we miss in the PDP cache, so we need to access the memory 3 times: PDP, DIR, PTE

Question 2: L1 data cache
L1 data cache: 32KB, 2 ways, 64B lines. How can we access this cache before we get the physical address?
- 64B lines: 6 offset bits [5:0]
- 32KB / (2 ways * 64B) = 2^15 / (2 * 2^6) = 2^8 = 256 sets: set bits [13:6]
- 12 bits are not translated: [11:0], so we lack 2 bits [13:12] to get the set address
- So we do the lookup using 2 untranslated (virtual) bits for the set address
- Those bits can be different from the corresponding PFN bits obtained after translation, therefore we need to compare the whole PFN to the tag stored in the cache tag array
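The bit counting above generalizes to any cache geometry. A minimal sketch, assuming power-of-two sizes; `cache_geometry` is a hypothetical helper name.

```python
def cache_geometry(size_bytes, ways, line_bytes, page_bytes):
    """Return (sets, top set bit, low set bit, set bits that need translation)."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1    # log2 of the line size
    set_bits = sets.bit_length() - 1             # log2 of the number of sets
    page_bits = page_bytes.bit_length() - 1      # untranslated bits [page_bits-1:0]
    top_set_bit = offset_bits + set_bits - 1
    # Set bits at or above page_bits belong to the page number.
    above_page = max(0, top_set_bit - page_bits + 1)
    return sets, top_set_bit, offset_bits, above_page


geometry = cache_geometry(32 * 1024, 2, 64, 4096)
print(geometry)  # (256, 13, 6, 2)
```

With 8 ways instead of 2, the same 32KB cache has 64 sets, the set bits become [11:6], and no set bit depends on the translation.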

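Question 2: Read Access
- Virtual address: the VPN is translated; the page offset [11:0] is not translated
- Cache set index: bits [13:6], where bits [13:12] are taken from the virtual address; line offset: bits [5:0]
- The translation yields the PFN [40:12]
- Each of the 2 ways (way 0, way 1) stores tag [40:12]; the PFN [40:12] is compared with the tag of each way to produce the tag match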

Question 2: example
- The set index [13:6] is taken from the virtual address (bits [13:12] are not yet translated); the line offset is [5:0]
- Write to virtual address A: VA = 0000 0000 0000 0000 0000, PA = 0 0000 0000 0000 0000. Set [13:6] = 0, tag = 0
- Read from virtual address B: VB = 0011 0000 0000 0000 0000, PB = 0 0011 0000 0000 0000. Set [13:6] = 0, tag = 3 (0 if we don't take bits [13:12])

Question 2: Virtual Alias
L1 data cache: 32KB, 2 ways, 64B lines. What happens when we access virtual page A at a given offset and, after this, there is an access at the same offset in virtual page B, which is mapped by the OS to the same physical page as A?
- Two virtual pages map to the same frame (PFN zzzz): xxxx01.set.offset and yyyy00.set.offset, where 01 and 00 are bits [13:12]
- Set bits [11:6] are not translated: they select among at most 64 sets
- Physically addressed cache: the data exists only once
- Virtually addressed cache: the data may exist twice, in sets 01.set[11:6] and 00.set[11:6], both with tag zzzz. Avoid this!

Question 2: Virtual Alias (continued)
- Avoid having the same data twice in the cache (xxxx01.set.offset and yyyy00.set.offset, in sets 01.set[11:6] and 00.set[11:6] with tag zzzz)
- Check 4 sets when we allocate a new entry and see if the same tag appears
- If yes, evict the second occurrence of the data (the alias)

Question 2: Snoop
L1 data cache: 32KB, 2 ways, 64B lines. What happens in case of a snoop in the cache?
- The cache is snooped with a physical address [40:0]
- Since the 2 MSBs of the set address are virtual, a given physical address can map to 4 different sets in the cache (depending on the virtual page that is mapped to it)
- So we must snoop 4 sets * 2 ways in the cache
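To make the "4 sets" concrete, the sketch below lists the candidate sets that a snoop (or an alias check on allocation) would have to probe for a given physical address. The function name `candidate_sets` is hypothetical; the geometry is the 256-set cache from the question.

```python
def candidate_sets(paddr):
    """Sets that may hold a line with this physical address (256 sets, 64B lines)."""
    # Set bits [11:6] are inside the page offset, so they are identical in the
    # virtual and physical address.
    low = (paddr >> 6) & 0x3F
    # Set bits [13:12] came from the virtual address, so all 4 values are possible.
    return [(high << 6) | low for high in range(4)]


sets_to_probe = candidate_sets(0x3040)
print(sets_to_probe)  # [1, 65, 129, 193]
```

Each of the four sets has 2 ways, so 8 tags are compared against the snooped physical address.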

Question 3
Core similar to X86 in 64-bit mode.
- Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
- Supports small pages (pointed to by a PTE) and large pages (pointed to by DIR)
- The size of an entry in all the different page tables is 8 bytes
- PMH caches at all the levels:
  - 4 entries, direct mapped
  - Access time on hit: 2 cycles
  - Miss known after 1 cycle
  - The PMH caches are accessed at all the levels in parallel
- In each level, when there is a hit, the PMH cache provides the relevant page table entry for that level
- In each level, when there is a miss, the core accesses the relevant page table in main memory. The access time to main memory is 100 cycles, not including the time needed to detect the PMH cache miss.

Question 3 (continued)
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]

What is the size of the large pages?
- The large page is pointed to by DIR, therefore all the bits below DIR are the offset inside the large page: 2^24 B = 16MB

How many entries are in each page table?
- PTE: 2^12
- DIR: 2^12
- PDP: 2^12
- PML4: 2^8

Question 3: translation cycles
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
TLB: 4 entries, direct mapped. TLB hit: 2 cycles. TLB miss: 1 cycle. Memory access: 100 cycles.

Virtual addr.         Cycles   Comment
FF81 2345 6789 ABCD   401      Miss and memory access at each level: 1 + 4*100 = 401 cycles
FF81 2340 6789 ABCD   202      (PML4, PDP, DIR, STLB) = (H,H,M,M). The PDP cache read takes 2 cycles (1 more cycle than a miss). DIR: miss, 100 cycles. PTE: miss, 100 cycles. 2 + 2*100 = 202
FF80 2340 6789 ABCD   401      PMH cache miss in all the levels: 1 + 4*100 = 401 cycles
FF81 2340 6709 ABCD   302      (PML4, PDP, DIR, STLB) = (H,M,M,M). PDP: the entry 234 that was used by the second access was replaced by the entry that was filled in access 3, as it is in the same set, so it misses. 2 + 3*100 = 302
FF81 2340 6709 A0CD   2        Hit in the TLB, no need to go to the PMH: 2 cycles
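The cycle counts in the table can be reproduced with a small simulation. It is a sketch of one reading of the slide's accounting, with these assumptions: every structure is a 4-entry direct-mapped cache indexed by the two low bits of its key and tagged by all virtual address bits above its level; the lookup costs 2 cycles if anything hits and 1 cycle if everything misses; each table read from memory costs 100 cycles. The names `DirectMapped` and `translate_cycles` are hypothetical.

```python
MEMORY_CYCLES = 100
# Shift that removes the bits below each level: PML4 [55:48], PDP [47:36], DIR [35:24].
SHIFTS = {"PML4": 48, "PDP": 36, "DIR": 24}


class DirectMapped:
    """4-entry direct-mapped cache storing full keys as tags."""

    def __init__(self, entries=4):
        self.tags = [None] * entries

    def hit(self, key):
        return self.tags[key % len(self.tags)] == key

    def fill(self, key):
        self.tags[key % len(self.tags)] = key


def translate_cycles(addresses):
    tlb = DirectMapped()
    pmh = {name: DirectMapped() for name in SHIFTS}
    cycles = []
    for va in addresses:
        vpn = va >> 12
        if tlb.hit(vpn):
            cycles.append(2)                     # TLB hit, the PMH is not needed
            continue
        deepest = 0                              # deepest PMH cache level that hits
        for level, (name, shift) in enumerate(SHIFTS.items(), start=1):
            if pmh[name].hit(va >> shift):
                deepest = level
        lookup = 2 if deepest else 1             # hit: 2 cycles, all miss: 1 cycle
        cycles.append(lookup + (4 - deepest) * MEMORY_CYCLES)
        for name, shift in SHIFTS.items():
            pmh[name].fill(va >> shift)          # the walk refills every level
        tlb.fill(vpn)
    return cycles


cycles = translate_cycles([
    0xFF8123456789ABCD,
    0xFF8123406789ABCD,
    0xFF8023406789ABCD,
    0xFF8123406709ABCD,
    0xFF8123406709A0CD,
])
print(cycles)  # [401, 202, 401, 302, 2]
```

The fourth access shows the conflict effect: the third access used PML4 index 80 with the same PDP index 234, so it overwrote set 0 of the PDP cache and the entry for 81.234 is gone.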

Backup Slides

[Figure: cache read access with a 4-way cache. The virtual address is split into a translated part (VPN) and an untranslated part (set and offset). The PFN obtained from the translation is compared with the tag field of ways 0 to 3 to produce the tag match.]

Question 3 (alternative cycle accounting)
Linear address: sign ext. [63:56], PML4 [55:48], PDP [47:36], DIR [35:24], TABLE [23:12], OFFSET [11:0]
TLB: 4 entries, direct mapped. TLB hit: 2 cycles. TLB miss: 1 cycle. Memory access: 100 cycles.

We have a sequence of accesses to virtual addresses. For each address, how many cycles are needed to translate the address? Assume that we use small pages only and that the TLBs are empty upon reset. In this version each level is looked up in turn, so a miss at a level costs 1 + 100 = 101 cycles and a hit costs 2 cycles.

Virtual addr.         Cycles   Comment
FF81 2345 6789 ABCD   404      Miss and memory access at each level: 4*101 = 404 cycles
FF81 2340 6789 ABCD   206      PML4 TLB: hit, 2 cycles. PDP TLB: hit, 2 cycles. DIR TLB: miss, 101 cycles. PTE TLB: miss, 101 cycles
FF80 2340 6789 ABCD   404      TLB miss in all the levels: 4*101 = 404 cycles
FF81 2340 6709 ABCD   305      PML4 TLB: hit, 2 cycles. PDP: the entry 234 that was used by the second access was replaced by the entry that was filled in access 3, as it is in the same set, so it misses. DIR and PTE: miss. (1*2) + (3*101) = 305
FF81 2340 6709 A0CD   8        Hit in all TLBs: 4*2 = 8