SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan


Reality Check
Question 1: Are real caches built to work on virtual addresses or physical addresses?
Question 2: What about multiple levels in caches?
Question 3: Do modern processors use pipelining of the kind that we studied?

Virtual Memory System
Supports memory management when multiple processes are running concurrently (page based, segment based, ...).
Ensures protection across processes.
Address space: the range of memory addresses a process can address (includes program text, data, heap, and stack); a 32-bit address gives a 4 GB address space.
With VM, the address generated by the processor is a virtual address.

Page-Based Virtual Memory
A process address space is divided into a number of fixed-size pages.
A page is the basic unit of transfer between secondary storage and main memory.
Different processes share the physical memory, so a virtual address must be translated to a physical address.

Virtual Pages to Physical Frame Mapping
(Figure: virtual pages of Process 1 through Process k mapped to frames in main memory.)

Page Mapping Info (Page Table)
A page can be mapped to any frame in main memory.
Where is the mapping stored/accessed? In the page table; each process has its own page table!
Address translation: virtual to physical address translation.

Address Translation
A virtual address (issued by the processor) is translated to a physical address (used to access memory).
(Figure: the 32-bit virtual address is split into an 18-bit virtual page number and a 14-bit offset; the page table entry holds the physical page number and the protection (Pr), valid (V), and dirty (D) bits; the physical address is the 18-bit physical frame number concatenated with the 14-bit offset.)

Memory Hierarchy: Secondary to Main Memory
Analogous to the main memory to cache relationship.
When the required virtual page is in main memory: page hit.
When the required virtual page is not in main memory: page fault.
The page fault penalty is very high (tens of milliseconds) as it involves a disk (secondary storage) access, so the page fault ratio should be very low.
Page faults are handled by the OS.
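The translation step described here can be sketched in a few lines of Python, using the slides' 32-bit split (18-bit virtual page number, 14-bit offset for 16 KB pages). The flat dictionary page table and its entry fields are illustrative simplifications, not a real MMU data structure.

```python
# Sketch of page-based address translation, assuming the address split from
# the slides: an 18-bit virtual page number (VPN) and a 14-bit page offset.

PAGE_OFFSET_BITS = 14                      # 16 KB pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def translate(vaddr, page_table):
    """Translate a virtual address to a physical address via a flat page table."""
    vpn = vaddr >> PAGE_OFFSET_BITS        # upper 18 bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # lower 14 bits: offset within page
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise RuntimeError("page fault: OS must bring the page into memory")
    pfn = entry["frame"]                   # physical frame number
    return (pfn << PAGE_OFFSET_BITS) | offset

# Example: VPN 3 mapped to physical frame 7; the offset passes through unchanged.
pt = {3: {"valid": True, "frame": 7}}
pa = translate((3 << 14) | 0x123, pt)
assert pa == (7 << 14) | 0x123
```

An access to an unmapped VPN raises the (simulated) page fault, which in a real system traps to the OS page-fault handler.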

Page Placement
A virtual page can be placed anywhere in physical memory (fully associative).
The page table keeps track of the page mapping; there is a separate page table for each process.
The page table size is quite large! Assume a 32-bit address space and 16 KB page size:
number of entries in the page table = 2^32 / 2^14 = 2^18 = 256K
page table size = 256K x 4 B = 1 MB = 64 pages!
The page table itself may therefore be paged (multi-level page tables)!

Page Identification
The virtual page number is used to index into the page table.
Accessing the page table causes one extra memory access!
(Figure: the 18-bit virtual page number indexes the page table; the entry (with Pr, V, D bits) supplies the 18-bit physical frame number, which is concatenated with the 14-bit offset.)
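The page-table size arithmetic above can be checked directly; the 4-byte page-table-entry size is the one assumed in the slides.

```python
# Worked check of the page-table size calculation:
# 32-bit address space, 16 KB pages, 4-byte page-table entries.

addr_bits = 32
page_offset_bits = 14                       # 16 KB = 2^14
pte_bytes = 4

entries = 2 ** (addr_bits - page_offset_bits)             # 2^18 = 256K entries
table_bytes = entries * pte_bytes                         # 1 MB
pages_for_table = table_bytes // (2 ** page_offset_bits)  # table itself spans 64 pages

assert entries == 256 * 1024
assert table_bytes == 1024 * 1024
assert pages_for_table == 64
```

The last line is the motivation for multi-level page tables: even the table describing one process occupies 64 pages of memory.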

Page Replacement
Page replacement can use more sophisticated policies than a cache:
least recently used (LRU), the second-chance algorithm, recency vs. frequency.
Write policies: write-back, write-allocate.

Translation Look-Aside Buffer
Accessing the page table causes one extra memory access!
To reduce translation time, a translation look-aside buffer (TLB) caches recent address translations.
TLB organization is similar to cache organization (direct-mapped, set-associative, or fully associative).
The TLB is small (4 to 512 entries); the TLB is important for fast translation.

Translation using TLB
Assume a 128-entry, 4-way set-associative TLB (32 sets).
(Figure: the 18-bit virtual page number splits into a 13-bit tag and a 5-bit index; each TLB entry holds D, V, and Pr bits, the tag, and the physical page number; on a tag match (TLB hit) the physical frame number is concatenated with the 14-bit offset.)

Q1: Caches and Address Translation
Physically addressed cache: virtual address -> MMU -> physical address -> cache -> (on a cache miss) main memory.
Virtually addressed cache: virtual address -> cache; the MMU produces the physical address, used only on a cache miss to access main memory.
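The TLB lookup for these parameters can be sketched as follows; the entry format (valid bit, tag, physical page number) is a simplification of the D/V/Pr fields in the slide.

```python
# Sketch of a 128-entry, 4-way set-associative TLB lookup: the 18-bit VPN
# splits into a 5-bit set index (128 / 4 = 32 sets) and a 13-bit tag.

INDEX_BITS = 5          # 32 sets
NUM_SETS = 1 << INDEX_BITS

def tlb_lookup(tlb, vpn):
    """Return the physical page number on a TLB hit, or None on a miss."""
    index = vpn & (NUM_SETS - 1)      # low 5 bits of the VPN select the set
    tag = vpn >> INDEX_BITS           # remaining 13 bits are the tag
    for entry in tlb[index]:          # hardware compares all 4 ways in parallel
        if entry["valid"] and entry["tag"] == tag:
            return entry["ppn"]
    return None                       # TLB miss: walk the page table

# Example: install a translation for VPN 0x155, then look it up.
tlb = [[] for _ in range(NUM_SETS)]
vpn = 0x155
tlb[vpn & (NUM_SETS - 1)].append({"valid": True, "tag": vpn >> INDEX_BITS, "ppn": 0x2A})
assert tlb_lookup(tlb, vpn) == 0x2A
assert tlb_lookup(tlb, 0x156) is None   # different set, no entry: miss
```

On a miss, a real system walks the page table (in hardware or software) and installs the new translation, evicting one of the four ways in the set.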

Which is less preferable?
Physically addressed cache:
Hit time is higher (the cache is accessed only after translation).
Virtually addressed cache:
Data/instructions of different processes with the same virtual address may be in the cache at the same time. Either flush the cache on a context switch, or include the process id as part of each cache directory entry.
Synonyms: virtual addresses that translate to the same physical address; more than one copy in the cache can lead to a data consistency problem.

Another Possibility: Overlapped Operation
Index into the cache directory using the virtual address while the MMU translates the address; do the cache tag comparison using the physical address.
This is a virtually indexed, physically tagged cache.

Addresses and Caches
Physically addressed cache: physically indexed, physically tagged.
Virtually addressed cache: virtually indexed, virtually tagged.
Overlapped cache indexing and translation: virtually indexed, physically tagged (physically indexed, virtually tagged is also conceivable).

Physically Indexed, Physically Tagged Cache
16 KB page size; 64 KB direct-mapped cache with 32 B block size.
(Figure: the virtual address (18-bit virtual page number, 14-bit page offset) passes through the MMU; the resulting physical address is used for the cache lookup as a 16-bit cache tag, an 11-bit cache index, and a 5-bit block offset.)

Virtually Indexed, Virtually Tagged Cache
(Figure: the virtual address supplies an 11-bit cache index and a 5-bit block offset, and the 18-bit virtual page number is compared as the cache tag to determine hit/miss; the MMU produces the physical address (18-bit physical page number, 14-bit page offset) only on a miss.)

Virtually Indexed, Physically Tagged Cache
(Figure: the cache is indexed with the 11-bit index and 5-bit block offset of the virtual address, in parallel with the MMU translation; the tag comparison uses the 16-bit cache tag of the physical address, whose 14-bit page offset is identical to the virtual one.)
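One subtlety in the virtually indexed, physically tagged design with these parameters: the cache lookup consumes more address bits than the untranslated page offset provides. A quick illustrative check, using the slides' cache and page sizes:

```python
# Bit budget for overlapped indexing: 64 KB direct-mapped cache, 32 B blocks,
# 16 KB pages (14-bit page offset).

cache_bytes = 64 * 1024
block_bytes = 32
page_offset_bits = 14

block_offset_bits = block_bytes.bit_length() - 1             # 5
index_bits = (cache_bytes // block_bytes).bit_length() - 1   # 11 (2048 sets)

# Address bits consumed by the cache lookup before translation finishes:
lookup_bits = index_bits + block_offset_bits                 # 16
# Address bits identical in the virtual and physical address:
untranslated_bits = page_offset_bits                         # 14

assert lookup_bits == 16 and untranslated_bits == 14
# Two index bits fall inside the translated page number, so the virtual and
# physical index can disagree unless those bits are handled specially
# (e.g. by restricting page placement or checking for aliases).
assert lookup_bits - untranslated_bits == 2
```

When the index plus block offset fits entirely within the page offset, indexing with the virtual address is always safe; here it does not, which is why such designs need extra care.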

Multi-Level Caches
Small L1 cache: gives a low hit time, and hence a faster CPU cycle time.
Large L2 cache: reduces the L1 cache miss penalty.
The L2 cache is typically set-associative, to reduce the L2 cache miss ratio!
Typically, the L1 cache is direct-mapped with a separate I- and D-cache organization; L2 is unified and set-associative.
L1 and L2 are on-chip; L3 is also moving on-chip.

(Figure: the CPU and MMU access the L1 I-cache and L1 D-cache (access time 2-4 ns), backed by a unified L2 cache (access time 16-30 ns) and main memory (access time 100 ns).)

Cache Performance
One-level cache:
AMAT = Hit Time_L1 + Miss Rate_L1 x Miss Penalty_L1
Two-level caches:
AMAT = Hit Time_L1 + Miss Rate_L1 x Miss Penalty_L1
Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2
AMAT = Hit Time_L1 + Miss Rate_L1 x (Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2)

Putting it Together: Alpha 21264
48-bit virtual and 44-bit physical addresses.
64 KB 2-way associative L1 I-cache with 64-byte blocks (512 sets).
The L1 I-cache is virtually indexed and virtually tagged (address translation is required only on a miss).
An 8-bit address-space id for each process (to avoid flushing the cache on a context switch).
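The two-level AMAT formula above is easy to evaluate; the numbers below are purely illustrative, not measurements of any real machine.

```python
# Average memory access time (AMAT) for a two-level cache hierarchy:
# AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2)

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

# Hypothetical numbers: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 20% L2 (local) miss rate, 100-cycle memory access.
amat = amat_two_level(1.0, 0.05, 10.0, 0.20, 100.0)
assert abs(amat - 2.5) < 1e-9   # 1 + 0.05 * (10 + 0.20 * 100) = 2.5 cycles
```

Note that Miss Rate_L2 here is the local L2 miss rate (misses per L2 access), consistent with the formula's structure: the L2 term is only paid on an L1 miss.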

Alpha 21264 (contd.)
8 KB page size (13-bit page offset).
128-entry fully associative TLB.
8 MB direct-mapped unified L2 cache, 64 B block size.
Critical word (16 B) first; the next 64 B are prefetched into the instruction prefetcher.

21264 Data Cache
The L1 data cache uses a virtual-address index but a physical-address tag, so address translation proceeds along with the cache access.
64 KB 2-way associative L1 data cache with write-back.

Q2: High-Performance Pipelined Processors
Pipelining overlaps the execution of consecutive instructions, improving processor performance.
Current processors use more aggressive techniques for more performance.
Some exploit instruction-level parallelism: often, many consecutive instructions are independent of each other and can be executed in parallel (at the same time).

Instruction Level Parallelism Processors
Challenge: identifying which instructions are independent.
Approach 1: build processor hardware to analyze and keep track of dependences.
Superscalar processors: Pentium 4, RS6000, ...
Approach 2: the compiler does the analysis and packs suitable instructions together for parallel execution by the processor.
VLIW (very long instruction word) processors: Intel Itanium.

ILP Processors (contd.)
(Figure: pipeline diagrams contrasting the instruction flow of a simple pipelined processor, a superscalar processor, and a VLIW/EPIC processor.)

Multicores
Multiple cores on a single die.
Early efforts utilized the multiple cores for multiple programs: throughput-oriented rather than speedup-oriented!
Can they be used by parallel programs?