Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)

Techniques to Reduce Cache Misses
- Victim caches
- Better replacement policies: pseudo-LRU, NRU
- Prefetching, cache compression

Victim Caches
- A direct-mapped cache suffers from conflict misses because multiple pieces of data map to the same location
- The processor often tries to access data that it recently discarded, so all evicted blocks are placed in a small victim cache (4 or 8 entries); the victim cache is checked before going to L2
- Can be viewed as additional associativity for the few sets that tend to have the most conflicts (a behavioral sketch follows)
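Below is a minimal behavioral sketch of the lookup path, assuming a direct-mapped L1 backed by a 4-entry victim cache; the class names and the swap-on-victim-hit policy are illustrative assumptions, not taken from any particular design.

```python
from collections import deque

class VictimCache:
    """Small fully associative buffer holding blocks evicted from L1."""
    def __init__(self, entries=4):
        self.blocks = deque(maxlen=entries)   # oldest victim falls off the end

    def lookup_and_remove(self, block):
        if block in self.blocks:
            self.blocks.remove(block)         # block moves back into L1
            return True
        return False

    def insert(self, block):
        self.blocks.append(block)

class DirectMappedL1:
    def __init__(self, num_sets, victim):
        self.lines = [None] * num_sets        # one block per set
        self.victim = victim

    def access(self, block_addr):
        index = block_addr % len(self.lines)
        if self.lines[index] == block_addr:
            return "L1 hit"
        # On an L1 miss, check the victim cache before going to L2
        hit_in_victim = self.victim.lookup_and_remove(block_addr)
        if self.lines[index] is not None:
            self.victim.insert(self.lines[index])   # displaced block becomes a victim
        self.lines[index] = block_addr              # fill from victim cache or L2
        return "victim hit" if hit_in_victim else "L2 access"

l1 = DirectMappedL1(num_sets=64, victim=VictimCache(entries=4))
for block in [0, 64, 0, 64]:       # two blocks that conflict in set 0
    print(l1.access(block))        # L2 access, L2 access, then victim hits
```

Two blocks that ping-pong in the same set hit in the victim cache after their first evictions instead of paying the L2 latency every time.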

Replacement Policies
- Pseudo-LRU: maintain a binary tree per set and keep track of which side of the tree was touched more recently; requires only simple bit operations
- NRU: every block in a set has a bit; the bit is cleared (made zero) when the block is touched; if all bits become zero, make them all one; a block with its bit set to 1 is evicted (sketched below)
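A small sketch of the NRU bookkeeping for one set; whether the just-touched block keeps its zero bit after a reset varies across implementations, so the variant below is an assumption.

```python
class NRUSet:
    """Not-Recently-Used state for one cache set: one bit per way.
    Bit 1 = eviction candidate, bit 0 = recently touched."""
    def __init__(self, ways=8):
        self.bits = [1] * ways      # initially every block is a candidate

    def touch(self, way):
        self.bits[way] = 0          # mark as recently used
        if all(b == 0 for b in self.bits):
            self.bits = [1] * len(self.bits)   # all look recent: reset to all ones
            self.bits[way] = 0      # assumed variant: keep the touched block recent

    def victim(self):
        return self.bits.index(1)   # evict the first block whose bit is 1
```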

Prefetching
- Hardware prefetching can be employed at any of the cache levels
- It can introduce cache pollution, so prefetched data is often placed in a separate prefetch buffer; this buffer must be looked up in parallel with the cache access
- Aggressive prefetching increases coverage but reduces accuracy and wastes memory bandwidth (these metrics are quantified below)
- Prefetches must be timely: issued early enough to hide the miss latency, but not so early that the block pollutes the cache or is evicted before use
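Coverage and accuracy are commonly quantified as follows, where a useful prefetch is one that is referenced before eviction; the function name is illustrative.

```python
def prefetch_metrics(useful_prefetches, issued_prefetches, baseline_misses):
    """Coverage: fraction of the no-prefetch misses that prefetching removed.
    Accuracy: fraction of issued prefetches that were actually used."""
    coverage = useful_prefetches / baseline_misses
    accuracy = useful_prefetches / issued_prefetches
    return coverage, accuracy

# An aggressive prefetcher covers more misses but wastes bandwidth:
print(prefetch_metrics(useful_prefetches=80, issued_prefetches=200,
                       baseline_misses=100))   # coverage 0.8, accuracy 0.4
```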

Stream Buffers
- Simplest form of prefetch: on every miss, bring in multiple sequential cache lines and hold them in a stream buffer (a FIFO queue)
- When the processor consumes the line at the head of the queue, bring in the next sequential line (see the sketch below)
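A sketch of this behavior, tracking cache-line numbers rather than byte addresses; the depth of four lines is an assumed parameter.

```python
from collections import deque

class StreamBuffer:
    """FIFO of sequentially prefetched cache lines."""
    def __init__(self, depth=4):
        self.depth = depth
        self.queue = deque()

    def allocate(self, miss_line):
        # On a cache miss, prefetch the next `depth` sequential lines
        self.queue = deque(miss_line + i for i in range(1, self.depth + 1))

    def access(self, line):
        # Hit at the head of the queue: consume it and top up the tail
        if self.queue and self.queue[0] == line:
            self.queue.popleft()
            next_line = self.queue[-1] + 1 if self.queue else line + 1
            self.queue.append(next_line)
            return True
        return False
```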

Stride-Based Prefetching
- For each load (identified by its PC), keep track of the last address it accessed and a possibly consistent stride
- A finite state machine detects a consistent stride and issues prefetches
- Each table entry holds: a PC tag, prev_addr, stride, and state
- The states are init, transient (trans), steady, and no-prediction (no-pred); correct stride predictions move an entry toward steady, where prefetches are issued, while incorrect predictions update the stride and move the entry toward no-pred
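The sketch below follows the classic reference prediction table in the spirit of Chen and Baer; the slide's transition diagram may differ in details, so treat the transitions here as one plausible reading.

```python
INIT, TRANSIENT, STEADY, NOPRED = "init", "trans", "steady", "no-pred"

class StrideEntry:
    """One entry of the stride table, tagged by the load's PC."""
    def __init__(self, pc, addr):
        self.pc_tag = pc
        self.prev_addr = addr
        self.stride = 0
        self.state = INIT

    def update(self, addr):
        """Called on each execution of the load; may return a prefetch address."""
        correct = (addr - self.prev_addr == self.stride)
        if correct:
            # Consistent stride observed: move toward (or stay in) steady
            self.state = TRANSIENT if self.state == NOPRED else STEADY
        elif self.state == STEADY:
            self.state = INIT                        # keep the old stride one more try
        else:
            self.stride = addr - self.prev_addr      # incorrect: update the stride
            self.state = TRANSIENT if self.state == INIT else NOPRED
        self.prev_addr = addr
        if self.state != NOPRED:
            return addr + self.stride                # predicted next address
        return None

e = StrideEntry(pc=0x400, addr=1000)
for a in [1008, 1016, 1024]:           # a stride of 8 is learned
    prefetch = e.update(a)
    print(e.state, prefetch)           # trans 1016, steady 1024, steady 1032
```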

Intel Montecito Cache
- Two cores, each with a private 12 MB L3 cache and a 1 MB L2 cache
- Naffziger et al., Journal of Solid-State Circuits, 2006

Intel 80-Core Prototype: Polaris
- Prototype chip with an entire die of SRAM cache stacked upon the cores

Shared vs. Private Caches in Multi-Core
- What are the pros/cons of a shared L2 cache?
[Figure: four cores P1-P4 sharing a single L2, versus four cores each with a private L2]

Shared vs. Private Caches in Multi-Core
Advantages of a shared cache:
- Space is dynamically allocated among cores
- No space is wasted on replicated copies of data
- Potentially faster cache coherence (and it is easier to locate data on a miss)
Advantages of a private cache:
- A small L2 yields a faster access time
- A private bus to the L2 means less contention

UCA and NUCA
- The modest-sized caches discussed so far have all had uniform cache access (UCA): the latency of any access is a constant, no matter where the data is found
- For a large multi-megabyte cache, it is wasteful to fix the access time at the worst-case delay; hence, non-uniform cache architecture (NUCA), where banks closer to the requester respond faster

Large NUCA
Issues to be addressed for non-uniform cache access:
- Mapping: which bank is home to a given address
- Migration: moving frequently used blocks to banks closer to the CPU
- Search: locating a block when its bank is not known a priori
- Replication: keeping copies of hot blocks in multiple banks

Shared NUCA Cache
[Figure: an 8-tile chip, Cores 0-7; each tile contains a core, its I and D L1 caches, and one bank of the shared L2; a memory controller at the edge handles off-chip access]
- A single tile is composed of a core, its L1 caches, and a bank (slice) of the shared L2 cache
- The cache controller forwards address requests to the appropriate L2 bank (one simple mapping is sketched below) and handles coherence operations
- A memory controller handles off-chip accesses
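One simple static mapping interleaves cache lines across the banks; the 64-byte line size and the count of 8 banks are assumptions for illustration.

```python
BLOCK_BITS = 6   # assuming 64-byte cache lines

def home_bank(phys_addr, num_banks=8):
    """Static NUCA mapping: consecutive cache lines rotate across L2 banks.
    The requesting tile forwards its miss to the tile holding this bank."""
    line_number = phys_addr >> BLOCK_BITS
    return line_number % num_banks
```

Static mappings make the home bank trivial to compute, but give up the migration and replication options listed on the previous slide.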

Virtual Memory
- Processes deal with virtual memory: they have the illusion that a very large address space is available to them
- There is only a limited amount of physical memory, and it is shared by all processes; a process places part of its virtual memory in this physical memory and the rest is stored on disk
- Thanks to locality, disk accesses are likely to be uncommon
- The hardware ensures that one process cannot access the memory of a different process

Address Translation
- Virtual and physical memory are broken up into pages; with an 8 KB page size, the page offset occupies the low 13 bits (2^13 = 8192)
- A virtual address splits into a virtual page number and a 13-bit page offset; the virtual page number is translated to a physical page number, while the offset passes through unchanged
- The physical address is the physical page number concatenated with the same 13-bit page offset (a worked example follows)
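A worked example of the split, using the slide's 8 KB pages; the page-table dictionary is a stand-in for the real multi-level structure.

```python
PAGE_SIZE = 8 * 1024                        # 8 KB pages, as on the slide
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 13 bits, since 2**13 = 8192

def translate(vaddr, page_table):
    vpn = vaddr >> OFFSET_BITS              # virtual page number: looked up
    offset = vaddr & (PAGE_SIZE - 1)        # page offset: unchanged by translation
    ppn = page_table[vpn]                   # physical page number
    return (ppn << OFFSET_BITS) | offset

# Virtual page 2 maps to physical page 5:
print(hex(translate(0x5ABC, {2: 5})))       # offset 0x1ABC is preserved -> 0xbabc
```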
