Memory Hierarchy. Bojian Zheng, CSCD70 Spring 2018. bojian@cs.toronto.edu

Memory Hierarchy. From the programmer's point of view, memory has infinite capacity (i.e., it can store an infinite amount of data) and zero access time (latency). But these two requirements contradict each other: large memory usually has high access latency, and fast memory cannot go beyond a certain capacity limit. Therefore, we want to have multiple levels of storage and ensure that most of the data the processor needs is kept in the fastest level.

Memory Hierarchy. [Diagram: what we have in reality (a multi-level hierarchy) vs. what we would ideally like to see (a single large, fast memory).]

Locality. Temporal Locality: if an address is accessed, it is very likely that the exact same address will be accessed again in the near future.

unsigned a = 10; // a will hopefully be used again soon

Locality. Spatial Locality: if an address is accessed, it is very likely that nearby addresses will be accessed in the near future.

unsigned A[10];
A[5] = 10; // A[4] or A[6] will hopefully be used soon

Cache Basics. Memory is divided into fixed-size blocks. Each block maps to one location in the cache (called a cache block or cache line), determined by the index bits of its address.
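To make the index/tag/offset mapping concrete, here is a minimal C++ sketch of the address decomposition; the block size and set count are hypothetical parameters of my choosing, since the slides do not fix them:

#include <cstdint>
#include <cstdio>

// Hypothetical parameters: 64-byte blocks and 64 sets,
// giving 6 offset bits and 6 index bits.
constexpr uint64_t kBlockBits = 6;
constexpr uint64_t kIndexBits = 6;

int main() {
  uint64_t addr = 0x12345678;
  uint64_t offset = addr & ((1ULL << kBlockBits) - 1);                  // byte within the block
  uint64_t index  = (addr >> kBlockBits) & ((1ULL << kIndexBits) - 1);  // which cache set
  uint64_t tag    = addr >> (kBlockBits + kIndexBits);                  // identifies the block
  std::printf("tag=%#llx index=%llu offset=%llu\n",
              (unsigned long long)tag, (unsigned long long)index,
              (unsigned long long)offset);
  return 0;
}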

Direct-Mapped Cache. [Diagram: each memory block maps to exactly one cache line.]

Direct-Mapped Cache: Problem. Suppose that two variables A (address 0b10000000) and B (address 0b11000000) map to the same cache line. Interleaving accesses to them (i.e., A -> B -> A -> B -> ...) will lead to a 0% hit rate. These are called conflict misses (more later).
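A tiny simulation of this pathology, as a sketch rather than anything from the slides; I picked 32-byte blocks and 2 sets so that the two addresses above share their index bit but differ in tag:

#include <cstdint>
#include <cstdio>

// Toy direct-mapped cache: 32-byte blocks (5 offset bits), 2 sets (1 index
// bit). Each line stores only a valid bit and the tag of the cached block.
struct Line {
  bool valid = false;
  uint64_t tag = 0;
};

int main() {
  Line lines[2];
  int hits = 0, accesses = 0;
  // Interleaved accesses: A -> B -> A -> B.
  uint64_t trace[] = {0b10000000, 0b11000000, 0b10000000, 0b11000000};
  for (uint64_t addr : trace) {
    uint64_t index = (addr >> 5) & 0x1; // both addresses land in set 0 ...
    uint64_t tag   = addr >> 6;         // ... but with different tags (2 vs. 3)
    ++accesses;
    if (lines[index].valid && lines[index].tag == tag) {
      ++hits;
    } else { // conflict miss: the other block is evicted
      lines[index].valid = true;
      lines[index].tag = tag;
    }
  }
  std::printf("hit rate: %d/%d\n", hits, accesses); // prints "hit rate: 0/4"
  return 0;
}

With associativity of 2 or more, both blocks could coexist in the set and the later accesses would hit, which is what motivates the set-associative design on the next slide.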

Set-Associative Cache. [Diagram: each set holds multiple cache lines, so blocks with the same index bits can coexist.]

Set-Associative Cache: Problem. More expensive tag comparison. More complicated design, with lots of things to consider: Where should we insert the new incoming cache block? What happens when a cache hit occurs, i.e., how should we adjust the priorities? What happens when a cache miss occurs and the cache set is fully occupied (the replacement policy)?

Replacement Policy. Which block should we evict when the cache set is full? Least-Recently-Used? Random? Pseudo-Least-Recently-Used? Belady's (a.k.a. Optimal) Replacement Policy: evict the block that will be reused furthest in the future. It is impossible to implement in hardware (why?), but it is still useful (why?).
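As a concrete baseline, here is a minimal software sketch (mine, not the slides') of one cache set managed with the Least-Recently-Used policy:

#include <cstddef>
#include <cstdint>
#include <algorithm>
#include <cstdio>
#include <vector>

// One cache set with LRU replacement. The vector keeps tags ordered from
// most- to least-recently used; hardware would track this with a few bits
// per line instead of a real list.
class LRUSet {
 public:
  explicit LRUSet(size_t ways) : ways_(ways) {}

  // Returns true on a hit. On a hit, the tag moves to the MRU position;
  // on a miss with a full set, the LRU tag (back of the vector) is evicted.
  bool access(uint64_t tag) {
    auto it = std::find(tags_.begin(), tags_.end(), tag);
    if (it != tags_.end()) {
      tags_.erase(it);
      tags_.insert(tags_.begin(), tag); // promote to MRU
      return true;
    }
    if (tags_.size() == ways_) tags_.pop_back(); // evict LRU
    tags_.insert(tags_.begin(), tag);
    return false;
  }

 private:
  size_t ways_;
  std::vector<uint64_t> tags_;
};

int main() {
  LRUSet set(2);                        // a 2-way set
  for (uint64_t tag : {1, 2, 1, 3, 2})  // tag 2 is LRU when 3 arrives
    std::printf("tag %llu: %s\n", (unsigned long long)tag,
                set.access(tag) ? "hit" : "miss");
  return 0;
}

Belady's policy, by contrast, would consult the future of the access trace, which hardware cannot do; it remains useful offline as an upper bound against which real policies like LRU can be compared.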

Cache Misses. There are 3 types of cache misses: Cold Misses: happen whenever we reference a variable (memory location) for the first time. Such misses are unavoidable (or are they? see prefetching below). Capacity Misses: happen because our cache is too small; these are the misses that would occur even with a fully-associative cache and the optimal replacement policy, because the cache size is smaller than the working set size. Conflict Misses: happen because we do not have enough associativity.

Handling Writes. Write-Back vs. Write-Through. Write-Back: write modified data to memory when the cache block is evicted. (+) Can effectively combine multiple writes to the same block. (-) Needs an extra bit per block indicating whether it is dirty or clean. Write-Through: write modified data to memory whenever a write occurs. (+) Simple, and makes it easier to reason about consistency. (-) Cannot combine writes, and is more bandwidth-intensive.
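A minimal sketch (again mine, not from the slides) of the write-back bookkeeping, showing how the dirty bit lets several writes to a block be combined into a single write-back at eviction:

#include <cstdint>
#include <cstdio>

// A single cache line's write-back state (a software sketch; real caches
// keep these bits per line in hardware).
struct Line {
  bool valid = false;
  bool dirty = false; // the extra bit that write-back requires
  uint64_t tag = 0;
};

// On a write hit, write-back only marks the line dirty; memory is untouched.
void writeHit(Line &line) { line.dirty = true; }

// On eviction, the block goes to memory only if it is dirty, so several
// writes to the same block cost a single memory write.
void evict(Line &line) {
  if (line.dirty)
    std::printf("writing block with tag %#llx back to memory\n",
                (unsigned long long)line.tag);
  line.valid = line.dirty = false;
}

int main() {
  Line line;
  line.valid = true;
  line.tag = 0x2a;
  writeHit(line); // first write: mark dirty
  writeHit(line); // second write: combined with the first
  evict(line);    // one write-back covers both writes
  return 0;
}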

Prefetching. Upon a cache miss, try to predict what the next cache miss will be and fetch that block ahead of time. For instance, a miss on A[0] prefetches A[1] into the cache. Good for memory access patterns that are highly predictable, e.g., instruction memory and array accesses (uniform stride). Risk: cache pollution. Goals: timeliness, coverage, accuracy.

Questions? Cache basics; cache locality; set-associativity; replacement policy; cache misses (cold, capacity, conflict); handling writes (write-back, write-through); prefetching (risks and goals).

Preview: Cache & Compiler. OK, so how is this related to the compiler?

unsigned A[20][10];
for (unsigned i = 0; i < 10; ++i)
  for (unsigned j = 0; j < 20; ++j)
    A[j][i] = j * 10 + i; // Assume A is row-major.

This is not very nice, because the cache is utilized badly :(

Preview: Cache & Compiler. Loop interchange:

// Before: the inner loop walks down a column of a row-major array.
for (unsigned i = 0; i < 10; ++i)
  for (unsigned j = 0; j < 20; ++j)
    A[j][i] = j * 10 + i;

// After: the inner loop walks along a row.
for (unsigned j = 0; j < 20; ++j)
  for (unsigned i = 0; i < 10; ++i)
    A[j][i] = j * 10 + i;

In the first version, consecutive accesses are a whole row (10 unsigneds) apart, wasting spatial locality; after interchanging the loops, accesses walk through memory sequentially.

Preview: Cache & Compiler. Consider another example:

unsigned A[100][100];
unsigned sum = 0;
for (unsigned i = 0; i < 100; ++i)
  for (unsigned j = 0; j < 100; ++j)
    sum += A[i][j];

Preview: Cache & Compiler. Apply prefetching (BLOCK_SIZE is the cache block size measured in array elements):

unsigned A[100][100];
unsigned sum = 0;
for (unsigned i = 0; i < 100; ++i)
  for (unsigned j = 0; j < 100; j += BLOCK_SIZE) {
    prefetch(&A[i][j] + BLOCK_SIZE); // fetch the next block while this one is in use
    for (unsigned jj = j; jj < j + BLOCK_SIZE; ++jj)
      sum += A[i][jj];
  }
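A compilable variant of the same idea: this sketch assumes GCC/Clang's __builtin_prefetch intrinsic and a hypothetical BLOCK_SIZE of 16 elements (64-byte lines holding 4-byte unsigneds); the slides' prefetch() stands in for whatever intrinsic the target provides.

#include <cstdio>

// Assumed, not from the slides: 64-byte cache lines and 4-byte unsigned,
// i.e., 16 array elements per cache block.
constexpr unsigned BLOCK_SIZE = 16;

unsigned A[100][100];
unsigned sum = 0;

int main() {
  for (unsigned i = 0; i < 100; ++i)
    for (unsigned j = 0; j < 100; j += BLOCK_SIZE) {
      if (j + BLOCK_SIZE < 100)                    // stay inside the row
        __builtin_prefetch(&A[i][j + BLOCK_SIZE]); // GCC/Clang intrinsic
      for (unsigned jj = j; jj < j + BLOCK_SIZE && jj < 100; ++jj)
        sum += A[i][jj];
    }
  std::printf("sum = %u\n", sum);
  return 0;
}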

Preview: Design Tradeoff. Consider the following code:

unsigned A[20][10];
for (unsigned i = 0; i < 10; ++i)
  for (unsigned j = 0; j < 19; ++j)
    A[j][i] = A[j+1][i];

Preview: Design Tradeoff. Loop Parallelization (the i iterations are independent):

for (unsigned i = 0; i < 10; ++i)
  for (unsigned j = 0; j < 19; ++j)
    A[j][i] = A[j+1][i];

Cache Locality (sequential, row-major traversal):

for (unsigned j = 0; j < 19; ++j)
  for (unsigned i = 0; i < 10; ++i)
    A[j][i] = A[j+1][i];
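To make the tradeoff concrete: the statement A[j][i] = A[j+1][i] only relates elements within the same column i, so the dependence is carried by j and the i loop is safely parallel. A sketch of the parallel-friendly ordering using OpenMP (my choice of tool; the slides do not name one):

// Compile with -fopenmp (GCC/Clang).
unsigned A[20][10];

int main() {
  // The i iterations are independent (the dependence is carried by j), so
  // parallelizing over i is safe -- but each thread then strides down a
  // column, the cache-unfriendly order for a row-major array.
  #pragma omp parallel for
  for (unsigned i = 0; i < 10; ++i)
    for (unsigned j = 0; j < 19; ++j)
      A[j][i] = A[j+1][i];
  return 0;
}

The j-outer version enjoys sequential accesses, but its outer loop carries the dependence and cannot be parallelized over j, so the compiler must weigh parallelism against locality when choosing the loop order.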