Memory Hierarchy. Slides contents from: Hennessy & Patterson, 5th ed., Appendix B and Chapter 2; David Wentzlaff, ELE 475 Computer Architecture; MJT, High Performance Computing, NPTEL.

Memory Performance Gap: the "Memory Wall".

Memory Performance Gap. Aggregate peak bandwidth required by the Intel Core i7: 2 data references per core per clock, with 4 cores at 3.2 GHz, gives 25.6 billion 64-bit data references/sec + 12.8 billion 128-bit instruction references/sec = 409.6 GB/s. Peak DRAM bandwidth is only about 25.6 GB/s. The gap is bridged with multiport, pipelined caches: two levels of cache per core plus a shared third-level cache on chip.

Introduction. Programmers want unlimited amounts of memory with low latency, but fast memory is more expensive per bit than slower memory. Solution: organize the memory system into a hierarchy. The entire addressable memory space is available in the largest, slowest memory; incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor. Temporal and spatial locality ensure a high hit rate in the smaller memories.

Predictable Memory Reference Patterns: spatial and temporal locality. Hatfield and Gerald, "Program Restructuring for Virtual Memory", IBM Systems Journal 10(3): 168-192 (1971).

Locality of Reference. Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon. Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
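Both kinds of locality show up in code as simple as a matrix sum; a minimal C sketch (array size and contents are illustrative):

#include <stdio.h>

#define N 512

static double a[N][N];   /* C stores each row contiguously (row-major) */

int main(void) {
    double sum = 0.0;                  /* reused every iteration: temporal locality */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];            /* consecutive j touch adjacent addresses: spatial locality */
    /* Swapping the loops (j outer, i inner) strides N*8 bytes between
       accesses, defeating spatial locality and typically missing far more. */
    printf("sum = %f\n", sum);
    return 0;
}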

The Memory Hierarchy

Memory Technology Trade-offs. A spectrum from latches/registers and the register file (low capacity, low latency, high bandwidth via more and wider ports), through SRAM, to DRAM (high capacity, high latency, low bandwidth).

SRAM Cell (diagram: cell accessed via the wordline, with complementary bitlines b and b̄).

Cache: a hardware structure that provides the memory objects that the processor references (diagram: PROCESSOR - CACHE - MAIN MEMORY).

Cache Organization (diagram). A 32 KB cache holding 1024 lines (blocks), numbered 0-1023, each with 32 B of data (bytes 0-31).

Cache Organization. A 32-bit address from the processor is split into three fields: a 17-bit tag, a 10-bit index that selects one of the 1024 lines, and a 5-bit block offset that selects a byte within the 32 B line.
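As a hedged illustration of the field split above (the 17/10/5 widths come from the slides; the example address and names are made up), the fields can be extracted in C with shifts and masks:

#include <stdint.h>
#include <stdio.h>

/* 32-bit address = tag (17 b) | index (10 b) | block offset (5 b) */
#define OFFSET_BITS 5
#define INDEX_BITS 10

int main(void) {
    uint32_t addr   = 0x12345678u;                         /* made-up example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);    /* low 5 bits: byte in block */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* line 0..1023 */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);  /* remaining 17 bits */
    printf("tag=0x%x index=%u offset=%u\n", (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}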

Direct Mapped Cache Organization (diagram: Tag 17 / Index 10 / BO 5 against an array of valid, tag, and 32 B data fields, lines 0-1023). The index selects one line; the line's stored tag is compared against the address tag, qualified by the valid bit. On a hit, the 32 B data block is sent to the processor; on a cache miss, the address is sent to the lower level of the hierarchy.
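A minimal C sketch of the direct mapped lookup just described, with the valid-bit and tag comparison from the diagram (struct layout and function names are illustrative, not a real simulator):

#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5
#define INDEX_BITS 10
#define LINES (1u << INDEX_BITS)      /* 1024 lines of 32 B = 32 KB */

typedef struct {
    bool valid;
    uint32_t tag;
    uint8_t data[32];
} cache_line;

static cache_line cache[LINES];

/* Returns true on a hit and copies the requested byte; on a miss the
   caller would send the address to the lower level of the hierarchy. */
bool lookup(uint32_t addr, uint8_t *out) {
    uint32_t index = (addr >> OFFSET_BITS) & (LINES - 1);
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    cache_line *line = &cache[index];
    if (line->valid && line->tag == tag) {   /* hit: valid AND tags match */
        *out = line->data[addr & ((1u << OFFSET_BITS) - 1)];
        return true;
    }
    return false;                             /* miss */
}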

Block Placement: Direct Mapped Cache (diagram: V, tag, data array with lines 0-1023). The index bits identify a unique cache line.

Block Placement: Set Associative Cache (diagram: 2-way set associative cache with sets 0-511). The index bits identify a unique set; a set contains multiple cache lines. The number of index bits follows from 2^Index = CacheSize / (BlockSize × SetAssociativity).
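Plugging in the 32 KB cache with 32 B blocks from the earlier slides, organized 2-way set associative: 2^Index = 32768 / (32 × 2) = 512 sets, so Index = 9 bits; the direct mapped version of the same cache used 10 index bits, the freed bit going to the tag.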

2-way Set Associative Cache (diagram: two ways, each with valid, tag, and data fields, compared in parallel). Reduces conflict misses; more energy is spent per data access.

The 4Qs of Caches. Block placement: where can a block be placed in a cache? (Direct mapped, set associative, fully associative.) Block identification: how is a block found if it is in the cache? Block replacement: which block should be replaced on a miss? Write strategy: what happens on a write?

Block Replacement. No choice in a direct mapped cache. Random. Least Recently Used (LRU): LRU cache state must be updated on every access; a true implementation is only feasible for small sets (2-way), otherwise pseudo-LRU. First In, First Out (FIFO), aka Round-Robin: used in highly associative caches. Not Most Recently Used (NMRU): FIFO with an exception for the most recently used block(s).
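A minimal sketch of true LRU for a 2-way set, where a single bit per set suffices (consistent with the slide's note that true LRU is only feasible for small sets; names and structure are illustrative):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool valid;
    uint32_t tag;
} way_t;

typedef struct {
    way_t way[2];
    uint8_t lru;      /* index of the least recently used way: one bit of state */
} set_t;

/* Access one set: returns the way holding 'tag', updating LRU state
   on every access (that per-access update is the cost the slide notes). */
int access_set(set_t *s, uint32_t tag) {
    for (int w = 0; w < 2; w++) {
        if (s->way[w].valid && s->way[w].tag == tag) {
            s->lru = (uint8_t)(1 - w);       /* hit: the other way is now LRU */
            return w;
        }
    }
    int victim = s->lru;                      /* miss: replace the LRU way */
    s->way[victim].valid = true;
    s->way[victim].tag = tag;
    s->lru = (uint8_t)(1 - victim);           /* the filled way becomes MRU */
    return victim;
}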

Write Strategy: how are writes handled? On a cache hit: Write Through (write both cache and memory; generally higher traffic but simpler to design) or Write Back (write cache only; memory is written when the block is evicted; a dirty bit per block avoids unnecessary write backs; more complicated). On a cache miss: No Write Allocate (only write to main memory) or Write Allocate (fetch the block into the cache, then write). Common combinations: Write Through & No Write Allocate; Write Back & Write Allocate.
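A hedged C sketch of the common Write Back & Write Allocate combination, showing where the dirty bit earns its keep (the memory helpers are stubs standing in for the lower level; all names are illustrative):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    bool valid, dirty;
    uint32_t tag;
    uint8_t data[32];
} line_t;

/* Stubs standing in for the lower level of the hierarchy. */
static void writeback_to_memory(const line_t *line) { (void)line; }
static void fetch_from_memory(line_t *line, uint32_t tag) {
    memset(line->data, 0, sizeof line->data);
    line->tag = tag;
    line->valid = true;
    line->dirty = false;
}

void write_byte(line_t *line, uint32_t tag, uint32_t offset, uint8_t value) {
    if (!line->valid || line->tag != tag) {   /* write miss */
        if (line->valid && line->dirty)
            writeback_to_memory(line);        /* dirty bit avoids useless write backs */
        fetch_from_memory(line, tag);         /* write allocate: fetch block, then write */
    }
    line->data[offset] = value;               /* write back: update the cache only... */
    line->dirty = true;                       /* ...memory is updated when evicted */
}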

Average Memory Access Time (diagram: the processor hits in the cache or misses to main memory). Average Memory Access Time = Hit Time + (Miss Rate × Miss Penalty).
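A quick worked example of the formula, with made-up round numbers: a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give AMAT = 1 + (0.05 × 100) = 6 cycles.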

Categorizing Misses: the Three C's. Cold start (compulsory) misses: the first reference to a block. Capacity misses: the cache is too small to hold all the data needed by the program; these occur even under a perfect replacement policy. Conflict (collision) misses: misses that occur because of collisions due to less-than-full associativity.

Basic Cache Optimizations. Larger block size to reduce miss rate: more data items per block reduce compulsory misses, but traffic increases and conflict misses may increase. Larger caches to reduce miss rate: fewer capacity (and conflict) misses, but longer access time.

Basic Cache Optimizations. Higher associativity to reduce miss rate: fewer conflict misses; a 2-way set associative cache of size N has about the same miss ratio as a direct mapped cache of size 2N. Multilevel caches to reduce miss penalty. Prioritizing read misses over writes: a write buffer lets reads bypass pending writes.
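With a second level the AMAT formula nests: AMAT = HitTime_L1 + MissRate_L1 × (HitTime_L2 + LocalMissRate_L2 × MissPenalty_L2). A hedged example with made-up numbers: 1 + 0.05 × (10 + 0.5 × 100) = 4 cycles, versus 6 cycles for the single-level example above.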

Reducing Cache Misses. Cache compression. Victim cache: direct mapped caches have large conflict misses; a small fully associative victim buffer saves recently evicted lines (diagram: L1 (DM) backed by victim cache V (FA)).
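A minimal sketch of the victim-buffer idea: on an L1 miss the small fully associative buffer is probed, and lines evicted from L1 are saved into it (entry count, FIFO replacement, and names are all illustrative):

#include <stdbool.h>
#include <stdint.h>

#define VICTIM_ENTRIES 4                     /* victim buffers are typically tiny */

typedef struct {
    bool valid;
    uint32_t block_addr;                     /* full block address: fully associative */
} victim_entry;

static victim_entry victim[VICTIM_ENTRIES];

/* Probe every entry on an L1 miss (hardware compares all tags in parallel).
   A hit means the line was recently evicted and can be swapped back into
   L1 instead of paying the full miss penalty. */
bool victim_probe(uint32_t block_addr) {
    for (int i = 0; i < VICTIM_ENTRIES; i++)
        if (victim[i].valid && victim[i].block_addr == block_addr)
            return true;
    return false;
}

/* Lines evicted from the direct mapped L1 are saved here (FIFO replacement). */
void victim_insert(uint32_t block_addr) {
    static int next;                         /* round-robin pointer, starts at 0 */
    victim[next].valid = true;
    victim[next].block_addr = block_addr;
    next = (next + 1) % VICTIM_ENTRIES;
}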

Multicore Caches. Private L2 (L2 divided into per-core banks): lower access time; static space allocation; data duplication and cache coherence issues. Shared L2: longer access time (long bus delay, contention between cores); higher hit rate; dynamic space allocation. (Diagram: cores P1-P4 with private L1s, backed by either private L2s or a shared L2.)

UCA and NUCA: Uniform Cache Access vs. Non-Uniform Cache Access.

Shared NUCA Cache. The L2 is distributed throughout the chip; the OS can smartly place the data required by a core in the bank closest to that core.