Why memory hierarchy


Why memory hierarchy (3rd Ed: p. 468-487, 4th Ed: p. 452-470)
- users want unlimited amounts of fast memory
- fast memory is expensive; slow memory is cheap
- cache: a small, fast memory near the CPU
- each large, slow memory (main memory, disk, ...) connects to the faster memory one level up
dt10 2011 6.1

Core-2 Duo Extreme
[Figure: Core-2 Duo Extreme chip]

Locality principles
- temporal locality: items recently used by the CPU tend to be referenced again soon
  - guides the cache replacement policy: what to replace when the cache is full
- spatial locality: items with addresses close to recently used items tend to be referenced soon
  - guides fetching: a miss fetches multiple neighbouring data items

Cache operation
- the cache is organised in blocks of one or more words
- cache hit: the CPU finds the required block in the cache
- otherwise, cache miss: fetch the data from main memory
- hit rate: fraction of memory accesses that hit in the cache
- hit time: time to access the cache, including the time to determine hit or miss
- miss penalty: time to replace a block in the cache plus time to transfer the data to the CPU

Direct mapped cache
- each memory location is mapped to exactly one cache location:
  cache index = (memory block address) mod (number of blocks in cache)
- so multiple memory locations map to the same cache location
- e.g. with 8 blocks in the cache, cache location 001 may contain items from memory locations 00001, 01001, 10001

Direct mapped cache mapping
[Figure: mapping is address modulo the number of blocks in the cache; memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 map to cache indices 000-111 according to their three low-order bits]
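
The modulo mapping can be checked with a short sketch (Python; the 8-block cache size is taken from the figure):

```python
NUM_BLOCKS = 8  # 8-block cache: indices 000 to 111

def cache_index(block_address: int) -> int:
    """Direct-mapped placement: index = block address mod number of blocks."""
    return block_address % NUM_BLOCKS

# Memory locations 00001, 01001 and 10001 (binary) all share cache index 001.
for addr in (0b00001, 0b01001, 0b10001):
    print(f"{addr:05b} -> index {cache_index(addr):03b}")
```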

Address translation
- lower portion of the address: cache index
- upper portion: compared with the stored tag
- hit if the tag matches and the block is valid
- what about a miss?
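
The split can be sketched as follows (Python); the field widths are illustrative assumptions, not the lecture's: 16-byte blocks and a 16-block direct-mapped cache on byte addresses.

```python
OFFSET_BITS = 4  # assumed 16-byte blocks
INDEX_BITS = 4   # assumed 16-block direct-mapped cache

def split_address(addr: int):
    """Split a byte address into (tag, index, offset): the lowest bits select
    the byte within the block, the middle bits pick the cache index, and the
    remaining upper bits form the tag compared on every access."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def is_hit(entry, addr):
    """Hit iff the indexed entry is valid and its stored tag matches."""
    tag, _, _ = split_address(addr)
    valid, stored_tag = entry
    return valid and stored_tag == tag
```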

Handling misses: by exception
- cache miss on instruction read:
  - restore the PC: PC = PC - 4
  - send the address to main memory and wait (stall)
  - write the data received from memory into the cache
  - refetch the instruction (from the restored PC)
- cache miss on data read: similar; stall the CPU until the data from main memory are available in the cache
- which is probably more common?

Cache write policy
- need to maintain cache consistency: how to keep data in the cache and in memory consistent?
- write back: write to the cache only
  - complex control; needs a dirty bit per cache entry
  - dirty entries have to be flushed to memory when they are evicted
- write through: write to both the cache and memory
  - memory write bandwidth can become a bottleneck
- what happens with shared-memory multiprocessors?
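
The two policies can be contrasted in a toy sketch (Python; word-granularity dicts stand in for real blocks and tags, purely as an illustration):

```python
class WriteThroughCache:
    """Every store updates both the cache and memory."""
    def __init__(self, memory):
        self.memory = memory
        self.data = {}

    def write(self, addr, value):
        self.data[addr] = value    # update the cache...
        self.memory[addr] = value  # ...and memory, on every store

class WriteBackCache:
    """Stores update the cache only; memory is updated on eviction."""
    def __init__(self, memory):
        self.memory = memory
        self.data = {}
        self.dirty = set()  # the per-entry dirty bits

    def write(self, addr, value):
        self.data[addr] = value
        self.dirty.add(addr)       # cache entry now diverges from memory

    def evict(self, addr):
        if addr in self.dirty:     # flush dirty entries when evicted
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)
```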

Write buffer for write through
- insert a buffer between the cache and memory
- processor: writes data into the cache and the write buffer
- memory controller: writes the contents of the write buffer to memory
- the write buffer is just a FIFO queue
- fine if the store frequency (w.r.t. time) « 1 / (memory write cycle); otherwise write buffer saturation

Write buffer saturation
- the store buffer overflows when:
  - the CPU cycle time is too fast with respect to the memory access time
  - too many store instructions occur in a row
- solutions to write buffer saturation:
  - use a write-back cache
  - install a second-level (L2) cache
  - store compression
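
A toy cycle-level simulation makes the saturation condition concrete (Python; all the rates and the 4-entry buffer depth are made-up values):

```python
from collections import deque

def write_buffer_stalls(cycles, store_interval, mem_write_cycle, depth=4):
    """Count cycles the CPU stalls because the FIFO write buffer is full.
    Memory retires one buffered write every mem_write_cycle cycles; the CPU
    issues a store every store_interval cycles."""
    buf = deque()
    stalls = 0
    for t in range(cycles):
        if t % mem_write_cycle == 0 and buf:
            buf.popleft()              # memory drains one write
        if t % store_interval == 0:
            if len(buf) < depth:
                buf.append(t)          # store enters the write buffer
            else:
                stalls += 1            # buffer saturated: CPU must wait
    return stalls

# a store every 2 cycles vs a 10-cycle memory write: the buffer saturates;
# a store every 20 cycles: store frequency << 1/(memory write cycle), no stalls
```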

Exploiting spatial locality
[Figure: a cache with multi-word blocks, so each miss fetches several neighbouring words]

Block size and performance
- increasing the block size generally decreases the miss rate (especially for instructions)
- but a large block in a small cache increases the miss rate: too few blocks
- a larger block size also increases the transfer time between cache and main memory

Impact of block size on miss rate
[Figure: miss rate (0% to 40%) versus block size (4 to 256 bytes) for total cache sizes of 1 KB, 8 KB, 16 KB, 64 KB and 256 KB; miss rate generally falls as block size grows, except for large blocks in small caches, where too few blocks remain]

Multi-word cache: handling misses
- cache miss on read: handled the same way as for a single-word block, except that the entire multi-word block is brought back from memory
- cache miss on write, given a write-through cache:
  - single-word block: disregard hit or miss, just write to the cache and to the write buffer / memory
  - can we do the same for a multi-word block?

Cache performance
CPU time = (execution cycles + memory stall cycles) × cycle time
memory stall cycles = read stall cycles + write stall cycles
read stall cycles = (reads / program) × read miss rate (%) × read miss penalty (cycles)
                  = (read misses / program) × (number of cycles per read miss)
write stall cycles = (writes / program) × write miss rate (%) × write miss penalty (cycles)
                   + write buffer stalls (when full)
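
A worked instance of these formulas (Python); every number below is invented for illustration, and write buffer stalls are ignored:

```python
instructions = 1_000_000
loads_per_instr, stores_per_instr = 0.20, 0.10  # assumed instruction mix
read_miss_rate, write_miss_rate = 0.04, 0.06    # assumed miss rates
miss_penalty = 40                               # cycles, assumed
base_cpi = 1.2                                  # CPI without memory stalls
cycle_time_ns = 0.5

read_stall_cycles = instructions * loads_per_instr * read_miss_rate * miss_penalty
write_stall_cycles = instructions * stores_per_instr * write_miss_rate * miss_penalty
memory_stall_cycles = read_stall_cycles + write_stall_cycles

# CPU time = (execution cycles + memory stall cycles) x cycle time
cpu_time_ns = (instructions * base_cpi + memory_stall_cycles) * cycle_time_ns
```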

Effect on CPU
- assume hit time is insignificant (data transfer time dominates)
- let c = CPI without stalls, i = instruction miss rate, p = miss penalty, d = data miss rate, n = instruction count, f = load/store frequency
- total memory stall cycles: nip + nfdp
- total CPU cycles without stalls: nc
- total CPU cycles with stalls: n(c + ip + fdp)
- fraction of time on stalls: (ip + fdp) / (c + ip + fdp)
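
With the symbols above, the stall fraction is straightforward to evaluate (Python); the parameter values in the example call are invented:

```python
def stall_fraction(c, i, d, f, p):
    """Fraction of CPU time spent on memory stalls:
    (i*p + f*d*p) / (c + i*p + f*d*p), in the slide's notation."""
    stall_cpi = i * p + f * d * p
    return stall_cpi / (c + stall_cpi)

# e.g. c = 2.0, i = 2% instruction miss rate, d = 4% data miss rate,
# f = 0.3 loads/stores per instruction, p = 40-cycle penalty (all assumed)
```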

Faster CPU
- same memory speed, halved CPI:
  fraction of time on stalls = (ip + fdp) / ((½)c + ip + fdp) > (ip + fdp) / (c + ip + fdp)
  so a lower CPI results in a greater impact of stall cycles
- same memory speed, halved clock cycle time t:
  miss penalty becomes 2p, so total CPU time with the new clock is n(c + 2ip + 2fdp)(t/2)
  performance improvement = exec. time with old clock / exec. time with new clock
    = n(c + ip + fdp)t / (n(c + 2ip + 2fdp)(t/2))
    = 2(c + ip + fdp) / (c + 2ip + 2fdp) < 2
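
The halved-clock case can be checked numerically (Python; the parameter values in the example call are assumed):

```python
def speedup_halved_clock(c, i, d, f, p):
    """Old execution time / new execution time when the clock cycle time is
    halved but memory speed is unchanged, so the miss penalty doubles to 2p
    cycles. The instruction count n and old cycle time t cancel out."""
    old_time = c + i * p + f * d * p                  # times n, times t
    new_time = (c + 2 * i * p + 2 * f * d * p) * 0.5  # times n, times t
    return old_time / new_time

# For positive parameters the result lies strictly between 1 and 2:
# the stall cycles keep the speedup below the ideal factor of 2.
```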

Multi-level cache hierarchy
- how can we look at the cache hierarchy?
  - performance view
  - capacity view
  - physical hierarchy
  - abstract hierarchy

Typical scale
- L1: size: tens of KB; hit time: completes in one clock cycle; miss rate: 1-5%
- L2: size: hundreds of KB; hit time: a few clock cycles; miss rate: 10-20%
- L2 miss rate: fraction of L1 misses that also miss in L2
  - why so high? L2 only sees the accesses that already missed in L1
- it gets complex: L1 and L2 may use different block sizes and placement

Average Memory Access Time
- want the Average Memory Access Time (AMAT), taking all levels of the hierarchy into account
- calculate AMAT_CPU, the AMAT for ISA-level accesses, by following the abstract hierarchy:
  AMAT_CPU = AMAT_L1
  AMAT_L1 = HitTime_L1 + MissRate_L1 × AMAT_L2
  AMAT_L2 = HitTime_L2 + MissRate_L2 × AMAT_M
  AMAT_M = constant
- hence AMAT_CPU = HitTime_L1 + MissRate_L1 × (HitTime_L2 + MissRate_L2 × AMAT_M)
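
The recursion bottoms out at main memory, so it can be folded from the bottom up; a Python sketch of the formulas above:

```python
def amat(hit_times, miss_rates, mem_time):
    """AMAT following the abstract hierarchy:
    AMAT_Lk = HitTime_Lk + MissRate_Lk * AMAT_L(k+1), with AMAT_M constant.
    hit_times and miss_rates are listed from L1 downwards."""
    t = mem_time  # AMAT_M
    for hit_time, miss_rate in zip(reversed(hit_times), reversed(miss_rates)):
        t = hit_time + miss_rate * t
    return t

# With the numbers of the example on the following slide,
# amat([1, 5], [0.05, 0.15], 200) gives 2.75 cycles.
```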

Example
- assume:
  L1 hit time = 1 cycle
  L1 miss rate = 5%
  L2 hit time = 5 cycles
  L2 miss rate = 15% (fraction of L1 misses that miss in L2)
  L2 miss penalty = 200 cycles
- L1 miss penalty = 5 + 0.15 × 200 = 35 cycles
- AMAT = 1 + 0.05 × 35 = 2.75 cycles

Example: without L2 cache
- assume:
  L1 hit time = 1 cycle
  L1 miss rate = 5%
  L1 miss penalty = 200 cycles
- AMAT = 1 + 0.05 × 200 = 11 cycles
- 4 times faster with the L2 cache! (2.75 versus 11)
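
The comparison between the two examples reduces to three lines of arithmetic (Python, using the lecture's numbers):

```python
l1_hit, l1_miss_rate = 1, 0.05
l2_hit, l2_miss_rate, mem_penalty = 5, 0.15, 200

amat_with_l2 = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_penalty)
amat_no_l2 = l1_hit + l1_miss_rate * mem_penalty

print(amat_with_l2)               # 2.75 cycles
print(amat_no_l2)                 # 11.0 cycles
print(amat_no_l2 / amat_with_l2)  # 4.0x faster with the L2 cache
```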