Caches
Jin-Soo Kim (jinsookim@skku.edu)
Computer Systems Laboratory, Sungkyunkwan University
http://csl.skku.edu

Memory Technology
- Static RAM (SRAM): 0.5-2.5 ns, $2000-$5000 per GB
- Dynamic RAM (DRAM): 50-70 ns, $20-$75 per GB
- Magnetic disk: 5-20 ms, $0.20-$2 per GB
- Ideal memory: the access time of SRAM with the capacity and cost/GB of disk

DRAM Organization (1)
Micron MT4LC16M4T8 (16M x 4-bit) [figure: internal organization of the chip]

DRAM Organization (2)
DRAM configuration:
- Asynchronous: no clock
- Large capacity: 1-4 Gb
- Arranged as a 2D matrix: minimizes wire length, maximizes refresh efficiency
- Narrow data interface: 1-16 bits (x1, x4, x8, x16); cheap packages need few bus pins, and pins are expensive
- Narrow address interface: multiplexed address lines carry the row and column addresses, signaled by RAS# and CAS# respectively
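As a concrete illustration of multiplexed addressing, the sketch below splits a flat cell address into the row and column halves that would be presented in turn on the shared address pins. The 4096 x 4096 geometry (12 row bits, 12 column bits) is an assumption consistent with a 16M-address part like the MT4LC16M4T8:

```c
#include <stdint.h>
#include <stdio.h>

#define COL_BITS 12u  /* assumed: 16M cells = 4096 rows x 4096 columns */

int main(void) {
    uint32_t cell = 0xABCDE;                        /* some 24-bit internal cell address */
    uint32_t row  = cell >> COL_BITS;               /* driven first, latched by RAS#     */
    uint32_t col  = cell & ((1u << COL_BITS) - 1u); /* driven second, latched by CAS#    */
    printf("cell 0x%05X -> row %u, col %u\n", cell, row, col);
    return 0;
}
```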

DRAM Operation (1)
Read operation, step (1) [figure: word line]

DRAM Operation (2)
Read operation, step (2) [figure: word line and bit line]

DRAM Operation (3)
Read operation, step (3) [figure: word line and bit line]

DRAM Operation (4)
Read cycle [figure: read-cycle timing diagram]

DRAM Operation (5)
Timing parameters:
- tRC: minimum time from the start of one row access to the start of the next (cycle time)
- tRAC: minimum time from RAS# falling to valid data output (access time); this used to be quoted as the nominal speed of a DRAM chip
- tCAC: minimum time from CAS# falling to valid data output

Model         | tRC    | tRAC  | tCAC
MT4LC16M4T8-5 | 90 ns  | 50 ns | 13 ns
MT4LC16M4T8-6 | 110 ns | 60 ns | 15 ns

DRAM Generations

Year | Capacity | $/GB
1980 | 64 Kbit  | $1,500,000
1983 | 256 Kbit | $500,000
1985 | 1 Mbit   | $200,000
1989 | 4 Mbit   | $50,000
1992 | 16 Mbit  | $15,000
1996 | 64 Mbit  | $10,000
1998 | 128 Mbit | $4,000
2000 | 256 Mbit | $1,000
2004 | 512 Mbit | $250
2007 | 1 Gbit   | $50

[chart: tRAC and tCAC (ns) falling across generations, 1980-2007]

Principle of Locality
Programs access a small portion of their address space at any time.
- Temporal locality: items accessed recently are likely to be accessed again soon (e.g., instructions in a loop, induction variables)
- Spatial locality: items near those accessed recently are likely to be accessed soon (e.g., sequential instruction access, array data)
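For instance, a simple array-summing loop exhibits both kinds of locality (an illustrative sketch, not from the slides):

```c
/* sum and i are reused on every iteration (temporal locality);
   a[0], a[1], ... are touched in sequence (spatial locality). */
long sum_array(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```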

Memory Hierarchy (1)
"We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible." -- A. W. Burks, H. H. Goldstine, and J. von Neumann, Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, June 1946.

Taking advantage of locality:
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to a smaller DRAM memory (main memory)
- Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory (cache memory)

Memory Hierarchy (2)
From smaller, faster, and costlier (per byte) at the top to larger, slower, and cheaper (per byte) at the bottom:
- L0: registers (CPU registers hold words retrieved from the L1 cache)
- L1: on-chip L1 cache (SRAM) (holds cache lines retrieved from the L2 cache)
- L2: on-chip L2 cache (SRAM) (holds cache lines retrieved from main memory)
- L3: main memory (DRAM) (holds disk blocks retrieved from local disks)
- L4: local secondary storage (local magnetic disks) (holds files retrieved from disks on remote network servers)
- L5: remote secondary storage (distributed file systems, Web servers)

Memory Hierarchy (3)
Terminology:
- Block (aka line): unit of copying; may be multiple words
- Hit: the accessed data is present in the upper level, so the access is satisfied by the upper level
  - Hit ratio: hits / accesses
- Miss: the accessed data is absent, so the block is copied from the lower level
  - Time taken: miss penalty
  - Miss ratio: misses / accesses = 1 - hit ratio

Caches (1)
Cache memory: the level of the memory hierarchy closest to the CPU.
Given a sequence of accesses X1, ..., Xn-1, Xn:
- How do we know if the data is present?
- Where do we look?

Caches (2)
Direct-mapped cache: location determined by address
- Direct mapped: only one choice: (Block address) modulo (#Blocks in cache)
- #Blocks is a power of 2, so use the low-order address bits

Caches (3)
Tags and valid bits
- How do we know which particular block is stored in a cache location? Store the block address as well as the data; actually, only the high-order bits are needed: the tag
- Address layout: Tag | Index | Offset
- What if there is no data in a location? Valid bit: 1 = present, 0 = not present; initially 0
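A minimal C sketch of this address split; the 4-bit offset and 6-bit index widths are assumptions (they match the Block Size example later in the deck), but the shift-and-mask pattern is the same for any direct-mapped cache:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4u   /* assumed: 16-byte blocks  */
#define INDEX_BITS  6u   /* assumed: 64 cache blocks */

static uint32_t offset_of(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1u); }
static uint32_t index_of(uint32_t a)  { return (a >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u); }
static uint32_t tag_of(uint32_t a)    { return a >> (OFFSET_BITS + INDEX_BITS); } /* high bits */

int main(void) {
    uint32_t addr = 0x12345678;
    printf("addr 0x%08X -> tag 0x%X, index %u, offset %u\n",
           addr, tag_of(addr), index_of(addr), offset_of(addr));
    return 0;
}
```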

Cache Example (1)
8 blocks, 1 word/block, direct mapped
Initial state:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | N |     |
111   | N |     |

Cache Example (2)

Word addr | Binary addr | Hit/miss | Cache block
22        | 10 110      | Miss     | 110

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (3)

Word addr | Binary addr | Hit/miss | Cache block
26        | 11 010      | Miss     | 010

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (4)

Word addr | Binary addr | Hit/miss | Cache block
22        | 10 110      | Hit      | 110
26        | 11 010      | Hit      | 010

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (5)

Word addr | Binary addr | Hit/miss | Cache block
16        | 10 000      | Miss     | 000
3         | 00 011      | Miss     | 011
16        | 10 000      | Hit      | 000

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (6)

Word addr | Binary addr | Hit/miss | Cache block
18        | 10 010      | Miss     | 010

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 10  | Mem[10010]
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Address 18 maps to index 010, which already holds Mem[11010] with tag 11; the old block is evicted and replaced by Mem[10010] with tag 10.
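The whole reference stream from Examples (2)-(6) can be replayed in a few lines of C. This sketch models the 8-block, 1-word-per-block direct-mapped cache above and reproduces the hit/miss pattern (22 miss, 26 miss, 22 hit, 26 hit, 16 miss, 3 miss, 16 hit, 18 miss):

```c
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 8  /* 8 blocks, 1 word/block, direct mapped */

struct line { bool valid; unsigned tag; };

int main(void) {
    struct line cache[NBLOCKS] = {0};
    unsigned refs[] = {22, 26, 22, 26, 16, 3, 16, 18};  /* word addresses */
    for (int i = 0; i < 8; i++) {
        unsigned addr  = refs[i];
        unsigned index = addr % NBLOCKS;  /* low-order 3 bits */
        unsigned tag   = addr / NBLOCKS;  /* high-order bits  */
        bool hit = cache[index].valid && cache[index].tag == tag;
        if (!hit) {  /* miss: fill (or replace) the block */
            cache[index].valid = true;
            cache[index].tag   = tag;
        }
        printf("addr %2u -> index %u, tag %u: %s\n",
               addr, index, tag, hit ? "hit" : "miss");
    }
    return 0;
}
```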

Address Subdivision
[figure: a 32-bit address divided into tag, index, and offset fields, driving a direct-mapped cache lookup with tag comparison, valid bit, and hit signal]

Block Size (1)
Example: larger block size. 64 blocks, 16 bytes/block. To what block number does address 1200 map?
- Block address = 1200 / 16 = 75
- Block number = 75 modulo 64 = 11

Address bits: [31:10] Tag (22 bits) | [9:4] Index (6 bits) | [3:0] Offset (4 bits)
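The same arithmetic, checked in a couple of lines (all numbers are the slide's own):

```c
#include <stdio.h>

int main(void) {
    unsigned addr       = 1200;             /* byte address   */
    unsigned block_addr = addr / 16;        /* 1200 / 16 = 75 */
    unsigned block_num  = block_addr % 64;  /* 75 mod 64 = 11 */
    printf("block address %u -> cache block %u\n", block_addr, block_num);
    return 0;
}
```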

Block Size (2)
Considerations:
- Larger blocks should reduce the miss rate, due to spatial locality
[chart: miss rate vs. block size for various cache sizes, SPEC92 benchmarks]

Block Size (3)
Considerations (cont'd): but in a fixed-sized cache,
- Larger blocks mean fewer of them, so more competition and an increased miss rate
- Larger blocks can cause pollution
- Larger miss penalty, which can override the benefit of a reduced miss rate; early restart and critical-word-first can help

Main Memory & Caches
Use DRAMs for main memory:
- Fixed width (e.g., 1 word)
- Connected by a fixed-width clocked bus; the bus clock is typically slower than the CPU clock

Example cache block read:
- 1 bus cycle for address transfer
- 15 bus cycles per DRAM access
- 1 bus cycle per data transfer

For a 4-word block and 1-word-wide DRAM:
- Miss penalty = 1 + 4 x 15 + 4 x 1 = 65 bus cycles
- Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle

Increasing Memory Bandwidth
4-word-wide memory:
- Miss penalty = 1 + 15 + 1 = 17 bus cycles
- Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle

4-bank interleaved memory:
- Miss penalty = 1 + 15 + 4 x 1 = 20 bus cycles
- Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
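The miss-penalty and bandwidth figures for all three organizations follow from the same three parameters (1 address cycle, 15 cycles per DRAM access, 1 cycle per bus transfer); a small sketch that reproduces them:

```c
#include <stdio.h>

int main(void) {
    const int addr_c = 1, dram_c = 15, xfer_c = 1;  /* bus cycles          */
    const int words = 4, bytes = 16;                /* 4-word (16 B) block */

    int narrow = addr_c + words * dram_c + words * xfer_c;  /* 65 cycles */
    int wide   = addr_c + dram_c + xfer_c;                  /* 17 cycles */
    int banked = addr_c + dram_c + words * xfer_c;          /* 20 cycles */

    printf("1-word-wide DRAM:   %2d cycles, %.2f B/cycle\n", narrow, (double)bytes / narrow);
    printf("4-word-wide memory: %2d cycles, %.2f B/cycle\n", wide,   (double)bytes / wide);
    printf("4-bank interleaved: %2d cycles, %.2f B/cycle\n", banked, (double)bytes / banked);
    return 0;
}
```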

Advanced DRAM Organization
- Bits in a DRAM are organized as a rectangular array; DRAM accesses an entire row
- Burst mode: supply successive words from a row with reduced latency
- Synchronous DRAM (SDRAM): adds a clock signal to the DRAM interface
- Double data rate (DDR) DRAM: transfers on both the rising and falling clock edges
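To see what transferring on both clock edges buys, a back-of-the-envelope sketch; the 100 MHz bus clock and 64-bit bus width are illustrative assumptions, not values from the slide:

```c
#include <stdio.h>

int main(void) {
    double clock_hz  = 100e6;  /* assumed SDRAM bus clock */
    double bus_bytes = 8.0;    /* assumed 64-bit data bus */
    double sdr = clock_hz * bus_bytes;        /* one transfer per cycle     */
    double ddr = 2.0 * clock_hz * bus_bytes;  /* rising + falling edges: 2x */
    printf("SDR peak %.0f MB/s, DDR peak %.0f MB/s\n", sdr / 1e6, ddr / 1e6);
    return 0;
}
```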