ECE232: Hardware Organization and Design


ECE232: Hardware Organization and Design
Lecture 22: Introduction to Caches
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Overview
Caches hold a subset of the data in main memory. There are three types of caches:
- Direct mapped
- Set associative
- Fully associative
Today: direct mapped. Each memory value can be in only one place in the cache. Either it is there (a hit) or it is not (a miss).

Direct Mapped Cache
The cache location is determined by the address. Direct mapped: there is only one choice:
  index = (block address) modulo (#blocks in cache)
Because #blocks is a power of 2, the index is simply the low-order address bits.
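
As a concrete illustration, here is a minimal C sketch, assuming a hypothetical 4-block cache (matching the figure on the next slide), showing that for a power-of-2 block count the modulo reduces to masking the low-order bits:

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_BLOCKS 4u   /* must be a power of 2 */

int main(void) {
    for (uint32_t block_addr = 0; block_addr < 16; block_addr++) {
        uint32_t index  = block_addr % NUM_BLOCKS;        /* textbook formula     */
        uint32_t masked = block_addr & (NUM_BLOCKS - 1);  /* low-order bits: same */
        printf("memory block %2u -> cache index %u (mask: %u)\n",
               block_addr, index, masked);
    }
    return 0;
}
```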

Direct Mapped Cache (assume 1 byte/block)
In a 4-block direct-mapped cache:
- Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12
- Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
- Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
- Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15
[Figure: memory blocks 0-15 mapped onto cache indices 0-3; addresses 0000, 0100, 1000, 1100 (binary) all map to cache index 0.]

Direct Mapped Cache: Index and Tag
A memory block address divides into a tag and an index:
- The index determines the block in the cache: index = (address) mod (#blocks).
- Because the number of cache blocks is a power of 2, the cache index is the lower n bits of the memory address.
[Figure: 16 one-byte memory locations mapped onto cache indices 0-3; addresses 00 00, 01 00, 10 00, 11 00 (tag | index, binary) all share index 00.]

Direct Mapped Cache with Tag
The tag determines which memory block occupies a cache block:
- Hit: the cache tag field = the tag bits of the address.
- Miss: the cache tag field ≠ the tag bits of the address.
[Figure: memory block addresses 00 10, 01 10, 10 10, 11 10 (tag | index, binary) all map to cache index 10; the stored tag, here 11, identifies which of them is present.]

Direct Mapped Cache
The simplest mapping is a direct-mapped cache: each memory address is associated with one possible block within the cache. Therefore, we only need to look in a single location in the cache to see whether the data is there.

Finding an Item within a Block
In reality, a cache block consists of a number of bytes or words, to (1) increase cache hits, thanks to the locality property, and (2) reduce the cache miss time. Given the address of an item, the index tells which block of the cache to look in. Then how do we find the requested item within the cache block? Equivalently: what is the byte offset of the item within the cache block?

Selecting Part of a Block (block size > 1 byte)
If the block size is greater than 1, the rightmost bits of what looked like the index are really the offset within the indexed block:
  TAG | INDEX | OFFSET
- Tag: used to check whether we have the correct block
- Index: selects a block in the cache
- Offset: selects the byte within the block
Example: with a block size of 8 bytes, select byte 4 (the 2nd word). Memory address 11 01 100 (binary): tag = 11, index = 01, byte offset = 100.
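
A small C sketch of this decomposition, using the slide's assumed geometry (4 blocks of 8 bytes, so 2 index bits and 3 offset bits) and its example address 11 01 100 in binary, which is 0x6C:

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 3u  /* 8-byte blocks         */
#define INDEX_BITS  2u  /* 4 blocks in the cache */

int main(void) {
    uint32_t addr = 0x6C;  /* 11 01 100 binary: tag 11, index 01, offset 100 */

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("tag=%u index=%u offset=%u\n", tag, index, offset);  /* tag=3 index=1 offset=4 */
    return 0;
}
```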

Accessing Data in a Direct Mapped Cache
Three types of events:
- Cache hit: the cache block is valid and contains the proper address, so read the desired word.
- Cache miss: nothing is in the cache at the appropriate block, so fetch the data from memory.
- Cache miss with block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.
Cache access procedure:
(1) Use the index bits to select the cache block.
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits.
(3) If they match, use the offset to read out the word or byte.
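
The three-step procedure can be written directly as a lookup function. This is a sketch under the same assumed geometry as above; handling the miss and replacement cases is left to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS  4u
#define BLOCK_SIZE  8u
#define OFFSET_BITS 3u
#define INDEX_BITS  2u

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct cache_line cache[NUM_BLOCKS];

/* Returns true on a hit and writes the byte to *out; on a miss
   (including the block-replacement case) it returns false. */
bool cache_read_byte(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    struct cache_line *line = &cache[index];   /* (1) select block by index */
    if (line->valid && line->tag == tag) {     /* (2) valid bit + tag match */
        *out = line->data[offset];             /* (3) read out via offset   */
        return true;
    }
    return false;
}

int main(void) {
    uint8_t b;
    /* Cold cache: every access misses until blocks are filled in. */
    return cache_read_byte(0x6C, &b) ? 0 : 1;
}
```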

Tags and Valid Bits
How do we know which particular block is stored in a cache location? Store the block address as well as the data. Actually, only the high-order bits are needed; they are called the tag.
What if there is no data in a location? A valid bit records this: 1 = present, 0 = not present. Initially 0.

Cache Example
8 blocks, 1 byte/block, direct mapped. Initial state:

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)

Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example (continued)
Address 18 maps to the same block as address 26, so the old contents are replaced:

Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
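
The whole access sequence above can be reproduced in a few lines of C; a sketch assuming the slide's geometry (8 blocks, 1 byte/block, 5-bit addresses):

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 8u   /* index = low 3 bits, tag = high 2 bits */

int main(void) {
    bool     valid[NUM_BLOCKS] = { false };
    uint32_t tag[NUM_BLOCKS]   = { 0 };
    uint32_t trace[] = { 22, 26, 22, 26, 16, 3, 16, 18 };

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        uint32_t addr  = trace[i];
        uint32_t index = addr & (NUM_BLOCKS - 1);
        uint32_t t     = addr >> 3;
        bool hit = valid[index] && tag[index] == t;
        printf("addr %2u -> block %u: %s\n", addr, index, hit ? "hit" : "miss");
        valid[index] = true;  /* on a miss, the block is fetched and the tag updated */
        tag[index]   = t;
    }
    return 0;
}
```

Running this prints miss, miss, hit, hit, miss, miss, hit, miss, matching the tables above.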

Example: Larger Block Size
64 blocks, 16 bytes/block. To what block number does address 1200 map?
- Block address = 1200 / 16 = 75
- Block number = 75 modulo 64 = 11
Address fields:
  Tag: bits 31-10 (22 bits) | Index: bits 9-4 (6 bits) | Offset: bits 3-0 (4 bits)
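
The same arithmetic as a C sketch:

```c
#include <stdio.h>

int main(void) {
    unsigned addr       = 1200;
    unsigned block_size = 16;   /* bytes */
    unsigned num_blocks = 64;

    unsigned block_addr = addr / block_size;       /* 1200 / 16 = 75 */
    unsigned index      = block_addr % num_blocks; /* 75 mod 64 = 11 */

    printf("address %u -> block address %u -> cache index %u\n",
           addr, block_addr, index);
    return 0;
}
```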

Block Size Considerations
Larger blocks should reduce the miss rate, due to spatial locality. But in a fixed-sized cache:
- Larger blocks → fewer of them → more competition → increased miss rate
- Larger blocks → pollution
Larger blocks also mean a larger miss penalty, which can override the benefit of the reduced miss rate. Early restart and critical-word-first can help.

Cache Misses
On a cache hit, the CPU proceeds normally. On a cache miss:
- Stall the CPU pipeline
- Fetch the block from the next level of the hierarchy
- On an instruction cache miss: restart the instruction fetch
- On a data cache miss: complete the data access

Write-Through
On a data-write hit, we could just update the block in the cache, but then the cache and memory would be inconsistent.
Write-through: also update memory. But this makes writes take longer. For example, if the base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
  Effective CPI = 1 + 0.1 × 100 = 11
Solution: a write buffer, which holds data waiting to be written to memory. The CPU continues immediately and only stalls on a write if the write buffer is already full.
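
The effective-CPI calculation written out in C (the 10% store mix and 100-cycle write are the slide's assumed numbers):

```c
#include <stdio.h>

int main(void) {
    double base_cpi       = 1.0;
    double store_fraction = 0.10;   /* 10% of instructions are stores */
    double write_cycles   = 100.0;  /* cycles per write to memory     */

    /* Without a write buffer, every store stalls for the full write. */
    double effective_cpi = base_cpi + store_fraction * write_cycles;
    printf("Effective CPI = %.1f\n", effective_cpi);  /* 11.0 */
    return 0;
}
```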

Write-Back
Alternative: on a data-write hit, just update the block in the cache and keep track of whether each block is dirty. When a dirty block is replaced, write it back to memory. A write buffer can be used so that the replacing block can be read first.
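
A minimal sketch of write-back replacement logic; the memory_read_block/memory_write_block helpers are hypothetical stand-ins, not from the slides:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 8u

struct cache_line {
    bool     valid;
    bool     dirty;  /* set on a write hit: block differs from memory */
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

/* Hypothetical helpers, assumed to exist elsewhere. */
void memory_write_block(uint32_t tag, uint32_t index, const uint8_t *data);
void memory_read_block(uint32_t tag, uint32_t index, uint8_t *data);

/* Replace the block at 'index' with the block tagged 'new_tag'. */
void replace_block(struct cache_line *line, uint32_t index, uint32_t new_tag) {
    if (line->valid && line->dirty)                        /* write back only */
        memory_write_block(line->tag, index, line->data);  /* dirty victims   */
    memory_read_block(new_tag, index, line->data);
    line->tag   = new_tag;
    line->valid = true;
    line->dirty = false;  /* freshly fetched block matches memory */
}
```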

Measuring Cache Performance
Components of CPU time:
- Program execution cycles (includes the cache hit time)
- Memory stall cycles (mainly from cache misses)
With simplifying assumptions:
  Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                      = (Instructions / Program) × (Misses / Instruction) × Miss penalty
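
A sketch plugging numbers into the formula; the instruction count, 1.3 accesses per instruction, 2% miss rate, and 100-cycle penalty below are illustrative assumptions, not values from the slide:

```c
#include <stdio.h>

int main(void) {
    /* All values are illustrative assumptions. */
    double instructions      = 1e6;
    double accesses_per_inst = 1.3;   /* 1 instruction fetch + 0.3 data accesses */
    double miss_rate         = 0.02;
    double miss_penalty      = 100.0; /* cycles */

    /* Memory stall cycles =
       (Memory accesses / Program) x Miss rate x Miss penalty */
    double stall_cycles =
        instructions * accesses_per_inst * miss_rate * miss_penalty;
    printf("Memory stall cycles = %.0f\n", stall_cycles);
    return 0;
}
```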

Average Access Time
Hit time is also important for performance. The average memory access time (AMAT) is:
  AMAT = Hit time + Miss rate × Miss penalty
Example: a CPU with a 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, and an I-cache miss rate of 5%:
  AMAT = 1 + 0.05 × 20 = 2 ns, i.e., 2 cycles per instruction
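
The slide's AMAT example as code:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;   /* ns (1 cycle at a 1 ns clock) */
    double miss_rate    = 0.05;  /* 5% I-cache miss rate         */
    double miss_penalty = 20.0;  /* cycles, i.e., ns here        */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f ns\n", amat);  /* 2.0 ns, i.e., 2 cycles */
    return 0;
}
```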

Summary
Today: direct mapped caches.
- Performance is tied to whether values are located in the cache; a cache miss means bad performance.
- You need to understand how to numerically determine system performance from the cache hit rate.
Why might direct mapped caches be bad? Lots of data can map to the same location in the cache. Idea: maybe we should have multiple locations for each data value.
Next time: set associative caches.