EE 457 Unit 7a. Cache and Memory Hierarchy


EE 457 Unit 7a Cache and Memory Hierarchy

2 Memory Hierarchy & Caching
Use several levels of faster and faster memory to hide the delay of the slower levels.
Registers. Unit of transfer: word, half-word, or byte (LW, LH, LB or SW, SH, SB)
L1 Cache (~1 ns) and L2 Cache (~10 ns). Unit of transfer: cache block/line of several (e.g., 4-8) words (takes advantage of spatial locality)
Main Memory (~100 ns). Unit of transfer to/from secondary storage: page, 4KB-64KB (takes advantage of spatial locality)
Secondary Storage (~1-10 ms)
Moving down the hierarchy: larger, less expensive, slower. Moving up: smaller, more expensive, faster.

3 Cache Blocks/Lines
Cache is broken into blocks or lines. Any time data is brought in, the cache brings in the entire block of data. Blocks start on addresses that are multiples of their size.
[Figure: processor connected to a 128B cache (4 blocks/lines of 8 words = 32 bytes each) over a narrow (1-word) cache bus; the cache connects to main memory over a wide (multi-word) FSB.]

4 Cache Blocks/Lines
Whenever the processor generates a read or a write, it first checks the cache memory to see if it contains the desired data. If so, it can get the data quickly from the cache. Otherwise, it must go to the slow main memory to get the data.
[Figure: on a miss, (1) the processor requests the word @ 0x428, (2) the cache does not have the data and requests the whole cache line 0x420-0x43f, (3) memory responds with the block, and (4) the cache forwards the desired word to the processor.]

5 Cache & Virtual Memory
Exploits the Principle of Locality. Allows us to implement a hierarchy of memories: cache, main memory, secondary storage.
Temporal Locality: If an item is referenced, it will tend to be referenced again soon. Examples: loops, repeatedly called subroutines, setting a variable and then reusing it many times.
Spatial Locality: If an item is referenced, items whose addresses are nearby will tend to be referenced soon. Examples: arrays and program code.
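The payoff of these two kinds of locality can be sketched with a toy miss counter (a made-up helper, not part of the lecture material): sequential array accesses miss only once per block, and repeated references to the same word miss only the first time.

```python
def count_misses(addresses, block_size=8):
    """Count cold misses for a stream of word addresses, assuming an
    unbounded cache that fetches whole block_size-word blocks."""
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_size  # block number containing this word
        if block not in cached_blocks:
            cached_blocks.add(block)  # bring in the whole block
            misses += 1
    return misses

# Spatial locality: 32 sequential words, 8-word blocks -> only 4 misses
print(count_misses(list(range(32))))  # 4
# Temporal locality: the same word referenced 10 times -> only 1 miss
print(count_misses([0x428] * 10))     # 1
```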

6 Cache Definitions
Cache Hit = desired data is in the cache. Cache Miss = desired data is not present in the cache. When a cache miss occurs, a new block is brought from MM into the cache:
Load Through: First load the word requested by the CPU and forward it to the CPU, while continuing to bring in the remainder of the block.
No-Load Through: First load the entire block into the cache, then forward the requested word to the CPU.
On a write miss we may choose not to bring in the MM block, since writes exhibit less locality of reference than reads. When the CPU writes to the cache, we may use one of two policies:
Write Through (Store Through): Every write updates both the cache and MM copies to keep them in sync (i.e., coherent).
Write Back: Let the CPU keep writing to the cache at a fast rate, without updating MM. Only copy the block back to MM when it needs to be replaced or flushed.

7 Write-Back Cache
On a write hit, update only the cached copy; the processor can continue quickly. Later, when the block is evicted, the entire block is written back (because dirty bookkeeping is kept on a per-block basis). Example: writing back 8 words at 10 ns per word of memory write time = 80 ns at eviction.
[Figure: (1, 3) processor writes words (hits), (2, 4) cache updates the value and signals the processor to continue, (5) on eviction, the entire block is written back.]

8 Write-Through Cache
On a write hit, update both the cached and the main memory version. The processor may have to wait for the slower memory write to complete. Later, when the block is evicted, no write-back is needed.
[Figure: (1) processor writes a word (hit), (2) cache and memory copies are both updated, (3) on eviction, nothing needs to be written back.]
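The traffic difference between the two policies can be sketched numerically (hypothetical helper; the flush-at-the-end simplification is an assumption of this sketch, not something stated on the slides):

```python
def words_written_to_mm(writes, policy, block_size=8):
    """Words that reach main memory for a stream of word-address writes.
    'through': every CPU write also updates MM immediately.
    'back': only whole dirty blocks are written; here we flush at the end."""
    if policy == "through":
        return len(writes)
    dirty_blocks = {addr // block_size for addr in writes}
    return len(dirty_blocks) * block_size  # each dirty block written back whole

stream = [0x428] * 10  # CPU writes the same word 10 times
print(words_written_to_mm(stream, "through"))  # 10 words reach MM
print(words_written_to_mm(stream, "back"))     # 8 words (one block) at flush
```

Repeated writes to one block favor write-back; write-through wins only when blocks are dirtied once and evicted rarely.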

9 Cache Definitions
Mapping Function: The correspondence between MM blocks and cache block frames is specified by means of a mapping function: Fully Associative, Direct Mapping, Set Associative.
Replacement Algorithm: How do we decide which of the current cache blocks is removed to create space for a new block? Random, Least Recently Used (LRU).

10 Fully Associative Cache Example
Cache Mapping Example: Fully Associative. MM = 128 words, Cache Size = 32 words, Block Size = 8 words.
Fully associative mapping allows a MM block to be placed in (associated with) any cache block frame. To determine hit/miss we have to search everywhere, comparing the address against the tags of all frames.
[Figure: processor die with the processor core logic, the CPU address, and a cache of 4 block frames, each with a valid bit, tag, and comparator; main memory of 16 blocks.]

11 Implementation Info
Tags: Associated with each cache block frame, we keep a TAG to identify its parent main memory block.
Valid bit: An additional bit is maintained to indicate whether the TAG is valid (meaning it contains the TAG of an actual block). Initially, when you turn power on, the cache is empty and all valid bits are set to 0 (invalid).
Dirty bit: This bit, associated with the TAG, indicates whether the block was modified (got dirtied) during its stay in the cache and thus needs to be written back to MM (used only with the write-back cache policy).

12 Fully Associative Hit Logic
Cache Mapping Example: Fully Associative, MM = 128 words (2^7), Cache Size = 32 (2^5) words, Block Size = 8 (2^3) words.
Number of blocks in MM = 2^7 / 2^3 = 2^4, so Block ID = 4 bits.
Number of cache block frames = 2^5 / 2^3 = 2^2 = 4.
Store 4 TAGs of 4 bits + a valid bit each. Need 4 comparators, each 5 bits wide.
CAM (Content Addressable Memory) is a special memory structure that stores the tag+valid bits and takes the place of these comparators, but it is too expensive.
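These counts follow mechanically from the sizes. A small sanity check (the function name is made up for this sketch) reproduces both this example and the 80386 case that follows:

```python
import math

def fully_assoc_cost(mem_size, cache_size, block_size):
    """Return (tag bits, # comparators, comparator width incl. valid bit)
    for a fully associative cache; all three sizes in the same units."""
    tag_bits = int(math.log2(mem_size // block_size))  # tag = full Block ID
    frames = cache_size // block_size                  # one comparator per frame
    return tag_bits, frames, tag_bits + 1

print(fully_assoc_cost(128, 32, 8))          # (4, 4, 5): this slide's example
print(fully_assoc_cost(2**32, 2**16, 2**4))  # (28, 4096, 29): the 80386 case
```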

13 Fully Associative Does Not Scale
If the 80386 used fully associative cache mapping: MM = 4GB (2^32 bytes), Cache Size = 64KB (2^16), Block Size = 16 (2^4) bytes = 4 words.
Number of blocks in MM = 2^32 / 2^4 = 2^28, so Block ID = 28 bits.
Number of cache block frames = 2^16 / 2^4 = 2^12 = 4096.
Store 4096 TAGs of 28 bits + a valid bit each. Need 4096 comparators, each 29 bits wide. Prohibitively expensive!!

14 Fully Associative Address Scheme
A[1:0] unused => byte enables /BE3-/BE0
Word field = log2(B) bits (B = block size)
Tag = remaining bits

15 Direct Mapping Cache Example
Limit each MM block to one possible location in the cache. Cache Mapping Example: Direct Mapping. MM = 128 words, Cache Size = 32 words, Block Size = 8 words.
Each MM block i maps to cache frame i mod N, where N = # of cache frames. The tag identifies which group the colored block belongs to: a group is a set of MM blocks that each map to a different cache frame but share the same tag.
[Figure: processor die with core logic, the CPU address split into tag/block/word fields, a 4-frame cache with valid bits and tags, and main memory blocks colored by the frame they map to. Analogy: Grp | Color | Member.]

16 Direct Mapping Address Usage
Cache Mapping Example: Direct Mapping, MM = 128 words (2^7), Cache Size = 32 (2^5) words, Block Size = 8 (2^3) words.
Number of blocks in MM = 2^7 / 2^3 = 2^4, so Block ID = 4 bits.
Number of cache block frames = 2^5 / 2^3 = 2^2 = 4, so the number of "colors" is 4 => 2 block-field (CBLK) bits.
2^4 / 2^2 = 2^2 = 4 groups of blocks => 2 tag bits.
Address fields: Tag (2 bits) | CBLK (2 bits) | Word (3 bits); Block ID = 4 bits (Tag + CBLK).
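The field split above is just bit slicing of the 7-bit word address; a minimal sketch (helper name invented here):

```python
def split_direct(addr):
    """Split a 7-bit word address for this example cache:
    tag (2 bits) | block frame (2 bits) | word-in-block (3 bits)."""
    word = addr & 0b111          # low 3 bits: word within the 8-word block
    frame = (addr >> 3) & 0b11   # next 2 bits: which of the 4 cache frames
    tag = addr >> 5              # remaining 2 bits: group number
    return tag, frame, word

# Word address 0b1101010: tag=0b11, frame=0b01, word=0b010
print(split_direct(0b1101010))  # (3, 1, 2)
```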

17 Direct Mapping Hit Logic
Example: Direct Mapping, MM = 128 words, Cache Size = 32 words, Block Size = 8 words.
The block (CBLK) field addresses the tag RAM, and a single comparator compares the stored tag with the tag of the desired address.
[Figure: the CPU address's CBLK field indexes both the tag RAM and the data RAM; the stored tag and valid bit feed a comparator that produces hit or miss.]

18 Direct Mapping Address Usage
If the 80386 used direct cache mapping: MM = 4GB (2^32), Cache Size = 64KB (2^16), Block Size = 16 (2^4) bytes = 4 words.
Number of blocks in MM = 2^32 / 2^4 = 2^28.
Number of cache block frames = 2^16 / 2^4 = 2^12 = 4096, so the number of "colors" is 2^12 => 12 block-field bits.
2^28 / 2^12 = 2^16 = 64K groups of blocks => 16 tag bits.
Address fields: Tag (16 bits) | CBLK (12 bits) | Word (2 bits) | Byte (2 bits); Block ID = 28 bits.

19 Tag and Data RAM
80386 Direct-Mapped Cache Organization: the CBLK field addresses a tag RAM (4K x 17: 16 tag bits + valid) and a 64KB data RAM built from four 16KB memories, one per byte lane, selected by /BE3, /BE2, /BE1, /BE0. The tag comparison, qualified by the valid bit, produces hit or miss.
Address fields: Tag (16) | CBLK (12) | Word (2) | Byte (2); Block ID = 28 bits.
Key Idea: Direct Mapped = 1 lookup/comparison to determine a hit/miss.

20 Direct Mapping Address Usage
Divide MM and cache into equal-size blocks of B words: M main memory blocks, N cache blocks. log2(B) word-field bits.
A block location in the cache is often called a cache block/line frame, since it can hold many different MM blocks over time.
For direct mapping, if you have N cache frames, then define N colors/patterns: log2(N) block-field bits. Repeatedly paint the MM blocks with those N colors in round-robin fashion; M/N groups will form: log2(M/N) tag-field bits.
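The general field-width bookkeeping can be written out directly (sketch only; block size here is in words, with any byte-offset bits counted separately, as on the 80386 slides):

```python
import math

def direct_fields(mm_blocks, cache_frames, block_words):
    """Return (tag, block, word) field widths in bits for direct mapping."""
    word_bits = int(math.log2(block_words))               # log2(B)
    blk_bits = int(math.log2(cache_frames))               # log2(N) "colors"
    tag_bits = int(math.log2(mm_blocks // cache_frames))  # log2(M/N) groups
    return tag_bits, blk_bits, word_bits

print(direct_fields(2**4, 4, 8))       # (2, 2, 3): the 128-word example
print(direct_fields(2**28, 2**12, 4))  # (16, 12, 2): the 80386 example
```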

21 Direct Mapping Datapath
How many TAG RAMs? Is that answer dependent on the address field sizes?
How many entries in the TAG RAM?
How many bits wide is each entry in the TAG RAM?
How many DATA RAMs?
What size is the address field?

22 Single or Parallel RAMs
Is it cheaper to have (1) one 2KB RAM or (2) two 1KB RAMs? Area-wise, a single 2KB RAM occupies less area, so for tag and data RAMs it would be more economical to use fewer, bigger RAMs. However, consider the need for parallel access: with one RAM, only one item can be read at a time; with two RAMs, two items can be accessed in parallel.

23 Alternate Direct Mapping Scheme
Can you color (i.e., map) the blocks of main memory in a different order? Use the high-order bits as the BLK field, or the low-order bits? Which is more desirable, or does it not really matter?
[Figure: the same main memory colored two ways against a 4-frame cache. Mapping A uses low-order block bits as the color (analogy: Grp | Color | Member); Mapping B uses high-order block bits as the color (analogy: Color | Grp | Member), so consecutive MM blocks map to the same frame.]

24 Set-Associative Mapping Example
Cache Mapping Example: Set-Associative. MM = 128 words, Cache Size = 32 words, Block Size = 8 words.
Each MM block i maps to cache set i mod S, where S = # of sets (groups of cache frames); within the set, the block may go in any way. The tag identifies which group the colored block belongs to: a group is a set of MM blocks that each map to a different set but share the same tag.
[Figure: processor die with core logic and a 2-way cache of 2 sets (each way with a valid bit and tag); main memory blocks colored by set, forming groups Grp 0 through Grp 7. Analogy: Grp | Set Color | Set Member.]

25 Set-Associative Datapath
N = 8 total cache blocks, organized as 4 sets with 2 ways each.
[Figure: the 8 cache frames arranged as Set 0 through Set 3, each with Way-0 and Way-1; the Way-0 tags go in one tag RAM and the Way-1 tags in another, both addressed by the set field.]

26 Set-Associative Datapath
N = 8 total cache blocks, organized as 4 sets with 2 ways each.
[Figure: the set field of the CPU address indexes the Way-0 and Way-1 tag RAMs in parallel; each stored tag is compared against the address tag and qualified by its valid bit, producing a hit/miss signal per way.]

27 Set-Associative Mapping Address Usage
Define K = blocks/set = ways. If you have N total cache frames, then the number of sets S = N total cache blocks / K blocks per set.
Define S colors/patterns: log2(S) = log2(N/K) set-field bits. Repeatedly paint the MM blocks with those S colors in round-robin fashion; M/S groups will form: log2(M/S) tag-field bits.
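The same bookkeeping, generalized to K ways, can be sketched as follows (function name invented; M = mm_blocks, N = cache_frames):

```python
import math

def set_assoc_fields(mm_blocks, cache_frames, ways):
    """Return (tag bits, set bits) for a K-way set-associative cache."""
    sets = cache_frames // ways                   # S = N / K
    set_bits = int(math.log2(sets))               # log2(S) "colors"
    tag_bits = int(math.log2(mm_blocks // sets))  # log2(M/S) groups
    return tag_bits, set_bits

# 80386-sized cache (2^28 MM blocks, 2^12 frames), 2-way: tag 17, set 11
print(set_assoc_fields(2**28, 2**12, 2))  # (17, 11)
```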

28 Set-Associative Mapping Datapath
How many TAG RAMs? Key Idea: K ways => K comparators. (What is a 1-way set-associative mapping?)
How many entries in the TAG RAM? 2^SET: place the tags from different sets that belong to Way 0 in one tag RAM, Way 1 in another, etc.
How many DATA RAMs?
What size is the address field?

29 K-Way Set-Associative Mapping
If the 80386 used K-way set-associative mapping: MM = 4GB (2^32), Cache Size = 64KB (2^16), Block Size = 16 (2^4) bytes = 4 words.
Number of blocks in MM = 2^32 / 2^4 = 2^28.
Number of cache block frames = 2^16 / 2^4 = 2^12 = 4096. Set associativity/ways K = 2 blocks/set.
Number of "colors" = 2^12 / 2 = 2^11 sets => 11 set-field bits.
2^28 / 2^11 = 2^17 = 128K groups of blocks => 17 tag bits.
Address fields: Tag (17 bits) | Set (11 bits) | Word (2 bits) | Byte (2 bits); Block ID = 28 bits.

30 Tag RAM Organizations
80386 2-Way Set-Associative Cache Organization: two tag RAMs (each 2K x 18: 17 tag bits + valid), one per way, both addressed by the set field. Each stored tag is compared with the address tag and qualified by its valid bit to produce a per-way hit or miss.
Address fields: Tag (17) | Set (11) | Word (2) | Byte (2); Block ID = 28 bits.

31 Data RAM Organizations
80386 2-Way Set-Associative Cache Organization: two 32KB data RAMs (one per way), each built from four 8KB memories (one per byte lane, selected by /BE3, /BE2, /BE1, /BE0), both addressed by the set field.
Address fields: Tag (17) | Set (11) | Word (2) | Byte (2); Block ID = 28 bits.

32 Set Associative Example
Address fields: Tag (18 bits) | Set (10 bits) | Word (2 bits) | Byte (2 bits).
Suppose the cache size is 2^12 blocks. What is the set size? 2^12 / 2^10 = 4 blocks/set.
If the set associativity can be changed:
What is the smallest set size? 1 block/set. Maximum # of sets = 2^12. Largest set field = 12 bits, smallest tag = 16 bits. Direct mapping!
What is the largest set size? 2^12 blocks/set. Minimum # of sets = 1. Smallest set field = 0 bits, largest tag = 28 bits. Fully associative!
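Sweeping the associativity of this 2^12-block cache (28-bit Block ID) reproduces the whole continuum, from direct mapped to fully associative (a quick sketch):

```python
import math

BLOCK_ID_BITS = 28    # tag bits + set bits always total 28 here
CACHE_BLOCKS = 2**12

for ways in (1, 4, CACHE_BLOCKS):
    sets = CACHE_BLOCKS // ways
    set_bits = int(math.log2(sets))
    tag_bits = BLOCK_ID_BITS - set_bits
    print(ways, set_bits, tag_bits)
# 1-way (direct mapped):    set=12, tag=16
# 4-way (this slide):       set=10, tag=18
# 2^12-way (fully assoc.):  set=0,  tag=28
```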

33 LIBRARY ANALOGY

34 Mapping Functions
A mapping function determines the correspondence between MM blocks and cache block frames. 3 schemes: Fully Associative, Direct Mapping, Set-Associative.
Really just 1 scheme: Fully Associative = N-way Set Associative; Direct Mapping = 1-way Set Associative.

35 Library Memory
Compare MM to a large library (Doheny Library), and compare the cache to your dorm room bookshelf. The address of a book is its 10-digit ISBN number, so there are 10^10 = 10 billion possible books. Assume the library has a location on the shelf for all possible books; your dorm room shelf has room for only 1,000 books. A library book corresponds to an MM block; a slot on your shelf corresponds to a cache block frame.

36 Book Addressing
Addresses are not stored in memory (only data). Assume the library has a location on the shelf for every possible book: there is no need to print the ISBN on the book if each book has a fixed location (you find a book by going to its slot, using the ISBN as an index).

37 Fully Associative Analogy
The cache stores the full Block ID as a TAG to identify each block. When we check a book out and take it to our dorm room shelf, let's allow it to be put in any free slot on the shelf. We then need to keep the entire ISBN number as a TAG. To find a book with a given ISBN on our shelf, we must look through them all.

38 Direct Mapping Analogy
The cache uses the block field to identify the slot in the cache, and then stores the remainder as a TAG to distinguish that block from the others that also map to the same slot.
Assume we number the slots on our bookshelf from 0 to 999. When we check a book out and take it to our dorm room shelf, we can:
Use the last 3 digits of the ISBN to pick the slot to store it in. If another book is there, take it back to Doheny Library (evict it).
Store the upper 7 digits to distinguish this book from others that end with the same 3 digits.
To find a book with a given ISBN on our shelf, we use the last 3 digits to choose which slot to look in and then compare the upper 7 digits.
Example: ISBN 0123456789 goes to slot (0123456789) mod 1000 = 789.
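The slot/tag split in the analogy is just digit slicing on the ISBN string; a toy sketch (helper name invented here):

```python
def shelve_direct(isbn):
    """Direct-mapped bookshelf: the last 3 ISBN digits pick the slot
    (0-999); the upper 7 digits are kept as the tag."""
    return isbn[-3:], isbn[:-3]

slot, tag = shelve_direct("0123456789")
print(slot, tag)  # 789 0123456
```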

39 Set Associative Mapping Analogy
Cache blocks are divided into groups known as sets. Each MM block is mapped to a particular set but can be anywhere within that set (i.e., all TAGs in the set must be compared).
Assume our bookshelf is 10 shelves with room for 100 books each. When we check a book out and take it to our dorm room shelf, we can:
Use the last digit of the ISBN to pick the shelf, but store the book anywhere on that shelf where there is an empty slot. Only if the shelf is full do we have to pick a book to take back to Doheny Library (evict it).
Store the upper 9 digits to distinguish this book from others that end with the same digit.
To find a book with a given ISBN on our shelf, we use the last digit to choose which shelf to look in and then compare the upper 9 digits with those of all the books on that shelf.
Example: ISBN 0123456789 goes to shelf (0123456789) mod 10 = 9.

40 Set Associative Mapping Analogy
Can we confidently say: "We can bring in any 100 book(s)"? "We can bring in 100 consecutive book(s)"?
Library analogy: 10 sets, each with 100 slots = a 100-way set-associative cache.
[Figure: the 10-shelf dorm bookshelf; ISBN 0123456789 again selects shelf (0123456789) mod 10 = 9.]