
CMPE110 Spring 2005, A. Di Blas

12.1 Caches

Topics: cache basics; direct-mapped caches; cache reads and writes; cache associativity; caches and performance.

Textbook: Second Edition, 7.1 to 7.3; Third Edition, 7.1 to 7.3.

12.3 Caches: Memory hierarchy

The CPU puts a MEMORY ADDRESS on the bus; if the data is not in the registers, each level is asked in turn "is it here?", moving one level down on a miss until the data is found and brought up.

Level (relative speed):
- registers (1)
- on-chip cache (1-2)
- off-chip cache (2-5)
- main memory: real address space, part of the virtual address space (10-20)
- disk: rest of the virtual address space, files, etc. (1,000-100,000)
- long-term storage devices (?)

12.4 Caches: Memory location

[Figure: where data can live, from closest to farthest from the CPU: REGISTERS, L1 INST and L1 DATA caches, L2, MAIN MEMORY, DISK.]

12.5 Caches: Basic concepts

- data locality: temporal locality and spatial locality
- block: amount of information transferred (in bytes or words)
- hit: the block is present
- hit rate: fraction of times a requested block is found
- hit time: time to fetch a block that is present
- miss: the block is not present
- miss rate: fraction of times a requested block is not present (miss rate = 100% - hit rate)
- miss penalty: time (in clock cycles) to fetch a block from the lower level

12.6 Caches: Cache mappings

Size of cache < size of main memory, so memory blocks must be mapped onto cache blocks (CPU, cache, main memory):
- direct-mapped cache
- set-associative cache
- fully-associative cache

12.7 Direct-mapped caches

Each memory block is mapped to exactly one block in the cache.

[Figure: an 8-block cache with cache BLOCK addresses (cache INDEXes) 000 through 111; memory BLOCK addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 each map to the cache index given by their lower bits.]

12.8 Direct-mapped caches: The cache index

Many different memory blocks map to a single cache block. Which block? Use the memory address's lower bits to index the cache:

cache index = (memory block address) % (cache size in blocks)

Example 1: 32-block main memory, 8-block cache (we consider block addresses).
- The memory block address is ... bits.
- To index the cache we need ... bits: the lower ... bits of the memory block address.
- The memory block 01001 maps to the cache location ...
- The memory block 10110 maps to the cache location ...
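A quick way to check the blanks in Example 1: since the cache size is a power of two, the modulo simply keeps the low-order bits of the block address. This small C sketch (not part of the original slides; the constants are the slide's parameters) prints the mapping:

```c
#include <stdio.h>

/* Example 1 from the slide: 32-block main memory (5-bit block
   addresses), 8-block direct-mapped cache (3-bit index). */
#define CACHE_BLOCKS 8

static void print_bits(unsigned v, int n) {
    for (int i = n - 1; i >= 0; i--)
        putchar((v >> i) & 1 ? '1' : '0');
}

int main(void) {
    unsigned blocks[] = { 0x09 /* 01001 */, 0x16 /* 10110 */ };
    for (int i = 0; i < 2; i++) {
        /* For power-of-two cache sizes, the modulo keeps the low bits:
           blocks[i] % 8 == blocks[i] & 0b111. */
        unsigned index = blocks[i] % CACHE_BLOCKS;
        print_bits(blocks[i], 5);
        printf(" -> cache index ");
        print_bits(index, 3);
        putchar('\n');
    }
    return 0;
}
```

Running it prints 01001 -> 001 and 10110 -> 110, which fills in the last two blanks.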

12.9 Direct-mapped caches

Example 2: 128-byte main memory, 8-block cache, 4-byte (= 1 word) cache block size (we consider byte addresses).
- ...-bit memory byte address
- ...-bit cache (block) index
- ... bits to address the byte within the block
- The memory addresses 0100100, 0100101, 0100110, and 0100111 all map to the same cache block ...

[Figure: memory byte addresses 0100010 through 0101010 alongside an 8-entry cache (indexes 000-111) with 4-byte cache blocks.]

12.10 Direct-mapped caches: The Tag field

Many different memory blocks map to a single cache block: how do we know which memory block is in the cache block? To each cache line we add a tag that contains the remaining part (upper bits) of the address.

Example 3: 32-block main memory, 8-block cache. Memory blocks 00001, 01001, 10001, and 11001 all map to the same cache block ...

[Figure: memory block addresses 00000 through 01010 alongside an 8-entry cache (indexes 000-111), each cache block carrying a tag.]

12.11 Direct-mapped caches: The Valid bit

The CPU performs many different tasks, and the memory contents change: how do we know if a cache block is "good"? To each cache line we add a valid bit to indicate whether the content of the block corresponds to what the CPU is actually looking for. For instance, after a reset, all valid bits are reset: no block contains useful information.

12.12 Direct-mapped caches

[Figure: a 1-word-block, direct-mapped cache with 1024 entries (index 0 to 1023), connected to the CPU. The memory byte address (bits 31..0) is split into a tag (bits 31..12), an index (bits 11..2), and a byte offset (bits 1..0); each line holds V, TAG, and DATA, and a tag match on a valid line raises HIT.]

Fill in: mem. address [b], cache line size [b], bits for index, cache data size [B], bits for tag, total cache size [b].
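As a cross-check of the figure's fill-in row, here is a small C sketch (mine, not from the slides) that splits a 32-bit byte address into tag/index/offset for this 1024-line, 1-word-block cache and computes the total cache size in bits. The field widths follow the bit positions shown in the figure; one valid bit per line is assumed, consistent with slide 12.11.

```c
#include <stdio.h>

/* Geometry from the slide: 32-bit byte address, 1024 lines,
   1-word (4-byte) blocks, direct-mapped. */
#define ADDR_BITS   32
#define INDEX_BITS  10   /* 1024 lines   -> address bits 11..2 */
#define OFFSET_BITS 2    /* 4-byte block -> address bits 1..0  */
#define TAG_BITS    (ADDR_BITS - INDEX_BITS - OFFSET_BITS)  /* 20 */

int main(void) {
    unsigned addr = 0x12345678u;  /* arbitrary example address */
    unsigned offset = addr & ((1u << OFFSET_BITS) - 1);
    unsigned index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    unsigned tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    printf("addr 0x%08x: tag 0x%05x, index %u, offset %u\n",
           addr, tag, index, offset);

    /* Each line stores: 1 valid bit + 20 tag bits + 32 data bits. */
    unsigned line_bits  = 1 + TAG_BITS + 32;
    unsigned total_bits = 1024 * line_bits;
    printf("line = %u bits, total cache = %u bits (data alone: %u bits)\n",
           line_bits, total_bits, 1024 * 32);
    return 0;
}
```

With these parameters each line is 53 bits, so the whole cache is 54,272 bits, of which 32,768 bits are data.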

12.13 Direct-mapped caches: Cache trace with block address

32-block memory, 8-block cache; the memory address is a block address.

Address trace (dec / bin): 22 / 10110, 26 / 11010, 22 / 10110, 18 / 10010, 26 / 11010, 18 / 10010, 26 / 11010, 22 / 10110. For each access, fill in Hit/Miss and the cache state (INDEX 000-111, V, TAG, DATA).

12.14 Direct-mapped caches: Cache trace with byte address

256-byte memory, 32-byte cache, 4-byte cache block, memory byte addressing.

Address trace (dec / bin): 89 / 01011001, 232 / 11101000, 90 / 01011010, 8 / 00001000, 91 / 01011011, 92 / 01011100, 232 / 11101000, 7 / 00000111. For each access, fill in Hit/Miss and the cache state (INDEX 000-111, V, TAG, DATA).
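To check a trace like the two above without filling the table by hand, the cache can be simulated. This is a minimal sketch (mine, not from the slides) of a direct-mapped cache in C, parameterized by offset and index widths; as written it runs the byte-address trace of slide 12.14, and setting OFFSET_BITS to 0 and feeding the slide 12.13 block addresses reproduces that trace instead.

```c
#include <stdio.h>

/* Slide 12.14 geometry: 4-byte blocks (2 offset bits), 8 lines
   (3 index bits). Use OFFSET_BITS 0 for the slide 12.13 trace. */
#define OFFSET_BITS 2
#define INDEX_BITS  3
#define NUM_LINES   (1 << INDEX_BITS)

struct line { int valid; unsigned tag; };

int main(void) {
    struct line cache[NUM_LINES] = {0};
    unsigned trace[] = { 89, 232, 90, 8, 91, 92, 232, 7 };
    int n = sizeof trace / sizeof trace[0];

    for (int i = 0; i < n; i++) {
        unsigned addr  = trace[i];
        unsigned index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
        unsigned tag   = addr >> (OFFSET_BITS + INDEX_BITS);
        int hit = cache[index].valid && cache[index].tag == tag;
        if (!hit) {            /* miss: fetch block, set tag and valid */
            cache[index].valid = 1;
            cache[index].tag   = tag;
        }
        printf("addr %3u -> index %u, tag %u: %s\n",
               addr, index, tag, hit ? "HIT" : "MISS");
    }
    return 0;
}
```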

12.15 Cache reads and writes

In our CPU, Instruction Memory and Data Memory are actually cache memories. On a memory access, hits are straightforward to handle. Misses are more complex:
- read misses
- write misses

12.16 Cache reads and writes: Read misses

For instructions:
- stall the CPU
- send the original PC to memory (current PC - 4) and wait
- write the cache entry (including tag and valid bit)
- restart the instruction

For data:
- stall the CPU
- send the address to memory and wait
- write the cache entry (including tag and valid bit)
- restart the instruction

12.17 Cache reads and writes: Write misses

What is a write miss? In a 1-word-block write-through cache, writes always hit: we do not need to know what was in the memory location, since the CPU is overwriting it anyway.

Problem: inconsistency between cache and memory.

Solutions:
- write-through
- write-back

12.18 Cache reads and writes: Write-through

Every time, write both the cache and the memory.

[Figure: CPU writes go to the CACHE and, through a write buffer, to MEMORY.]

- simple
- slow (mitigated by the write buffer)
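A minimal software sketch of the policy (mine, not from the slides): on a write, the word is stored in the cache line and immediately forwarded to memory, so the two never disagree. A real write buffer would queue the memory write instead of blocking; that detail is omitted here.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_LINES 8

struct line { int valid; unsigned tag; uint32_t data; }; /* 1-word blocks */

static struct line cache[NUM_LINES];
static uint32_t memory[1 << 16];   /* word-addressed backing store */

/* Write-through, 1-word blocks: every write updates cache AND memory,
   so writes never create a cache/memory inconsistency. */
void write_through(unsigned word_addr, uint32_t value) {
    unsigned index = word_addr % NUM_LINES;
    cache[index].valid = 1;
    cache[index].tag   = word_addr / NUM_LINES;
    cache[index].data  = value;      /* update the cache line      */
    memory[word_addr]  = value;      /* ...and memory, every time  */
}

int main(void) {
    write_through(22, 111);
    printf("memory[22] = %u\n", (unsigned)memory[22]);  /* already 111 */
    return 0;
}
```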

12.19 Cache reads and writes: Write-back

Write only the cache. Write the entire block back into the memory only when the block needs to be replaced (dirty bit).

[Figure: a worked R/W trace (W 22 / 10110 hitting cache index 110, W 14 / 01110 to the same index forcing a FLUSH BLOCK to memory, then R 22 / 10110) through a cache whose lines carry V, T(ag), D(irty), and DATA fields.]
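For contrast with the previous sketch, here is a hedged write-back version (again mine, not from the slides): writes touch only the cache and set a dirty bit; memory is updated only when a dirty line is evicted by an access with the same index but a different tag. The main() replays the W 22, W 14 collision from the slide's trace.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_LINES 8

struct wb_line { int valid, dirty; unsigned tag; uint32_t data; };

static struct wb_line cache[NUM_LINES];
static uint32_t memory[1 << 16];   /* word-addressed backing store */

/* Write-back, 1-word blocks: write only the cache and mark the line
   dirty; flush the old block to memory only when it is replaced. */
void write_back(unsigned word_addr, uint32_t value) {
    unsigned index = word_addr % NUM_LINES;
    unsigned tag   = word_addr / NUM_LINES;
    struct wb_line *l = &cache[index];

    if (l->valid && l->dirty && l->tag != tag) {
        /* Replacing someone else's dirty block: flush it first. */
        unsigned old_addr = l->tag * NUM_LINES + index;
        memory[old_addr] = l->data;
    }
    l->valid = 1;
    l->dirty = 1;        /* memory is now stale for this block */
    l->tag   = tag;
    l->data  = value;
}

int main(void) {
    write_back(22, 111);   /* W 22: index 110, cache only           */
    write_back(14, 222);   /* W 14: same index, flushes block 22    */
    printf("memory[22] = %u\n", (unsigned)memory[22]);  /* 111 now  */
    return 0;
}
```

A read that misses on a dirty line would need the same flush before the refill; the FLUSH BLOCK steps in the slide's trace correspond to exactly that case.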

12.20 Multi-word caches

Using cache blocks larger than one word takes advantage of spatial locality.

[Figure: 4-GB memory, 64-KB direct-mapped cache with 4-word (16-byte) data blocks. The memory byte address from the CPU is split into tag, index, and word offset; each line holds V, a tag, and a 4-word cache data block; the word offset selects the data word, and a tag match raises hit.]

12.21 Multi-word caches: Exercise

What is the total size in bits of the cache in the previous slide?
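One way to work the exercise (my sketch, using the parameters from slide 12.20: 32-bit byte addresses, 64 KB of data, 16-byte blocks, one valid bit per line):

```c
#include <stdio.h>

int main(void) {
    /* Parameters from slide 12.20. */
    unsigned addr_bits   = 32;            /* 4-GB memory        */
    unsigned data_bytes  = 64 * 1024;     /* 64-KB cache data   */
    unsigned block_bytes = 16;            /* 4-word blocks      */

    unsigned lines       = data_bytes / block_bytes;          /* 4096 */
    unsigned offset_bits = 4;             /* log2(16)           */
    unsigned index_bits  = 12;            /* log2(4096)         */
    unsigned tag_bits    = addr_bits - index_bits - offset_bits; /* 16 */

    /* Per line: valid bit + tag + data (16 bytes = 128 bits). */
    unsigned line_bits   = 1 + tag_bits + 8 * block_bytes;
    printf("lines=%u, tag=%u bits, line=%u bits, total=%u bits\n",
           lines, tag_bits, line_bits, lines * line_bits);
    return 0;
}
```

It reports 4096 lines of 145 bits each, 593,920 bits in total.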

12.22 Multi-word caches: Hits/misses in a multi-word cache

Read: just like the read misses on a single-word cache, except that the entire block is fetched.

Write: we cannot just write the word, tag, and valid bit without verifying whether the block is the actual block we want to write to, since more than one memory block maps to the same cache block. We need to compare the tag for writes too:
- the tags match: we can write the word
- the tags do not match: we need to read the block from memory and then write the word

12.23 Multi-word caches: Cache block size and miss rate

- Up to a certain point, cache miss rate decreases with increasing block size.
- After a certain point, cache miss rate increases with increasing block size:
  - spatial locality decreases with block size
  - the miss penalty increases with block size

[Figure: miss rate vs. block size, copyright 1998 Morgan Kaufmann Publishers, Inc.]

12.24 Multi-word caches: Miss penalty (= additional clock cycles)

Has three components:
a) sending the address to memory
b) latency to initiate the memory transfer
c) time for transferring each word

Example: a) = 1 clock cycle, b) = 15 clock cycles, c) = 1 clock cycle.
With a 4-word block cache and a 1-word memory bus, the miss penalty on a standard DRAM is: ...
On an SDRAM or with an interleaved memory organization it is: ...

12.25 Multi-word caches: Memory bandwidth

If a single transfer to/from memory can transfer multiple words at a time, the miss penalty decreases.

[Figure: three CPU-cache-memory organizations with progressively wider buses, using MUX/DEMUX stages between the cache and memory.]

Miss penalty for a 2-word block cache with a 2-word memory bus: ...
Miss penalty for a 4-word block cache with a 4-word memory bus: ...
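The blanks on these two slides reduce to one formula: a non-interleaved memory pays b) + c) once per bus-wide transfer, while an SDRAM/interleaved organization pays b) once and then c) per transfer. A small calculator (my sketch, with the slide's a = 1, b = 15, c = 1):

```c
#include <stdio.h>

/* Miss penalty in clock cycles, per the components on slide 12.24:
   a = send address, b = memory latency, c = time per bus transfer. */
unsigned miss_penalty(unsigned block_words, unsigned bus_words,
                      unsigned a, unsigned b, unsigned c, int interleaved) {
    unsigned transfers = block_words / bus_words;
    if (interleaved)                    /* SDRAM / interleaved banks:  */
        return a + b + transfers * c;   /* pay the latency only once   */
    return a + transfers * (b + c);     /* standard DRAM: latency each */
}

int main(void) {
    /* Slide 12.24: 4-word block, 1-word bus. */
    printf("standard DRAM:     %u cycles\n", miss_penalty(4, 1, 1, 15, 1, 0));
    printf("SDRAM/interleaved: %u cycles\n", miss_penalty(4, 1, 1, 15, 1, 1));
    /* Slide 12.25: the bus is as wide as the block, one transfer. */
    printf("2-word block, 2-word bus: %u cycles\n",
           miss_penalty(2, 2, 1, 15, 1, 0));
    printf("4-word block, 4-word bus: %u cycles\n",
           miss_penalty(4, 4, 1, 15, 1, 0));
    return 0;
}
```

Under these assumptions the four answers come out to 65, 20, 17, and 17 cycles.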

12.26 Cache associativity

What if the CPU keeps accessing two (or more) variables that map to the same location in a direct-mapped cache? More sophisticated strategy: n-way set-associative caches.
- direct-mapped ("1-way set associative")
- n-way set associative
- fully associative

12.27 Cache associativity: Two-way set associative cache

[Figure: memory blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 mapping into a cache with four sets (cache SET indexes 00-11), two lines per set.]

12.28 Cache associativity: Four-way set associative cache

[Figure: memory blocks 00001, 00011, 00101, 00111, 01001, 01011, 01101, 01111, 10001, 10011, 10101, 10111, 11001, 11011, 11101, 11111 mapping into a cache with two sets (cache SET indexes 0 and 1), four lines per set.]

12.29 Cache associativity: Eight-way set associative cache

With 8 lines, an eight-way set associative cache is a fully-associative cache: any block can go anywhere.

12.30 Cache associativity: Direct-mapped cache = 1-way set associative cache

[Figure: memory blocks 00001, 01001, 10001, 11001 all mapping to the same line of an 8-line cache (cache line indexes 000-111), one line per set.]
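The four mapping slides above share one rule: the set index is the block address modulo the number of sets, and associativity only changes how many sets the 8-line cache has. A small sketch (mine) shows how the set for memory block 01001 changes with associativity:

```c
#include <stdio.h>

#define CACHE_LINES 8

int main(void) {
    unsigned block = 0x09;   /* memory block 01001 */
    /* 1-way (direct-mapped), 2-way, 4-way, 8-way (fully associative). */
    for (unsigned ways = 1; ways <= CACHE_LINES; ways *= 2) {
        unsigned sets = CACHE_LINES / ways;
        unsigned set  = block % sets;  /* set index: low log2(sets) bits */
        printf("%u-way: %u sets, block 01001 -> set %u%s\n",
               ways, sets, set,
               sets == 1 ? " (any line: fully associative)" : "");
    }
    return 0;
}
```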

12.31 Cache associativity: Pros and cons of increasing cache associativity

Advantages:
- reduces the miss rate

Disadvantages:
- requires more hardware
- requires a replacement policy

Block replacement policy: Least Recently Used (LRU) or random, implemented in hardware.

12.32 Cache associativity: Exercise 1

For an 8-line, write-through, 2-way set-associative cache with LRU replacement and 1-word data blocks, trace the following sequence of block addresses (dec / binary), marking H/M for each: 23 / 00010111, 18 / 00010010, 196 / 11000100, 63 / 00111111, 79 / 01001111, 18 / 00010010, 199 / 11000111, 165 / 10100101.
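A simulation sketch for this exercise (mine, not from the slides): 8 lines in 2 ways means 4 sets, so the set index is the low 2 bits of the block address; LRU is tracked with a per-line timestamp.

```c
#include <stdio.h>

#define WAYS 2
#define SETS 4            /* 8 lines / 2 ways */

struct line { int valid; unsigned tag; unsigned last_used; };

int main(void) {
    struct line cache[SETS][WAYS] = {0};
    unsigned trace[] = { 23, 18, 196, 63, 79, 18, 199, 165 };
    unsigned n = sizeof trace / sizeof trace[0];

    for (unsigned t = 0; t < n; t++) {
        unsigned addr = trace[t];
        unsigned set  = addr % SETS;   /* low 2 bits of block address */
        unsigned tag  = addr / SETS;
        int hit = -1, victim = 0;

        for (int w = 0; w < WAYS; w++) {
            if (cache[set][w].valid && cache[set][w].tag == tag) hit = w;
            /* LRU victim: prefer an invalid line, else the oldest one. */
            if (!cache[set][w].valid)
                victim = w;
            else if (cache[set][victim].valid &&
                     cache[set][w].last_used < cache[set][victim].last_used)
                victim = w;
        }
        int w = (hit >= 0) ? hit : victim;
        cache[set][w].valid = 1;
        cache[set][w].tag = tag;
        cache[set][w].last_used = t + 1;  /* update recency every access */
        printf("addr %3u -> set %u, tag %2u: %s\n",
               addr, set, tag, hit >= 0 ? "HIT" : "MISS");
    }
    return 0;
}
```

Write-through does not affect the hit/miss outcome here, so the simulator models only placement and replacement.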

12.33 Cache associativity: Exercise 2

A computer system has 32-bit addresses and a 64-KB direct-mapped, write-back cache with 8-byte data block lines.
a) How many lines are in the cache?
b) How many bits total (including cache management bits) are in each line, minimum?
c) What is the total cache size in bits?
d) Diagram a cache lookup.

12.34 Cache associativity: Solution (Exercise 2)

a) # of lines: ...
b) # of bits per line: ...
c) total cache size: ...
d) cache lookup: [diagram over address bits 31..0]

12.35 Cache associativity: Exercise 3

Suppose the 64-KB cache in Exercise 2 was instead 2-way set associative with 8-byte lines.
a) How many sets are in the cache?
b) How many lines are in the cache?
c) How many bits total (including cache management bits) are in each line, minimum?
d) What is the total cache size in bits?
e) Diagram a cache lookup.

12.36 Cache associativity: Solution (Exercise 3)

a) # of sets: ...
b) # of lines: ...
c) # of bits per line: ...
d) total cache size: ...
e) cache lookup: [diagram over address bits 31..0]
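Exercises 2 and 3 differ only in associativity, so one parameterized sketch (mine, not from the slides) covers both. It assumes the management bits are valid + dirty (write-back) + tag per line; whether LRU bits also count toward the "minimum" in Exercise 3 is left open by the exercise, so they are not included here.

```c
#include <stdio.h>

/* Line/size breakdown for a write-back cache with 32-bit addresses;
   management bits assumed to be valid + dirty + tag per line. */
static void geometry(const char *name, unsigned cache_bytes,
                     unsigned block_bytes, unsigned ways) {
    unsigned lines = cache_bytes / block_bytes;
    unsigned sets  = lines / ways;
    unsigned index_bits = 0, offset_bits = 0;
    while ((1u << index_bits)  < sets)        index_bits++;
    while ((1u << offset_bits) < block_bytes) offset_bits++;
    unsigned tag_bits  = 32 - index_bits - offset_bits;
    unsigned line_bits = 1 /*valid*/ + 1 /*dirty*/ + tag_bits
                       + 8 * block_bytes;
    printf("%s: %u sets, %u lines, tag %u b, line %u b, total %u b\n",
           name, sets, lines, tag_bits, line_bits, lines * line_bits);
}

int main(void) {
    geometry("Exercise 2 (direct-mapped)", 64 * 1024, 8, 1);
    geometry("Exercise 3 (2-way)",         64 * 1024, 8, 2);
    return 0;
}
```

Under these assumptions, Exercise 2 gives 8192 lines of 82 bits (671,744 bits total) and Exercise 3 gives 4096 sets, 8192 lines of 83 bits (679,936 bits total).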

12.37 Caches and performance

Exercise 4: a computer has a CPI of 1.0 when there are no cache misses, and a 100 MHz clock. Each instruction has on average 0.4 data memory references. For each cache miss the instruction takes an additional 9 clock cycles to complete.
- What are the CPI_100% and the MIPS_100% rating with a cache and an (unrealistic) 100% hit rate?
- What are the CPI_NOCACHE and the MIPS_NOCACHE rating without a cache?
- What are the CPI_90/85 and the MIPS_90/85 rating with a cache, a 90% hit rate on instructions, and an 85% hit rate on data?

12.38 Caches and performance: Solution

CPI_100% = ...   MIPS_100% = ...
CPI_NOCACHE = ...   MIPS_NOCACHE = ...
CPI_90/85 = ...   MIPS_90/85 = ...
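A sketch of how the solution can be computed (mine; it assumes "without a cache" means every one of the 1.4 memory references per instruction pays the 9-cycle penalty, which is the usual reading of this exercise):

```c
#include <stdio.h>

int main(void) {
    const double base_cpi = 1.0, clock_mhz = 100.0, penalty = 9.0;
    const double inst_refs = 1.0, data_refs = 0.4; /* refs per instruction */

    double cases[][2] = {    /* {instr miss rate, data miss rate}  */
        { 0.00, 0.00 },      /* 100% hit rate                      */
        { 1.00, 1.00 },      /* no cache: every reference "misses" */
        { 0.10, 0.15 },      /* 90% instr hits, 85% data hits      */
    };
    const char *names[] = { "100%", "NOCACHE", "90/85" };

    for (int i = 0; i < 3; i++) {
        /* misses per instruction = refs * miss rate, per stream */
        double misses = inst_refs * cases[i][0] + data_refs * cases[i][1];
        double cpi = base_cpi + misses * penalty;
        printf("CPI_%s = %.2f, MIPS_%s = %.1f\n",
               names[i], cpi, names[i], clock_mhz / cpi);
    }
    return 0;
}
```

Under that assumption it prints CPI_100% = 1.00 (100 MIPS), CPI_NOCACHE = 13.60 (about 7.4 MIPS), and CPI_90/85 = 2.44 (about 41 MIPS).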

12.39 Homework: Recommended exercises

Second Edition:
- Ex 7.1 to 7.9, 7.13 to 7.17, 7.20, 7.21
- Ex 7.27. Hint: no need to know the CPI. Remember: the miss penalty is in number of clock cycles.
- Ex 7.28. Hints: i) no need to know the CPI; ii) since the three machines only differ in the cache system (and in the clock cycle time, of course), we only need to consider the total miss cycles per instruction. Note: the three machines are identical except for the cache.
- Ex 7.29 (especially interesting for CS). Hint: find in what cache blocks array[0], array[131], and array[132] are stored.

Third Edition:
- Ex 7.2-7.4, 7.6-7.8, 7.9, 7.12, 7.16, 7.17-7.19, 7.25-7.27
- Ex 7.32, 7.33, 7.35 are equivalent to old 7.27, 7.28, and 7.29 respectively.