
[Figure: MIPS datapath: PC, instruction memory, registers, sign extend, shift left 2, ALU, data memory, control, ALU control, adders, and MUXes, annotated with the pipeline stages INSTRUCTION FETCH, INSTR DECODE / REG FETCH, EXECUTE / ADDRESS CALC, MEMORY ACCESS, WRITE BACK]

Caches
Slides courtesy of Chris Nevison, with modifications by Alyce Brady and Nathan Sprague


Memory
Main memory (DRAM) is much slower than processors; the gap has widened over the past 20+ years and continues to widen.
Memory comes in different technologies; the fastest is also the most expensive:
- SRAM: 0.5-5 ns, $4000-$10000/GB (fastest, most expensive)
- Magnetic disk: 5-20 million ns, $0.50-$2.00/GB (slowest, least expensive)

Using Memory
Temporal locality - likely to reuse data soon. WHY?
Spatial locality - likely to access data close by. WHY?

Using Memory
Temporal locality - likely to reuse data soon (loops)
Spatial locality - likely to access data close by (sequential nature of program instructions and data access)
Use small amounts of expensive memory and larger amounts of less expensive memory
A memory hierarchy is used to hide memory access latency

Memory Hierarchy
First Level Cache
Second Level Cache
Main Memory
Secondary Memory (Disk)

Using the Memory Hierarchy
Check if the data is in fast memory (hit); if not (miss), fetch a block from slower memory
- A block may be a single word or several words
- The fetch time is the miss penalty
Hit rate is the fraction of memory accesses that are hits; miss rate = 1 - hit rate
If locality is high, the hit rate gets closer to 1
Effect: memory appears almost as fast as the fastest level, and as big as the biggest (slowest) level
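The "almost as fast as the fastest level" effect can be checked with a small average-access-time calculation. The timing numbers below are illustrative assumptions, not figures from the slides:

```python
# Assumed numbers: a 1 ns cache in front of a level with a 100 ns miss penalty.
hit_time = 1.0        # ns, time for a hit in the fast level
miss_penalty = 100.0  # ns, extra time to fetch a block from the slower level
hit_rate = 0.95       # high locality keeps this close to 1
miss_rate = 1 - hit_rate

average_access_time = hit_time + miss_rate * miss_penalty
print(average_access_time)  # 6.0 ns, far closer to 1 ns than to 100 ns
```

With a 95% hit rate the average access is 6 ns, much nearer the cache's speed than main memory's.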

Cache Issues
How is the cache organized and addressed?
When the cache is written to, how is the memory image updated?
When the cache is full and new items need to be put in the cache, what is removed and replaced?

Direct Mapped: any given memory address can be found in only one cache location (although many memory addresses map to the same cache line)

Cache Example (Direct Mapped)
256 byte cache (64 4-byte words)
- Each cache line (block) holds one word (4 bytes)
- The byte within a word is addressed by the lowest two bits of the address
- The cache line is addressed by the next six bits of the address
- Each cache line has a tag matching the high 24 bits of the memory address
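The field split for this 256-byte, one-word-per-line cache can be sketched in Python. This is a sketch under the geometry stated above (2 byte bits, 6 line bits, 24 tag bits); the function name is mine:

```python
def split_address(addr):
    """Split a 32-bit address for a 256-byte direct-mapped cache
    with one-word lines: 2 byte bits, 6 line bits, 24 tag bits."""
    byte = addr & 0b11          # lowest two bits: byte within the word
    line = (addr >> 2) & 0x3F   # next six bits: cache line
    tag  = addr >> 8            # remaining high 24 bits: tag
    return tag, line, byte

# Address 1100 0000 1010 0000 0000 0110 0010 1000 = 0xC0A00628
print(split_address(0xC0A00628))  # (12624902, 10, 0): tag 0xC0A006, line 001010, byte 00
```

On a lookup, `line` indexes the cache and `tag` is compared against the stored tag to decide hit or miss.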

[Figure: direct-mapped cache layout: 64 lines addressed 000000-111111, each with a tag field and byte positions 00-11]

[Figure: lookup of address 1100 0000 1010 0000 0000 0110 0010 1000: bits 2-7 (001010) select the line; after the access, that line holds tag 1100 0000 1010 0000 0000 0110]

Cache Access
1. Find the cache line address (bits 2-7)
2. Compare the tag to the high 24 bits
   - If they match: cache hit; find the byte address, then read or write the item
   - If they do not match: cache miss; go to memory
     - For a read: retrieve the item and write it to the cache, then use it
     - For a write: write to memory (or to cache, see below)
3. Direct mapped cache: every address can go to only one cache line!
4. What happens when the cache is written to?

Write Policy
Write Through
- Write to memory and to cache
- The time to write to memory could delay the instruction
  - a write buffer can hide this latency
Write Back (also called Copy Back)
- Write only to cache
  - mark the cache line as dirty, using an additional bit
- When a cache line is replaced, if it is dirty, write it back to memory
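A minimal sketch of the two policies, using a single cache line and a toy dictionary memory (all names here are hypothetical, for illustration only):

```python
class Line:
    """One cache line: its data plus the dirty bit used by write-back."""
    def __init__(self):
        self.data = 0
        self.dirty = False

memory = {0: 0}   # toy backing store: address -> value

def write_through(line, addr, value):
    line.data = value
    memory[addr] = value      # memory is updated on every store

def write_back(line, addr, value):
    line.data = value
    line.dirty = True         # memory is NOT updated yet; line is marked dirty

def evict(line, addr):
    if line.dirty:            # write back only when a dirty line is replaced
        memory[addr] = line.data
        line.dirty = False

line = Line()
write_back(line, 0, 42)
print(memory[0])  # 0: the store went only to the cache line
evict(line, 0)
print(memory[0])  # 42: the dirty line was written back on eviction
```

The demo shows why write-back defers the memory traffic: the store becomes visible in memory only at eviction time.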

Worked Example (Direct Mapped, 8 one-word lines)
Each address splits into a 2-bit byte offset (low bits), a 3-bit line address, and a tag (the remaining high bits). Each line has a valid bit and a tag field.

Desired address          Line  Tag       Result
88  = ...00010 110 00    110   ...00010  MISS, fetch
104 = ...00011 010 00    010   ...00011  MISS, fetch
88  = ...00010 110 00    110   ...00010  HIT!
104 = ...00011 010 00    010   ...00011  HIT!
64  = ...00010 000 00    000   ...00010  MISS, fetch
12  = ...00000 011 00    011   ...00000  MISS, fetch
64  = ...00010 000 00    000   ...00010  HIT!
72  = ...00010 010 00    010   ...00010  MISS, fetch (wrong tag: line 010 holds ...00011)
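The hit/miss sequence in this example can be reproduced with a small simulator of the 8-line direct-mapped cache. This is a sketch; the function name is mine:

```python
def simulate(addresses):
    """Direct-mapped cache from the example: 8 one-word lines,
    2 byte-offset bits, 3 line-index bits, remaining bits are the tag."""
    lines = {}  # line index -> stored tag (empty line = invalid)
    results = []
    for addr in addresses:
        index = (addr >> 2) & 0b111   # 3-bit line address
        tag = addr >> 5               # everything above index and offset
        if lines.get(index) == tag:
            results.append("HIT")
        else:
            results.append("MISS")
            lines[index] = tag        # fetch on miss, replacing any old tag
    return results

print(simulate([88, 104, 88, 104, 64, 12, 64, 72]))
# ['MISS', 'MISS', 'HIT', 'HIT', 'MISS', 'MISS', 'HIT', 'MISS']
```

Note the final access to 72: it maps to line 010, which holds the tag for 104, so the tag comparison fails even though the line is valid.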

Accelerating Memory Access
- Wider bus
- Wider memory access

Accelerating Memory Access
How can the cache take advantage of faster, wider memory access?
- Store more than one word on each line in the cache
- Any cache miss brings the whole line containing the item into the cache
- This takes advantage of spatial locality: the next item needed is likely to be at the next address

Cache with Multi-Word Lines
256 byte cache -- 64 4-byte words
Each block (line) contains four words (16 bytes)
- 2 bits to address the byte within a word
- 2 bits to address the word within a line
The cache contains sixteen four-word blocks
- 4 bits to address the cache block (line)
Each cache line has a tag field for the upper 24 bits of the address
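The four-field split for this multi-word-line cache can be sketched the same way as before (a sketch under the stated geometry; the function name is mine):

```python
def split_multiword(addr):
    """Split a 32-bit address for the 256-byte cache with four-word lines:
    2 byte bits, 2 word bits, 4 line bits, 24 tag bits."""
    byte = addr & 0b11          # byte within the word
    word = (addr >> 2) & 0b11   # word within the four-word line
    line = (addr >> 4) & 0xF    # four bits select one of sixteen lines
    tag  = addr >> 8            # high 24 bits
    return tag, line, word, byte

# Address 1100 0000 1010 0000 0000 0110 0010 1000 = 0xC0A00628
print(split_multiword(0xC0A00628))  # tag 0xC0A006, line 0010, word 10, byte 00
```

Only the tag is stored and compared; the word and byte bits steer MUXes that pick the requested item out of the fetched line.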

[Figure: multi-word-line cache: 16 lines addressed 0000-1111, each with a tag and four words of four bytes; lookup of address 1100 0000 1010 0000 0000 0110 0010 1000 selects line 0010, word 10, byte 00, raising a Hit signal (to control) and steering the selected word through a MUX to Data]

[Figure: pipelined datapath with an instruction cache and a data cache replacing the memories; each cache adds Mem Addr, Mem Data, and Hit signals connected to the control]

Instruction Cache Hit / Miss
Hit or Miss:
- The instruction is fetched from the cache and placed in the pipeline buffer register
- The PC is latched into the Memory Address Register
Hit:
- Control sees the hit; execution continues
- Mem Addr is unused

Instruction Cache Hit / Miss
Miss:
- Control sees the miss; execution stalls
- The PC is reset to PC - 4
- Values fetched from registers are unused
- A memory read cycle is started, using Mem Addr
- The memory read completes
- The value is stored in the cache and the new tag is written
- Instruction execution restarts; this time it is a cache hit

Speeding Up the Cache
execution time = (execution cycles + stall cycles) × cycle time
stall cycles = # of instructions × miss ratio × miss penalty
We can improve performance by:
- Reducing the miss rate: more flexible placement of blocks
- Reducing the miss penalty: caching the cache (multi-level caches)
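The two formulas can be exercised with assumed numbers; the instruction count, miss ratio, miss penalty, and clock below are all illustrative:

```python
# Assumed workload: a million instructions on a 1 GHz clock, base CPI of 1.
instructions = 1_000_000
execution_cycles = instructions * 1   # base CPI = 1 (assumed)
miss_ratio = 0.05                     # misses per instruction (assumed)
miss_penalty = 40                     # stall cycles per miss (assumed)
cycle_time = 1e-9                     # 1 ns clock (assumed)

stall_cycles = instructions * miss_ratio * miss_penalty
execution_time = (execution_cycles + stall_cycles) * cycle_time
print(execution_time)  # 0.003 s: stalls contribute two-thirds of the total
```

Even a 5% miss ratio triples the execution time here, which is why reducing miss rate and miss penalty both matter.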

Associative Cache
Fully Associative
- Blocks can be stored in any cache line
- All tags must be checked on every access
Set Associative
- Two or more (a power of 2) lines for each line address
- More than one item with the same cache line address can be in the cache
- Tags for all lines in the set must be checked: any match means a hit; if none match, a miss

[Figure: two-way set associative cache: 8 set addresses 000-111, each way with a valid bit, a tag, four word positions 00-11, and byte positions 00-11; a lookup of address 1100 0000 1010 0000 0000 0110 0010 1000 compares the tags of both ways in set 010, a match in either way raises Hit, and a MUX selects the data]

Replacement Policy
Necessary to select a line to replace when using associative caches
Least Recently Used (LRU) is one possibility
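LRU for one set of an associative cache can be sketched with an ordered dictionary whose order tracks recency. This is a sketch of the policy, not the slides' implementation; the class name is mine:

```python
from collections import OrderedDict

class LRUSet:
    """One set of a set-associative cache; evicts the least recently used tag."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # key order = recency, oldest first

    def access(self, tag):
        if tag in self.tags:
            self.tags.move_to_end(tag)        # refresh recency on a hit
            return "HIT"
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)     # evict the least recently used
        self.tags[tag] = True
        return "MISS"

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# ['MISS', 'MISS', 'HIT', 'MISS', 'MISS']
```

In the demo, accessing tag 1 again makes tag 2 the LRU victim when tag 3 arrives, so the later access to tag 2 misses.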

Multi-level Cache
Put two (or more) levels of cache between the CPU and memory
- Focus of the top-level cache: keeping hit time low
- Focus of the lower-level cache: keeping the miss penalty low for the top-level cache; a low miss rate is more important than hit time
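The division of labor between levels shows up in the average access time. The numbers below are assumed for illustration:

```python
# Assumed two-level numbers: fast L1, slower but larger L2, slow main memory.
l1_hit_time = 1       # cycles (assumed)
l2_hit_time = 10      # cycles (assumed)
mem_time = 100        # cycles (assumed)
l1_miss_rate = 0.05   # fraction of accesses that miss in L1
l2_miss_rate = 0.20   # fraction of L2 accesses that miss in L2

# L1 misses pay the L2 time; L2 misses additionally pay the memory time.
amat = l1_hit_time + l1_miss_rate * (l2_hit_time + l2_miss_rate * mem_time)
print(amat)  # 2.5 cycles
```

Without the L2, the same L1 would give 1 + 0.05 × 100 = 6 cycles, so the second level cuts the effective miss penalty substantially even with a modest hit rate.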

Cache Summary - Types
Direct Mapped
- Each line in the cache serves one line address
- The line size may accommodate several words
Set Associative
- Sets of lines serve the same line address
- Needs a replacement policy for which line to purge when the set is full
- More flexible, but more complex

Cache Summary
Cache Hit
- The item is found in the cache
- The CPU continues at full speed
- Need to verify the valid bit and tag match
Cache Miss
- The item must be retrieved from memory
- The whole cache line is retrieved
- The CPU stalls for the memory access

Cache Summary
Write Policies
- Write Through (always write to memory)
- Write Back (uses a dirty bit)
Associative Cache Replacement Policies
- LRU (Least Recently Used)
- Random