Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)


Cache memory idea. Use a small, faster memory, a cache memory, to store recently used data instead of always accessing the slow main memory. [Figure: CPU <-> Cache <-> Main memory; the cache is fast (1 ns), main memory is slow (7 ns).]

Increased average performance. Using no cache: all accesses must go to main memory; average access time: 7 ns. Using a cache: a cache hit takes 1 ns, a cache miss 7 ns. If 95% of all accesses hit, then: average access time = 0.95 x 1 + 0.05 x 7 = 1.3 ns. Is a 95% hit rate realistic?
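As a quick sanity check of the arithmetic above, the weighted average can be computed directly (a minimal sketch; the 1 ns hit time, 7 ns miss time and 95% hit rate are the slide's example values):

```python
# Minimal sketch of the slide's average-access-time calculation.
# The 1 ns / 7 ns / 95% figures are the slide's example values.

def average_access_time(hit_rate, hit_time_ns, miss_time_ns):
    """Weighted average of cache-hit and cache-miss access times."""
    return hit_rate * hit_time_ns + (1.0 - hit_rate) * miss_time_ns

print(round(average_access_time(0.95, 1.0, 7.0), 2))  # -> 1.3
```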

Locality. Instructions and data are not randomly referenced! This is because of loops, subroutines, data structures, arrays, and so on. Consider the address stream produced by this loop:

200: lw   $3, 0($2)
204: lw   $4, 100($2)
208: slti $6, $2, 100
212: add  $5, $3, $4
216: sw   $5, 100($2)
220: addi $2, $2, 4
224: bne  $6, $0, -28

[Plot: referenced address vs. time, showing the repeating access pattern.]

Locality. Spatial locality: if one address is accessed, it is probable that a nearby address will soon be accessed. Temporal locality: if one address is accessed, it is probable that the same address will soon be accessed again. Locality can be exploited by keeping a small subset of the data and instructions that are likely to be accessed soon in small, fast storage.
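The two kinds of locality can be illustrated on a toy address trace. This is a hypothetical sketch (not from the slides): an access counts as temporal reuse if the exact address has been seen before, and as spatial reuse if another address in the same 16-byte block has been.

```python
# Illustrative sketch: classify each access in a toy byte-address trace
# by the kind of locality it exhibits, assuming 16-byte blocks.

BLOCK = 16  # assumed block size in bytes

def classify(trace):
    seen_addrs, seen_blocks = set(), set()
    kinds = []
    for addr in trace:
        if addr in seen_addrs:
            kinds.append("temporal")          # same address referenced again
        elif addr // BLOCK in seen_blocks:
            kinds.append("spatial")           # neighbour in an already-touched block
        else:
            kinds.append("new")
        seen_addrs.add(addr)
        seen_blocks.add(addr // BLOCK)
    return kinds

print(classify([0, 4, 8, 0, 64]))
# -> ['new', 'spatial', 'spatial', 'temporal', 'new']
```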

Memory System Performance. Facts: large is slow, small is fast; memory performance cannot keep up with processors. For a two-level hierarchy (M1 closest to the processor P, M2 behind it):

Average access time = h1*T1 + h2*T2 = h1*T1 + (1 - h1)*T2

where hx = hit rate = the probability of finding the data in memory x, (1 - hx) = mx = the miss rate for memory x, Tx = hit time = the access time for memory x, and (Ty - Tx) = the miss penalty for memory x.

Memory Hierarchy. [Figure: CPU <-> Cache <-> Main memory <-> Secondary memory, with addresses flowing down and data flowing back.]

Registers: 128-512 B, cycle-time access
Cache (SRAM): 8-2048 KB, a few ns, h = 80-99.9%
Main memory (DRAM): 32-512 MB, tens of ns, h > 99.9999%
Secondary memory (magnetic disk): > 0.5 GB, milliseconds, h = 100%

Levels. [Figure: CPU <-> Cache <-> Main memory <-> Secondary memory.] Division into levels: the highest level is closest to the CPU, smallest and fastest; the middle levels are one or more levels (sometimes several cache levels); the lowest level is farthest from the CPU, largest and slowest.

The Memory Hierarchy is Normally a True Hierarchy. Each level contains a subset of the contents of the lower levels: the cache contains a subset of main memory, main memory contains a subset of secondary memory, and secondary memory contains all addressable data and instructions.

Blocks. Data are transferred in blocks of different sizes at different levels: typically 4-8 B between the CPU and the cache, 16-256 B between the cache and main memory, and 4-64 KB between main memory and secondary memory. Block sizes are adapted to exploit spatial locality and to optimize block transfer times.

Using Hierarchical Memory. The CPU needs a word or byte; secondary memory always contains all addressable data and instructions. On a miss at every level, the following happens:

1. The word address is sent to the highest level (the cache).
2. No block containing the word is found there.
3. The required block address is sent to the next lower level (main memory).
4. No larger block containing the needed block is found there either.
5. The address of the required larger block is sent to the next lower level (secondary memory).
6. The required block is found!
7. At each level on the way back up, the required block is copied to the nearest higher level; an old block is thrown out to make room.
8. Finally, the needed word is copied to the CPU. The cache now contains a block with a copy of the word, main memory contains a block with a copy of the cache block, and secondary memory contains the block with a copy of the main memory block.

When the CPU needs the same word again, it is found directly in the cache and copied to the CPU. Thanks to locality, data and instructions are usually found close to the CPU!

Four Important Questions. For each level in the memory hierarchy, the following questions need to be answered: Where to place a block? How to find a block? Which block to replace? What happens on a write?

Cache Memory. The memory level(s) between main memory and the CPU. A cache has a fixed number of block places, and each block place can contain blocks from a subset of all blocks in main memory: the address of a block determines which block place it can be stored in.

Cache Addressing. The address from the CPU is typically 32-64 bits and points out one byte; the word that contains this byte is fetched from memory. With 4 bytes/word, bits 31-2 form the word address (30 bits) and bits 1-0 select the byte in the word.

Cache Addressing. First, find the right cache block for the addressed word. The block size is typically 1-64 words (4-256 B); with 16 bytes/block, bits 31-4 form the block number/address and bits 3-0 the byte offset. The total cache capacity is typically 8-2048 KB, i.e. 256-16384 block places.

Cache Addressing. ... and then find the corresponding block place: the least significant part of the block address is used as an index to a block place. With 1K block places and 16 bytes/block, bits 13-4 form the index and bits 3-0 the byte offset.

Cache Addressing. The rest of the block address, the tag (bits 31-14), is stored together with the block, because indexes are shared: with 1K block places and 16 bytes/block, 256K different blocks map to the same block place, so many blocks compete for each block place.
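The tag/index/offset split built up over the last few slides can be sketched as a small helper (assuming the slides' parameters: 16-byte blocks, giving 4 offset bits, and 1K block places, giving 10 index bits):

```python
# Sketch of the address split from the slides: a 32-bit byte address with
# 16-byte blocks (4 offset bits) and 1K block places (10 index bits).

OFFSET_BITS, INDEX_BITS = 4, 10

def split_address(addr):
    """Return (tag, index, byte offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345678))  # -> (18641, 359, 8)
```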

Cache Memory Implementation. [Figure: the address is split into tag (bits 31-14), index (bits 13-4) and word/byte offset (bits 3-0). The index selects one of 1024 entries, each holding a valid bit, a tag and a data block of 4 words. The stored tag is compared (=) with the address tag and ANDed (&) with the valid bit to produce Hit; on a hit, the offset selects the one data word to deliver.]

What Happens on a Cache Miss? Any valid block already at the searched block's index is thrown out (sometimes it must first be copied back to main memory). The searched block is read from main memory and written into the cache at the index of the searched address, and the needed word is delivered to the CPU. This procedure takes much longer than if the block had been found in the cache immediately (a cache hit).
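The hit/miss behaviour described above can be sketched in a few lines (illustrative only; a tiny direct mapped cache with 4 block places and 16-byte blocks, tracking one tag per block place):

```python
# Illustrative direct mapped cache: 4 block places, 16-byte blocks.
# Each miss simply overwrites whatever block occupies the index.

NUM_PLACES, BLOCK = 4, 16

def count_hits(trace):
    places = [None] * NUM_PLACES           # stored tag per block place; None = invalid
    hits = 0
    for addr in trace:
        block = addr // BLOCK
        index, tag = block % NUM_PLACES, block // NUM_PLACES
        if places[index] == tag:
            hits += 1                      # cache hit
        else:
            places[index] = tag            # miss: fetch block, throw out the old one
    return hits

print(count_hits([0, 4, 64, 0]))  # -> 1 (only the access to address 4 hits)
```

Note how addresses 0 and 64 map to the same block place, so the final access to address 0 misses even though it was cached earlier.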

A Problem with Caches. [Figure: memory blocks 00001, 01001 and 10001 all map to cache block place 001, and blocks 00101, 01101 and 10101 all map to block place 101; several memory blocks compete for the same block place.]

Associativity. Many blocks (256K in the example) share an index and compete for the same block place; this is a problem if several such blocks are needed simultaneously! Solution: use multiple parallel cache memories: each index points out a set of block places; each block can be stored anywhere within its set; several blocks with the same index can then be used simultaneously; the number of block places per set is the associativity.

Associativity. Direct mapped cache: 1 block/set (as described earlier). k-way set associative cache: k blocks/set (equivalent to k parallel caches), typically k = 2 or 4. Fully associative cache: all block places are part of a single set; expensive and unusual except for certain small, specialized caches.

Two-Way Set Associative Cache. [Figure: two parallel direct mapped caches, each with 1024 entries (valid bit, tag, 4-word data block). The address is split into tag (bits 31-14), index/set (bits 13-4) and word/byte offset (bits 3-0). The index selects one entry in each cache; both stored tags are compared (=) with the address tag and ANDed (&) with their valid bits in parallel. A match in either way (>=1) signals Hit, and the matching way delivers the selected data word.]

Which Block to Replace? Replacement algorithms: LRU (Least Recently Used): throw out the block that has not been accessed for the longest time (requires extra referenced bits); because of temporal locality, this is the best approximation to keeping the blocks that will soon be accessed again. Random: randomly choose a block.
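LRU bookkeeping for a single set can be sketched as follows (an illustrative model, not the hardware referenced-bit implementation; an OrderedDict keeps the cached tags in recency order):

```python
# Illustrative LRU replacement within one set of a set associative cache.
# The OrderedDict holds the tags currently in the set, oldest first.

from collections import OrderedDict

def count_hits_lru(trace, ways=2):
    set_ = OrderedDict()                   # tag -> None, least recently used first
    hits = 0
    for tag in trace:
        if tag in set_:
            hits += 1
            set_.move_to_end(tag)          # mark as most recently used
        else:
            if len(set_) == ways:
                set_.popitem(last=False)   # evict the least recently used tag
            set_[tag] = None
    return hits

print(count_hits_lru(["A", "B", "A", "C", "A", "B"]))  # -> 2
```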

What Happens on a Write? Two main strategies. Write through: everything written to the cache is also written directly to main memory (via a store buffer). Write back (copy back): new data are written only in the cache memory; when a block is thrown out, it is written to main memory if it was modified in the cache (requires an extra dirty bit per block).
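The difference in main-memory write traffic can be sketched for the simplest case: repeated writes that all hit a single cached block (an assumption made for illustration; misses and the store buffer are ignored):

```python
# Illustrative main-memory write counts for n writes that all hit one
# cached block. Assumes no misses; the store buffer is ignored.

def memory_writes(n_writes_to_block, policy):
    if policy == "write-through":
        return n_writes_to_block          # every write also goes to main memory
    if policy == "write-back":
        return 1                          # one write-back when the dirty block is evicted
    raise ValueError(f"unknown policy: {policy}")

print(memory_writes(100, "write-through"), memory_writes(100, "write-back"))
# -> 100 1
```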

Cache Memory Performance. The hit rate (miss rate) depends primarily on: cache memory size, block size, and associativity. The average access time is also affected by the hit time and the miss penalty. NB! Methods that improve the hit rate can easily affect access times, so that the overall performance effect may be negative.

Virtual Memory. Extends the memory hierarchy to include main memory and mass storage: main memory acts as a cache for data in secondary memory. Block sizes are large (kilobytes), and the blocks are often called pages. Physical addresses are used for main memory, virtual addresses for secondary memory; virtual-to-physical translation is performed in the processor. Virtual memory allows different processes to use independent address spaces => possibility for protection. More about this in Appendix C (C.4 and C.5) and later in the course.
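The virtual-to-physical translation mentioned above can be sketched as a page-table lookup (illustrative; the 4 KB page size and the toy page table are assumptions here, since the slide only says "kilobytes", and page faults are ignored):

```python
# Sketch of virtual-to-physical address translation with 4 KB pages
# (assumed size). page_table is a toy map from virtual page number (VPN)
# to physical page number (PPN); page-fault handling is omitted.

PAGE = 4096

def translate(vaddr, page_table):
    vpn, offset = vaddr // PAGE, vaddr % PAGE
    ppn = page_table[vpn]                 # KeyError here would mean a page fault
    return ppn * PAGE + offset

page_table = {0: 5, 1: 2}                 # hypothetical mappings
print(hex(translate(0x1234, page_table)))  # VPN 1 -> PPN 2, so -> 0x2234
```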