Chapter Seven. Large and Fast: Exploiting Memory Hierarchy


Chapter Seven

Memories: Review
- SRAM: value is stored on a pair of inverting gates; very fast, but takes up more space than DRAM (4 to 6 transistors)
- DRAM: value is stored as a charge on a capacitor (must be refreshed); very small, but slower than SRAM (by a factor of 5 to 10)

Exploiting Memory Hierarchy
Users want large and fast memories!
- SRAM access times are 0.5-5 ns, at a cost of $4000 to $10,000 per GB.
- DRAM access times are 50-70 ns, at a cost of $100 to $200 per GB.
- Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB.
(2004 figures)
Try and give it to them anyway: build a memory hierarchy.
[Figure: levels in the memory hierarchy, from Level 1 nearest the CPU down to Level n; access time increases and the size of the memory at each level grows with distance from the CPU.]

Locality
A principle that makes having a memory hierarchy a good idea. If an item is referenced:
- temporal locality: it will tend to be referenced again soon
- spatial locality: nearby items will tend to be referenced soon
Why does code have locality?
Our initial focus: two levels (upper, lower)
- block: minimum unit of data
- hit: data requested is in the upper level
- miss: data requested is not in the upper level
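As a concrete illustration (our own example, not from the slides), a simple summation loop exhibits both kinds of locality: the accumulator is reused on every iteration (temporal), and the array is walked sequentially, so neighboring elements share a cache block (spatial).

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++)
        a[i] = i;

    int sum = 0;                 /* temporal: touched every iteration      */
    for (int i = 0; i < 1024; i++)
        sum += a[i];             /* spatial: consecutive elements share a
                                    cache block, so most accesses hit      */

    printf("%d\n", sum);
    return 0;
}
```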

Cache
Two issues:
- How do we know if a data item is in the cache?
- If it is, how do we find it?
Our first example:
- block size is one word of data
- "direct mapped"
For each item of data at the lower level, there is exactly one location in the cache where it might be, i.e., lots of items at the lower level share locations in the upper level.

Direct Mapped Cache
Mapping: address is modulo the number of blocks in the cache.
[Figure: an 8-block direct-mapped cache; each memory address maps to the cache block given by (address mod 8).]
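A minimal sketch of that placement rule, assuming one-word (4-byte) blocks and a power-of-two number of cache blocks:

```c
#include <stdint.h>
#include <stdio.h>

/* Direct-mapped placement: block address modulo the number of blocks. */
static uint32_t cache_index(uint32_t addr, uint32_t num_blocks) {
    return (addr / 4) % num_blocks;   /* word address mod cache blocks */
}

int main(void) {
    /* with 8 blocks, byte addresses 0 and 32 collide on index 0 */
    printf("%u %u\n", cache_index(0, 8), cache_index(32, 8));
    return 0;
}
```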

Direct Mapped Cache
For MIPS (32-bit addresses, 1024 one-word blocks):
[Figure: the address is split into a 2-bit byte offset (bits 1:0), a 10-bit index (bits 11:2) selecting one of the 1024 entries, and a 20-bit tag (bits 31:12); a hit requires the entry's valid bit to be set and its stored tag to equal the address tag.]
What kind of locality are we taking advantage of?

Direct Mapped Cache
Taking advantage of spatial locality:
[Figure: a direct-mapped cache with multiword blocks; the address splits into tag, index, block offset, and byte offset, each 512-bit entry holds a 16-word block across 256 entries, and a multiplexor driven by the block offset selects one 32-bit word.]
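The field split for the 1024-entry, one-word-block cache can be written as shifts and masks; a brief sketch (the example address is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr   = 0x12345678;
    uint32_t offset = addr & 0x3;          /* bits  1:0, byte offset   */
    uint32_t index  = (addr >> 2) & 0x3FF; /* bits 11:2, 0..1023       */
    uint32_t tag    = addr >> 12;          /* bits 31:12, 20-bit tag   */
    /* a hit requires valid[index] && stored_tag[index] == tag */
    printf("tag=0x%05x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```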

Hits vs. Misses
- Read hits: this is what we want!
- Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.
- Write hits:
  - replace the data in cache and memory (write-through)
  - write the data only into the cache (write it back to memory later: write-back)
- Write misses: read the entire block into the cache, then write the word.

Hardware Issues
Make reading multiple words easier by using banks of memory:
[Figure: three organizations between CPU, cache, and memory: (a) one-word-wide memory, (b) wide memory with a multiplexor between cache and CPU, (c) interleaved memory with four banks, bank 0 through bank 3.]
It can get a lot more complicated...
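A minimal sketch of the two write-hit policies; the cache-line layout and the toy memory array are our own stand-ins, not the slides':

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t tag;
    uint32_t data;
    int      valid;
    int      dirty;           /* used only by write-back */
} line_t;

static uint32_t memory[1024]; /* toy backing store, word-addressed */

static void write_through(line_t *line, uint32_t word_addr, uint32_t data) {
    line->data = data;
    memory[word_addr] = data; /* memory is updated on every write hit  */
}

static void write_back(line_t *line, uint32_t data) {
    line->data  = data;
    line->dirty = 1;          /* memory is updated later, on eviction  */
}

int main(void) {
    line_t l = {0, 0, 1, 0};
    write_through(&l, 7, 42);
    write_back(&l, 43);
    printf("%u %u %d\n", memory[7], l.data, l.dirty); /* 42 43 1 */
    return 0;
}
```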

Performance
Increasing the block size tends to decrease the miss rate:
[Figure: miss rate (0% to 40%) versus block size (4 to 256 bytes) for cache sizes from 1 KB to 256 KB; miss rate falls as blocks grow, then rises again for large blocks in small caches.]
Use split caches because there is more spatial locality in code:

Program   Block size in words   Instruction miss rate   Data miss rate   Effective combined miss rate
gcc       1                     6.1%                    2.1%             5.4%
          4                     2.0%                    1.7%             1.9%
spice     1                     1.2%                    1.3%             1.2%
          4                     0.3%                    0.6%             0.4%

Performance
Simplified model:
  execution time = (execution cycles + stall cycles) x cycle time
  stall cycles = # of instructions x miss ratio x miss penalty
Two ways of improving performance:
- decreasing the miss ratio
- decreasing the miss penalty
What happens if we increase block size?
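A sketch that plugs made-up numbers into the model above; the instruction count, miss ratio, miss penalty, and cycle time are all our own illustrative values:

```c
#include <stdio.h>

int main(void) {
    double instructions = 1e6;
    double base_cpi     = 1.0;
    double miss_ratio   = 0.05;
    double miss_penalty = 40.0;   /* cycles per miss */
    double cycle_time   = 2e-9;   /* 2 ns            */

    double exec_cycles  = instructions * base_cpi;
    double stall_cycles = instructions * miss_ratio * miss_penalty;
    double time = (exec_cycles + stall_cycles) * cycle_time;

    printf("execution time = %g s\n", time);  /* 0.006 s with these inputs */
    return 0;
}
```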

Set Associative Caches
Basic idea: a memory block can be mapped to more than one location in the cache.
- The cache is divided into sets.
- Each memory block is mapped to a particular set.
- Each set can have more than one block.
- Number of blocks in a set = associativity of the cache.
- If a set has only one block, it is a direct-mapped cache, i.e., direct-mapped caches have a set associativity of 1.
- Each memory block can be placed in any of the blocks of the set to which it maps.

- Direct mapped cache: block N maps to block (N mod number of blocks in cache).
- Set associative cache: block N maps to set (N mod number of sets in cache).
The example below shows the placement of the block whose address is 12; the code sketch after the figure works through the same arithmetic.
[Figure: placement of block 12 in an 8-block cache. Direct mapped (blocks 0-7): only block 4. Two-way set associative (sets 0-3): either block of set 0. Fully associative: any block. Tag 12 is searched in one place, two places, or everywhere, respectively.]
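```c
#include <stdio.h>

/* Where does memory block 12 go in an 8-block cache? */
int main(void) {
    int block = 12, cache_blocks = 8;

    int dm  = block % cache_blocks;        /* direct mapped: block 4 */
    int two = block % (cache_blocks / 2);  /* 2-way, 4 sets: set 0   */
    /* fully associative: one set, any of the 8 blocks */

    printf("direct mapped -> block %d, 2-way -> set %d\n", dm, two);
    return 0;
}
```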

Decreasing miss ratio with associativity
[Figure: an 8-block cache organized four ways: one-way set associative (direct mapped, blocks 0-7), two-way (4 sets of 2), four-way (2 sets of 4), and eight-way (fully associative, one set of 8); each block holds a tag and data.]
Compared to direct mapped, give a series of references that:
- results in a lower miss ratio using a 2-way set associative cache
- results in a higher miss ratio using a 2-way set associative cache
assuming we use the least recently used (LRU) replacement strategy.

An Implementation
[Figure: a four-way set associative cache with 256 sets; the address supplies an 8-bit index (bits 9:2) and a 22-bit tag (bits 31:10), four tag comparators feed a 4-to-1 multiplexor, and the outputs are Hit and Data.]

Set Associative Caches
- Advantage: miss ratio decreases as associativity increases.
- Disadvantage: extra time for the associative search.

Block Replacement Policies
Which block do we replace on a cache miss? We have multiple candidates (unlike direct-mapped caches):
- Random
- FIFO (First In First Out)
- LRU (Least Recently Used)
Typically, CPUs use Random or approximate LRU because these are easier to implement in hardware.

Example
Cache size = 4 one-word blocks; replacement policy = LRU.
Sequence of memory references: 0, 8, 0, 6, 8.
Set associativity = 4 (fully associative); number of sets = 1.

Address   Hit/Miss   Set 0   Set 0   Set 0   Set 0
0         miss       0
8         miss       0       8
0         hit        0       8
6         miss       0       8       6
8         hit        0       8       6

Example, cont'd
Cache size = 4 one-word blocks; replacement policy = LRU.
Sequence of memory references: 0, 8, 0, 6, 8.
Set associativity = 2; number of sets = 2.

Address   Hit/Miss   Set 0   Set 0   Set 1   Set 1
0         miss       0
8         miss       0       8
0         hit        0       8
6         miss       0       6
8         miss       8       6

The sketch after these tables replays the fully associative case.
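A small simulation of the fully associative case above: four one-word blocks, LRU replacement, reference string 0, 8, 0, 6, 8. It reproduces the two hits in the first table (the data structures are our own; real hardware approximates LRU).

```c
#include <stdio.h>

#define WAYS 4

int main(void) {
    int tag[WAYS], age[WAYS] = {0}, filled = 0;
    int refs[] = {0, 8, 0, 6, 8};

    for (int r = 0; r < 5; r++) {
        int way = -1, hit;
        for (int w = 0; w < filled; w++)
            if (tag[w] == refs[r]) way = w;
        hit = (way >= 0);

        if (!hit) {                      /* miss: fill, or evict LRU */
            way = 0;
            if (filled < WAYS) way = filled++;
            else
                for (int w = 1; w < WAYS; w++)
                    if (age[w] > age[way]) way = w;
            tag[way] = refs[r];
        }
        for (int w = 0; w < filled; w++) age[w]++;
        age[way] = 0;                    /* mark most recently used  */
        printf("ref %d: %s\n", refs[r], hit ? "hit" : "miss");
    }
    return 0;   /* prints miss, miss, hit, miss, hit */
}
```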

Example, cont'd
Cache size = 4 one-word blocks; replacement policy = LRU.
Sequence of memory references: 0, 8, 0, 6, 8.
Set associativity = 1 (direct mapped cache).

Address   Hit/Miss   Block 0   Block 1   Block 2   Block 3
0         miss       0
8         miss       8
0         miss       0
6         miss       0                   6
8         miss       8                   6

Performance
[Figure: miss rate (0% to 15%) versus associativity (one-way, two-way, four-way, eight-way) for cache sizes from 1 KB to 128 KB; associativity helps most for small caches.]

Decreasing miss penalty with multilevel caches
Add a second-level cache:
- often the primary cache is on the same chip as the processor
- use SRAMs to add another cache above primary memory (DRAM)
- miss penalty goes down if the data is in the 2nd-level cache
Example: CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access. Adding a 2nd-level cache with a 20 ns access time decreases the miss rate to 2%.
Using multilevel caches:
- try to optimize the hit time on the 1st-level cache
- try to optimize the miss rate on the 2nd-level cache

Understanding the three sources of cache misses: compulsory, capacity, and conflict misses
[Figure: miss rate per type (0% to 14%) versus cache size (1 KB to 128 KB) for one-way through eight-way associativity; the capacity component is labeled, conflict misses shrink as associativity grows, and the compulsory component is small.]
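To finish the example's arithmetic under one common interpretation (treating the 2% as the miss rate all the way to DRAM once the L2 is added): at 500 MHz the cycle time is 2 ns, so 200 ns of DRAM is 100 cycles and 20 ns of L2 is 10 cycles.

```c
#include <stdio.h>

int main(void) {
    double base_cpi    = 1.0;
    double mem_penalty = 200.0 / 2.0;  /* 100 cycles to DRAM */
    double l2_penalty  =  20.0 / 2.0;  /*  10 cycles to L2   */

    double cpi_l1   = base_cpi + 0.05 * mem_penalty;                     /* 6.0 */
    double cpi_l1l2 = base_cpi + 0.05 * l2_penalty + 0.02 * mem_penalty; /* 3.5 */

    printf("CPI without L2: %.1f, with L2: %.1f (%.1fx faster)\n",
           cpi_l1, cpi_l1l2, cpi_l1 / cpi_l1l2);
    return 0;
}
```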

Modern Systems
[Table: memory hierarchy parameters of modern systems.]

Modern Systems
Things are getting complicated!

Some Issues
Processor speeds continue to increase very fast: much faster than either DRAM or disk access times.
[Figure: performance versus year on a log scale; the CPU curve pulls steadily away from the memory curve.]
Design challenge: dealing with this growing disparity.
- Prefetching? 3rd-level caches and more? Memory design?

Virtual Memory

Virtual Memory
Main memory can act as a cache for the secondary storage (disk).
[Figure: virtual addresses are mapped by address translation to physical addresses in main memory or to disk addresses.]
Advantages:
- illusion of having more physical memory
- program relocation
- protection

Recall: each MIPS program has an address space of size 2^32 bytes.
[Figure: MIPS memory layout, from the top: $sp = 7fff ffff hex, Stack (growing down), Dynamic data, $gp = 1000 8000 hex, Static data, Text at pc = 0040 0000 hex, Reserved at 0.]

Pages: virtual memory blocks
Page faults: the data is not in memory; retrieve it from disk.
- huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
- reducing page faults is important (LRU is worth the price)
- the faults can be handled in software instead of hardware
- using write-through is too expensive, so we use write-back
[Figure: a 32-bit virtual address splits into a virtual page number (bits 31:12) and a page offset (bits 11:0); translation replaces the virtual page number with a physical page number (bits 29:12) while the page offset passes through unchanged.]

Page Tables
[Figure: the virtual page number indexes the page table; each valid entry holds a physical page number pointing into physical memory, while invalid entries hold disk addresses pointing into disk storage.]
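The split in the figure, written out for 4 KB pages; the example address and the physical page number are arbitrary illustrative values:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12u
#define PAGE_MASK ((1u << PAGE_BITS) - 1)

int main(void) {
    uint32_t vaddr  = 0x7fff0abc;
    uint32_t vpn    = vaddr >> PAGE_BITS;   /* indexes the page table   */
    uint32_t offset = vaddr & PAGE_MASK;    /* passes through unchanged */

    uint32_t ppn   = 0x2c0;                 /* hypothetical translation */
    uint32_t paddr = (ppn << PAGE_BITS) | offset;
    printf("vpn=0x%05x offset=0x%03x paddr=0x%08x\n", vpn, offset, paddr);
    return 0;
}
```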

Page Tables
[Figure: the page table register points to the page table; the 20-bit virtual page number (bits 31:12) indexes the table, and a valid entry supplies the 18-bit physical page number that is concatenated with the page offset (bits 11:0) to form the physical address. If the valid bit is 0, the page is not present in memory.]

Making Address Translation Fast
A cache for address translations: the translation-lookaside buffer (TLB).
[Figure: each valid TLB entry holds a virtual page number tag and the corresponding physical page address; on a TLB miss, the page table supplies either a physical page number or a disk address.]
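A minimal fully associative TLB lookup; the entry layout and size are our own sketch of the figure (a real TLB also keeps dirty and reference bits):

```c
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16

typedef struct {
    uint32_t vpn;   /* tag: virtual page number */
    uint32_t ppn;   /* physical page number     */
    int      valid;
} tlb_entry;

static int tlb_lookup(const tlb_entry *tlb, uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) { *ppn = tlb[i].ppn; return 1; }
    return 0;   /* miss: consult the page table and refill an entry */
}

int main(void) {
    tlb_entry tlb[TLB_ENTRIES] = { { 0x7fff0, 0x2c0, 1 } };
    uint32_t ppn;
    printf("hit=%d\n", tlb_lookup(tlb, 0x7fff0, &ppn)); /* hit=1 */
    return 0;
}
```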

TLBs and Caches
[Flowchart: processing a memory access through a TLB and a physically addressed cache.]
1. Look up the virtual address in the TLB.
2. On a TLB miss, raise a TLB miss exception; on a TLB hit, form the physical address.
3. For a read: try to read the data from the cache; on a cache miss, stall; on a hit, deliver the data to the CPU.
4. For a write: if the write access bit is off, raise a write protection exception; otherwise write the data into the cache, update the tag, and put the data and the address into the write buffer.
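The same flow as straight-line C. Every helper here is a hypothetical stand-in (stubbed so the sketch compiles), not real hardware or a real API:

```c
#include <stdint.h>
#include <stdio.h>

static int      tlb_hit(uint32_t vaddr)         { (void)vaddr; return 1; }
static uint32_t translate(uint32_t vaddr)       { return vaddr; }
static int      cache_hit(uint32_t paddr)       { (void)paddr; return 1; }
static int      write_access_ok(uint32_t vaddr) { (void)vaddr; return 1; }

static void access_memory(uint32_t vaddr, int is_write) {
    if (!tlb_hit(vaddr)) {
        puts("TLB miss exception");   /* refill the TLB from the page table */
        return;
    }
    uint32_t paddr = translate(vaddr);
    if (!is_write) {
        if (cache_hit(paddr)) puts("deliver data to the CPU");
        else                  puts("cache miss stall");
    } else if (!write_access_ok(vaddr)) {
        puts("write protection exception");
    } else {
        /* write into cache, update tag, put address+data in write buffer */
        puts("write handled");
    }
}

int main(void) { access_memory(0x1000, 0); return 0; }
```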