Memory Hierarchy

Goal: fast, unlimited storage at a reasonable cost per bit. Recall the von Neumann bottleneck: a single, relatively slow path between the CPU and main memory.

[Figure 4.3: Cache and Main Memory - typical system view of the memory hierarchy]

Table 4.3 - Cache Sizes of Some Processors

Processor        Type                            Year  L1 cache       L2 cache        L3 cache
IBM 360/85       Mainframe                       1968  16 to 32 KB    -               -
PDP-11/70        Minicomputer                    1975  1 KB           -               -
VAX 11/780       Minicomputer                    1978  16 KB          -               -
IBM 3033         Mainframe                       1978  64 KB          -               -
IBM 3090         Mainframe                       1985  128 to 256 KB  -               -
Intel 80486      PC                              1989  8 KB           -               -
Pentium          PC                              1993  8 KB/8 KB      256 to 512 KB   -
PowerPC 601      PC                              1993  32 KB          -               -
PowerPC 620      PC                              1996  32 KB/32 KB    -               -
PowerPC G4       PC/server                       1999  32 KB/32 KB    256 KB to 1 MB  2 MB
IBM S/390 G4     Mainframe                       1997  32 KB          256 KB          2 MB
IBM S/390 G6     Mainframe                       1999  256 KB         8 MB            -
Pentium 4        PC/server                       2000  8 KB/8 KB      256 KB          -
IBM SP           High-end server/supercomputer   2000  64 KB/32 KB    8 MB            -
CRAY MTA         Supercomputer                   2000  8 KB           2 MB            -
Itanium          PC/server                       2001  16 KB/16 KB    96 KB           4 MB
SGI Origin 2001  High-end server                 2001  32 KB/32 KB    4 MB            -
Itanium 2        PC/server                       2002  32 KB          256 KB          6 MB
IBM POWER5       High-end server                 2003  64 KB          1.9 MB          36 MB
CRAY XD-1        Supercomputer                   2004  64 KB/64 KB    1 MB            -

Main idea of a cache: keep a copy of frequently used information as close (with respect to access time) to the processor as possible.

[Diagram: CPU - cache - system bus - main memory]

Steps when the CPU generates a memory request:
1) Check the (faster) cache first.
2) If the addressed memory value is in the cache (called a hit), there is no need to access main memory.
3) If the addressed memory value is NOT in the cache (called a miss), transfer the block of memory containing the reference into the cache. (The CPU is stalled while this occurs.)
4) The cache supplies the memory value (from the newly loaded block) to the CPU.

Effective Memory Access Time

Suppose that the hit time is 5 ns, the miss penalty is 160 ns, and the hit rate is 99%.

Effective Access Time = (hit time * hit probability) + (miss penalty * miss probability)
                      = 5 * 0.99 + 160 * (1 - 0.99) = 4.95 + 1.6 = 6.55 ns
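This calculation is easily scripted. Below is a minimal sketch in C (not from the notes) that reproduces the example's figures:

    #include <stdio.h>

    /* Effective (average) access time:
     * EAT = hit_time * hit_rate + miss_penalty * (1 - hit_rate) */
    static double effective_access_time(double hit_time_ns,
                                        double miss_penalty_ns,
                                        double hit_rate)
    {
        return hit_time_ns * hit_rate + miss_penalty_ns * (1.0 - hit_rate);
    }

    int main(void)
    {
        /* Values from the example above: 5 ns hit time, 160 ns miss
         * penalty, 99% hit rate. */
        printf("EAT = %.2f ns\n", effective_access_time(5.0, 160.0, 0.99));
        return 0;   /* prints: EAT = 6.55 ns */
    }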

One way to reduce the miss penalty is not to make the CPU wait for the whole block to be read from memory before supplying the accessed memory word.

[Figure 4.5: Cache Read Operation]

Fortunately, programs exhibit locality of reference, which helps achieve high hit rates:
1) spatial locality - if a (logical) memory address is referenced, nearby memory addresses will tend to be referenced soon.
2) temporal locality - if a memory address is referenced, it will tend to be referenced again soon.

[Diagram: typical flow of control in a program - a loop ("for i := ... end for") in the "text" segment spanning blocks 5-8 forms one locality of reference; the data references within the loop (blocks 100-103 in the "data" segment, the run-time stack, and global data) form another. Each locality falls within a few block boundaries.]
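As a concrete (illustrative) example, a simple array sum in C exhibits both kinds of locality:

    #include <stddef.h>

    /* Summing an array exhibits both kinds of locality:
     * - spatial: a[i] and a[i+1] are adjacent in memory, so the miss that
     *   loads a[i]'s block also brings in the next few elements;
     * - temporal: sum and i are re-referenced on every iteration, so they
     *   stay in the cache (or in registers) for the whole loop. */
    long sum_array(const int *a, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }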

Cache - small, fast memory between the CPU and RAM/main memory.

Example: 32-bit address, byte-addressable memory, 512 KB (2^19 byte) cache, 8 (2^3) bytes per block/line.

Number of cache lines = (size of cache) / (size of line) = 2^19 / 2^3 = 2^16

Three types of cache:

1) Direct-mapped - a memory block maps to a single cache line (block b maps to line b mod 2^16).

32-bit address: | 13-bit tag | 16-bit line # | 3-bit offset |
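A short sketch (illustrative, not from the notes) of how hardware splits a 32-bit address under this direct-mapped organization:

    #include <stdint.h>
    #include <stdio.h>

    /* Field widths from the example: 8-byte lines -> 3 offset bits,
     * 2^16 lines -> 16 line-number bits, 32 - 16 - 3 = 13 tag bits. */
    #define OFFSET_BITS 3
    #define LINE_BITS   16

    int main(void)
    {
        uint32_t addr   = 0x12345678u;  /* an arbitrary example address */
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
        uint32_t line   = (addr >> OFFSET_BITS) & ((1u << LINE_BITS) - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + LINE_BITS);
        printf("tag = %u, line = %u, offset = %u\n", tag, line, offset);
        return 0;
    }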

2) Fully-associative cache - a memory block can map to any cache line.

32-bit address: | 29-bit tag | 3-bit offset |

Advantage: flexibility in what is kept in the cache.
Disadvantage: complex circuitry to compare the tags of all cache lines with the tag in the target address in parallel. Fully-associative caches are therefore expensive and slower, so they are used only for small caches (say, 8-64 lines).

Replacement algorithms - on a miss in a full cache, we must select a block in the cache to replace:
LRU - replace the block that has not been used for the longest time (needs additional bits per line).
Random - select a block randomly (only slightly worse than LRU).
FIFO - select the block that has been in the cache the longest (slightly worse than LRU).
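A minimal sketch of LRU victim selection for such a small fully-associative cache, assuming each line keeps a timestamp of its last use (real hardware approximates this with a few status bits; the names here are illustrative):

    #include <stdint.h>

    #define NUM_LINES 8   /* small fully-associative cache, as noted above */

    struct line { int valid; uint32_t tag; uint64_t last_used; };

    /* On a miss with a full cache, LRU replaces the valid line whose
     * last_used timestamp is oldest; an invalid (empty) line, if any,
     * is used first. last_used is assumed to be updated on every hit. */
    static int lru_victim(const struct line cache[NUM_LINES])
    {
        int victim = 0;
        for (int i = 0; i < NUM_LINES; i++) {
            if (!cache[i].valid)
                return i;               /* free line: no eviction needed */
            if (cache[i].last_used < cache[victim].last_used)
                victim = i;
        }
        return victim;
    }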

3) Set-associative cache - a memory block can map to any line in a small set (2, 4, or 8 lines).

Common possibilities:
2-way set associative - each memory block can map to either of two lines in the cache.
4-way set associative - each memory block can map to any of four lines in the cache.

For the running example with 4 ways: number of sets = (number of lines) / (lines per set) = 2^16 / 4 = 2^14

[Diagram: 4-way set-associative cache - ways 0-3, sets 0 through 2^14 - 1, each set holding four tagged lines]

32-bit address: | 15-bit tag | 14-bit set # | 3-bit offset |
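The address split works like the direct-mapped case, except the middle field now selects a set rather than a single line; only the four tags in that set must be compared. A sketch under the example's field widths (illustrative, not from the notes):

    #include <stdint.h>

    /* Field widths from the 4-way example: 3 offset bits, 14 set-index
     * bits, and 32 - 14 - 3 = 15 tag bits. */
    #define OFFSET_BITS 3
    #define SET_BITS    14

    static uint32_t set_index(uint32_t addr)
    {
        return (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1);
    }

    static uint32_t addr_tag(uint32_t addr)
    {
        /* Compared against the 4 stored tags of the selected set. */
        return addr >> (OFFSET_BITS + SET_BITS);
    }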

[Figure 4.16: Varying associativity over cache size - hit ratio (0.0 to 1.0) vs. cache size from 1k to 1M bytes, for direct-mapped and 2-, 4-, 8-, and 16-way set-associative caches]

Block/Line Size

The larger the line size:
- fewer lines for the same cache size
- a better hit rate, since larger blocks are read in when a miss occurs (exploiting spatial locality)
- a larger miss penalty, since more words are read from memory when a miss occurs

Number of Caches

Issues: number of levels (e.g., CPU -> L1 64 KB -> L2 512 KB -> memory), and unified vs. split caches.

split caches - separate, smaller caches for data and instructions
unified cache - data and instructions share the same cache

Advantages of each:
split caches - reduce contention between instruction and data accesses
unified caches - balance the load between instructions and data automatically (e.g., a tight loop might need more data blocks than instruction blocks)
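With two cache levels, the effective-access-time formula from earlier nests: an L1 miss pays the L2 access time, and only an L2 miss goes all the way to memory. A small worked sketch with assumed latencies and miss rates (illustrative values, not from the notes):

    #include <stdio.h>

    /* Two-level effective access time:
     * EAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem),
     * where m_Lx is the miss rate of level x. */
    int main(void)
    {
        double t_l1 = 1.0, t_l2 = 10.0, t_mem = 100.0;  /* ns (assumed) */
        double m_l1 = 0.05, m_l2 = 0.20;                /* assumed rates */
        printf("EAT = %.2f ns\n", t_l1 + m_l1 * (t_l2 + m_l2 * t_mem));
        return 0;   /* 1 + 0.05 * (10 + 0.20 * 100) = 2.50 ns */
    }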

Types of Cache Misses:

Compulsory misses - misses due to the first access to a block.
Capacity misses - misses on blocks that were in the cache but were forced out due to the cache's limited capacity.
Conflict/collision misses - misses due to conflicts caused by the direct or set-associative mapping; they would be eliminated by a fully-associative mapping.

[Chart: studying the performance of a cache vs. the characteristics of the cache]
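To see why conflict misses are distinct from capacity misses, note that in the direct-mapped example two addresses exactly one cache size (2^19 bytes) apart share a line number, so alternating between them evicts each other even when the rest of the cache is empty. A small illustration (hypothetical code, not from the notes):

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 3
    #define LINE_BITS   16

    static uint32_t line_of(uint32_t addr)
    {
        return (addr >> OFFSET_BITS) & ((1u << LINE_BITS) - 1);
    }

    int main(void)
    {
        /* Two addresses one cache size (2^19 bytes) apart map to the same
         * line, so alternating accesses to a and b conflict-miss every
         * time in a direct-mapped cache, regardless of cache occupancy. */
        uint32_t a = 0x00010000u;
        uint32_t b = a + (1u << 19);
        printf("line(a) = %u, line(b) = %u\n", line_of(a), line_of(b));
        return 0;   /* both print 8192 */
    }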

Write Policy - do we keep the cache and memory copies of a block identical?

[Diagram: two CPUs, each with a cache holding X = 5; memory also holds X = 5; an I/O device can read and write memory]

Just reading a shared variable causes no problems - all caches have the same value. Writing can cause a cache-coherency problem:

CPU 0 executes X := X + 2 and CPU 1 executes X := X + 1. CPU 0's cache now holds X = 7, CPU 1's cache holds X = 6, and memory still holds X = 5.

Write Policies

write-back - the CPU changes only the local cache copy; when that block is replaced, it is written back to memory (an UPDATE/DIRTY bit on each cache line indicates whether it has been modified). If we assume that CPU 0 writes its block back to memory before CPU 1 does, then X's resulting value will be 6, discarding the effect of X := X + 2.
Disadvantage(s) of write-back? Advantage(s) of write-back?

write-through - on a write to a cached block, also write to the main-memory copy to keep it up to date. To avoid stalling the CPU on a write, a write buffer can be used to let the CPU continue execution without waiting for the write to complete.

Write-miss options:
write allocate - the block being written is read into the cache before it is updated.
no-write allocate - no block is allocated in the cache; only the lower-level memory is modified.
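A minimal sketch of the dirty-bit mechanism behind write-back, assuming a line structure like the one below (field and function names are illustrative):

    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 8   /* bytes per line, as in the running example */

    struct line {
        int      valid, dirty;
        uint32_t tag;
        uint8_t  data[LINE_SIZE];
    };

    /* Write-back: a write updates only the cached copy and sets the
     * dirty bit; the memory copy is now stale. */
    static void write_byte(struct line *l, uint32_t offset, uint8_t value)
    {
        l->data[offset] = value;
        l->dirty = 1;
    }

    /* On replacement, a dirty line must be written back to memory
     * before its frame is reused; a clean line can simply be dropped. */
    static void evict(struct line *l, uint8_t *memory_block)
    {
        if (l->valid && l->dirty)
            memcpy(memory_block, l->data, LINE_SIZE);
        l->valid = l->dirty = 0;
    }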

Cache Coherency Solutions

a) Bus watching with write-through / snoopy caches - caches eavesdrop on the bus for other caches' write requests. If a cache contains a block written by another cache, it takes some action, such as invalidating its copy. The MESI protocol is a common cache-coherency protocol.

[Diagram: two CPUs with caches each holding X = 5, memory holding X = 5, and an I/O device that can read and write memory]

b) Noncachable memory - part of memory is designated as noncachable, so the memory copy is the only copy. Accesses to this part of memory always generate a miss.

[Diagram: two CPUs with caches, a noncachable region of memory holding X = 5, and an I/O device that can read and write memory]

c) Hardware transparency - additional hardware is added to update all caches and memory at once on a write.

[Figure 4.18: Pentium 4 block diagram]