Advanced Computer Architecture


ECE 563 Advanced Computer Architecture, Fall 2009. Lecture 3: Memory Hierarchy Review: Caches. 563 L03.1 Fall 2010

Since 1980, the CPU has outpaced DRAM: a four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions in the time of one memory access. CPU performance (1/latency) has improved about 60% per year (2x in 1.5 years), while DRAM latency has improved only about 9% per year (2x in 10 years), so the processor-memory gap has grown about 50% per year.

Addressing the Processor-Memory Performance Gap. Goal: create the illusion of a large, fast, cheap memory. Let programs address a memory space that scales to the size of the disk, at a speed that is usually as fast as register access. Solution: put smaller, faster cache memories between the CPU and DRAM, creating a memory hierarchy.

Levels of the Memory Hierarchy (capacity / access time / cost, plus the staging transfer unit and who manages it):
- CPU registers: 100s of bytes, <10s of ns. Transfer unit: instruction operands, 1-8 bytes, managed by the program/compiler.
- Cache: KBytes, 10-100 ns, 1-0.1 cents/bit. Transfer unit: blocks, 8-128 bytes, managed by the cache controller.
- Main memory: MBytes, 200-500 ns, .0001-.00001 cents/bit. Transfer unit: pages, 512 bytes-4 KB, managed by the OS.
- Disk: GBytes, 10 ms (10,000,000 ns), 10^-5 to 10^-6 cents/bit. Transfer unit: files, MBytes, managed by the user/operator.
- Tape: effectively infinite capacity, seconds to minutes, 10^-8 cents/bit.
Upper levels are faster; lower levels are larger. Today's focus: the cache level.

Common Predictable Patterns. Two predictable properties of memory references: Temporal locality: if a location is referenced, it is likely to be referenced again in the near future (e.g., loops, data reuse). Spatial locality: if a location is referenced, it is likely that locations near it will be referenced in the near future (e.g., straight-line code, array accesses).
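The two properties can be seen in ordinary code. A minimal sketch (illustrative, not from the slides): both functions below compute the same sum, but the first walks a row-major matrix in memory order (good spatial locality), while the second strides across rows (poor spatial locality); in both, the accumulator and loop variables exhibit temporal locality.

```python
def sum_row_major(matrix):
    """Visits elements in memory order -> good spatial locality."""
    total = 0                       # `total` is reused every iteration: temporal locality
    for row in matrix:
        for value in row:           # adjacent addresses touched in sequence
            total += value
    return total

def sum_column_major(matrix):
    """Strides across rows -> poor spatial locality for row-major storage."""
    total = 0
    for col in range(len(matrix[0])):
        for row in range(len(matrix)):
            total += matrix[row][col]   # each access jumps a whole row ahead
    return total

grid = [[1, 2], [3, 4]]
assert sum_row_major(grid) == sum_column_major(grid) == 10
```

In Python the effect is masked by interpreter overhead, but in C or Fortran the traversal order alone can change runtime by an order of magnitude on large arrays.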

Memory Reference Patterns: a plot of memory address versus time (one dot per access) shows horizontal bands of repeated addresses (temporal locality), diagonal runs through nearby addresses (spatial locality), and scattered regions with bad locality behavior. Source: Donald J. Hatfield, Jeanette Gerald, "Program Restructuring for Virtual Memory," IBM Systems Journal 10(3): 168-192 (1971).

Caches. Caches exploit both types of predictability: they exploit temporal locality by remembering the contents of recently accessed locations, and spatial locality by fetching blocks of data around recently accessed locations.

Cache Algorithm (Read). Look at the processor address and search the cache tags for a match. Then either:
- HIT (found in cache): return a copy of the data from the cache.
- MISS (not in cache): read the block of data from main memory, wait, then return the data to the processor and update the cache.
Definitions:
- Hit rate: fraction of accesses found in the cache. Miss rate = 1 - hit rate.
- Hit time: RAM access time plus the time to determine hit/miss.
- Miss time: time to replace a block in the cache plus the time to deliver the block to the processor.
What is the average cache access time?
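The question above has a standard answer: average memory access time (AMAT) = hit time + miss rate x miss penalty. A minimal sketch with hypothetical numbers (the 1 ns / 5% / 100 ns figures are examples, not from the slides):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical cache: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(1.0, 0.05, 100.0))  # -> 6.0 ns per access on average
```

Note how the miss penalty dominates: even a 5% miss rate makes the average access six times slower than a hit, which is why the later slides focus on reducing miss rate and miss penalty.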

Inside a Cache. The cache sits between the processor and main memory, exchanging addresses and data with both. Each cache line pairs a line-address tag with a data block of several data bytes; the block holds copies of the corresponding main-memory locations (e.g., a line holding copies of memory locations 100 and 101 under a single tag).

4 Questions for the Memory Hierarchy:
- Q1: Where can a block be placed in the cache? (block placement)
- Q2: How is a block found if it is in the cache? (block identification)
- Q3: Which block should be replaced on a miss? (block replacement)
- Q4: What happens on a write? (write strategy)

Q1: Where can a block be placed? Consider memory block 12 and a cache with 8 blocks:
- Fully associative: block 12 can be placed anywhere.
- 2-way set associative (4 sets): block 12 can go anywhere in set 0 (12 mod 4).
- Direct mapped: block 12 can go only into cache block 4 (12 mod 8).
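The three placement rules are one formula with a different number of sets. A minimal sketch (the function name and the contiguous set-to-block layout are assumptions for illustration):

```python
def placement(block_number, num_blocks, num_sets):
    """Return the list of cache block slots where a memory block may live."""
    set_index = block_number % num_sets      # which set the block maps to
    ways = num_blocks // num_sets            # blocks (ways) per set
    # Assume set i occupies the contiguous slots [i*ways, (i+1)*ways).
    return [set_index * ways + way for way in range(ways)]

# Memory block 12 in an 8-block cache, as on the slide:
print(placement(12, 8, 1))   # fully associative (1 set): [0..7], anywhere
print(placement(12, 8, 4))   # 2-way set associative: set 0 -> slots [0, 1]
print(placement(12, 8, 8))   # direct mapped: only slot [4] (12 mod 8)
```

Direct mapped is just 1-way set associative (num_sets == num_blocks), and fully associative is one big set, so the single modulus covers all three cases.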

Q2: How is a block found? The memory address is split into a block address (tag and index) and a block offset:
- The index selects which set to look in.
- The tag on each block in the set is compared; there is no need to check the index or block offset bits.
- Increasing associativity shrinks the index and expands the tag. Fully associative caches have no index field.
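The field split is pure bit arithmetic. A minimal sketch (the function name and the 16-byte-block, 64-set geometry are hypothetical examples, not from the slides):

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, block offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)            # low b bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # next k bits
    tag = addr >> (offset_bits + index_bits)            # remaining high bits
    return tag, index, offset

# Hypothetical cache: 16-byte blocks (b = 4), 64 sets (k = 6).
tag, index, offset = split_address(0x1234, 4, 6)
print(tag, index, offset)  # -> 4 35 4
```

Doubling the associativity at fixed capacity halves the number of sets, so `index_bits` shrinks by one and the tag grows by one bit, exactly as the slide states.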

Direct-Mapped Cache: the k-bit index selects one of 2^k lines; the stored t-bit tag of that line is compared against the address tag, the valid bit is checked, and a match produces the HIT signal. The b-bit block offset then selects the data word or byte from the data block.

2-Way Set-Associative Cache: the index selects one set containing two (valid bit, tag, data block) entries; both stored tags are compared against the address tag in parallel, a match in either way produces the HIT signal, and the matching way's data block supplies the data word or byte.

Fully Associative Cache: there is no index field; the address tag is compared against every stored tag in parallel, any match produces the HIT signal, and the block offset selects the data word or byte from the matching block.

What causes a MISS? The three major categories of cache misses:
- Compulsory misses: the first access to a block.
- Capacity misses: the cache cannot contain all the blocks needed to execute the program.
- Conflict misses: a block is replaced by another block and then retrieved again later (affects set-associative and direct-mapped caches). The nightmare scenario is the ping-pong effect, where two blocks that map to the same location repeatedly evict each other.

Block Size and Spatial Locality. The block is the unit of transfer between the cache and memory. The CPU address is split into a block address (32-b bits) and an offset (b bits), where 2^b is the block size, a.k.a. line size, in bytes; e.g., a 4-word block (Word0-Word3) with word-sized offsets has b=2. Larger block sizes have distinct hardware advantages: less tag overhead, and they exploit fast burst transfers from DRAM and over wide busses. What are the disadvantages of increasing block size? Fewer blocks in the cache means more conflicts, and fetching unused words can waste bandwidth.

Q3: Which block should be replaced on a miss?
- Easy for direct mapped: there is only one candidate.
- Set associative or fully associative, common policies:
- Random.
- Least Recently Used (LRU): the LRU state must be updated on every access; a true implementation is only feasible for small sets (2-way), so a pseudo-LRU binary tree is often used for 4- to 8-way caches.
- First In, First Out (FIFO), a.k.a. round-robin: used in highly associative caches.
Replacement policy has only a second-order effect, since replacement happens only on misses.
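The pseudo-LRU binary tree mentioned above can be sketched for a 4-way set: three tree bits (a root bit plus one bit per pair of ways) approximate the full LRU order, instead of tracking all 4! orderings. This is a minimal sketch under the common convention that a bit of 0 points the victim search left; the class name is made up for illustration.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set: 3 bits approximate full LRU."""
    def __init__(self):
        # bits[0] = root, bits[1] = pair {way0, way1}, bits[2] = pair {way2, way3};
        # bit value 0 means "next victim lies in the left subtree / lower way".
        self.bits = [0, 0, 0]

    def access(self, way):
        """On a hit or fill, flip the bits on the path to point AWAY from `way`."""
        if way < 2:
            self.bits[0] = 1          # next victim is in the right half
            self.bits[1] = 1 - way    # and the other way of the left pair
        else:
            self.bits[0] = 0          # next victim is in the left half
            self.bits[2] = 1 - (way - 2)

    def victim(self):
        """Follow the tree bits down to the approximate LRU way."""
        if self.bits[0] == 0:
            return self.bits[1]       # way 0 or 1
        return 2 + self.bits[2]       # way 2 or 3

plru = TreePLRU4()
for way in (0, 1, 2, 3):
    plru.access(way)
print(plru.victim())  # -> 0: way 0 is (approximately) the least recently used
```

The appeal is cost: true 4-way LRU needs enough state to order four ways, while tree PLRU needs only 3 bits per set and a couple of bit flips per access, at the price of occasionally evicting a non-LRU block.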

Q4: What happens on a write?
On a cache hit:
- Write-through: write both the cache and memory. Generally higher traffic, but simplifies cache coherence.
- Write-back: write the cache only; memory is written only when the entry is evicted. A dirty bit per block can further reduce the traffic.
On a cache miss:
- No-write-allocate: write only to main memory.
- Write-allocate (a.k.a. fetch-on-write): fetch the block into the cache.
Common combinations: write-through with no-write-allocate, and write-back with write-allocate.
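The traffic difference between the two hit policies is easy to quantify for a single cache line. A minimal sketch (assumed function name and simplified model, not from the slides): count main-memory writes generated by a run of store hits to one resident line.

```python
def memory_writes(stores_to_line, policy):
    """Main-memory writes for `stores_to_line` store hits to one cache line.

    write-through: every store also writes memory.
    write-back: memory is written once, at eviction, and only if the
    dirty bit was set by at least one store.
    """
    if policy == "write-through":
        return stores_to_line
    if policy == "write-back":
        dirty = stores_to_line > 0
        return 1 if dirty else 0   # one write-back when the line is evicted
    raise ValueError("unknown policy: " + policy)

print(memory_writes(10, "write-through"))  # -> 10 memory writes
print(memory_writes(10, "write-back"))     # -> 1 (single eviction write)
```

This is why write-back wins on traffic when stores exhibit temporal locality, and why the dirty bit matters: a clean line (zero stores) is evicted with no memory write at all.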

5 Basic Cache Optimizations.
Reducing miss rate:
1. Larger block size (reduces compulsory misses)
2. Larger cache size (reduces capacity misses)
3. Higher associativity (reduces conflict misses)
Reducing miss penalty:
4. Multilevel caches
5. Giving reads priority over writes (e.g., let a read complete before earlier writes still waiting in the write buffer)