EE 260: Introduction to Digital Design: Memory

Agenda: Naive Register File; Memory Arrays: SRAM; Memory Arrays: Register File.

EE 260: Introduction to Digital Design: Memory Technology
Yao Zheng
Department of Electrical Engineering, University of Hawaiʻi at Mānoa

Memory Technology

Naive Register File
[Diagram: register array with write and read ports, a clock (clk), and a decoder for register selection.]

Memory Arrays: Register File

Memory Arrays: SRAM

Memory Arrays: DRAM

Relative Sizes of SRAM vs. DRAM
[Figure: on-chip SRAM on a logic chip vs. DRAM on a memory chip. From Foss, R.C., "Implementing Application-Specific Memory," ISSCC 1996.]

Memory Technology Trade-offs
Latches/registers, register file, and SRAM: low capacity, low latency, high bandwidth (more and wider ports). DRAM: high capacity, high latency, low bandwidth.

CPU-Memory Bottleneck: the Processor-DRAM Latency Gap
The performance of high-speed computers is usually limited by memory bandwidth and latency. Latency is the time for a single access; memory latency is usually much greater than the processor cycle time. Bandwidth is the number of accesses per unit time; if a fraction m of instructions are loads/stores, there are 1 + m memory accesses per instruction, so CPI = 1 requires at least 1 + m memory accesses per cycle.
The bandwidth-delay product is the amount of data that can be in flight at the same time (Little's Law). A four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access! Long latencies mean large bandwidth-delay products, which can be difficult to saturate, meaning bandwidth is wasted. [Hennessy & Patterson, 5th ed. Image copyright 2011, Elsevier Inc. All rights reserved.]
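The 800-instruction figure follows directly from the slide's numbers. A quick sanity check in C (an illustrative sketch, not part of the lecture):

```c
#include <stdio.h>

/* Check the slide's claim: a four-issue, 2 GHz superscalar facing a
 * 100 ns DRAM access. All three inputs come from the slide. */
int main(void) {
    double issue_width = 4.0;     /* instructions per cycle */
    double clock_hz    = 2e9;     /* 2 GHz clock            */
    double dram_s      = 100e-9;  /* 100 ns access latency  */

    double cycles_per_access = dram_s * clock_hz;     /* 200 cycles */
    double instrs = issue_width * cycles_per_access;  /* 800 instructions */

    printf("cycles per DRAM access: %.0f\n", cycles_per_access);
    printf("instructions issuable in that time: %.0f\n", instrs);
    return 0;
}
```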

Physical Size Affects Latency
Small arrays are fast (RF, SRAM); big arrays are slow (DRAM). In a big array, signals have further to travel and must fan out to more locations.

Memory Hierarchy
A small, fast memory (RF, SRAM) sits in front of a big, slow memory (DRAM).
Capacity: register << SRAM << DRAM. Latency: register << SRAM << DRAM. Bandwidth: on-chip >> off-chip.
On a data access: if the data is in the fast memory, the result is a low-latency access to SRAM; if it is not, a long-latency access to DRAM.
Memory hierarchies only work if the small, fast memory actually stores data that is reused by the processor.

Common and Predictable Memory Reference Patterns
Instruction fetches, stack accesses, and data accesses all show structure over time: n loop iterations re-fetch the same instructions, a subroutine call and return bracket repeated argument accesses on the stack, and scalar data accesses recur.
Temporal locality: if a location is referenced, it is likely to be referenced again in the near future.
Spatial locality: if a location is referenced, it is likely that locations near it will be referenced in the near future.

Real Reference Patterns
[Figure: address vs. time, one dot per access to that address at that time; regions of spatial locality, temporal locality, and combined temporal & spatial locality are visible. From Donald J. Hatfield, Jeanette Gerald: "Program Restructuring for Virtual Memory," IBM Systems Journal 10(3): 168-192 (1971).]

Caches Exploit Both Types of Locality
A cache exploits temporal locality by remembering the contents of recently accessed locations, and spatial locality by fetching blocks of data around recently accessed locations.
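To make the two kinds of locality concrete, here is a minimal sketch (my own illustration, not from the slides) using a plain array-sum loop:

```c
#include <stdio.h>

/* Illustrative only: a simple loop exhibits both kinds of locality. */
int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    int sum = 0;
    for (int i = 0; i < 1024; i++) {
        /* Spatial locality: a[i] and a[i+1] are adjacent in memory,
         * so one fetched cache block serves several iterations. */
        sum += a[i];
        /* Temporal locality: sum and i are touched on every
         * iteration, so they stay at the top of the hierarchy. */
    }
    printf("sum = %d\n", sum);
    return 0;
}
```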

Inside a Cache
[Figure: the cache sits between the processor and main memory and holds tagged copies of memory locations, e.g. a line with tag 100 holding copies of main memory locations 100, 101, ...; other tags shown are 304, 6848, and 416. Data is stored in multi-byte units called lines or blocks.]

Basic Cache Algorithm for a Load
[Figure: flowchart of how the cache services a load.]

Classifying Caches
- Block placement: where can a block be placed in the cache?
- Block identification: how is a block found if it is in the cache?
- Block replacement: which block should be replaced on a miss?
- Write strategy: what happens on a write?

Block Placement: Where Can a Block Be Placed in the Cache?
For memory block number 12 (of blocks 0 through 31) and an 8-block cache:
- Fully associative: anywhere in the cache.
- 2-way set-associative (4 sets): anywhere in set 0 (12 mod 4).
- Direct-mapped: only into block 4 (12 mod 8).

Block Identification: How to Find a Block in the Cache?
The cache uses the index and offset to find the potential match, then checks the tag; the tag check only includes the higher-order address bits. Example: a direct-mapped cache with 8 B blocks and 4 lines. (A sketch of this address split follows below.)
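Here is a minimal sketch of that address split, using the slide's parameters (direct-mapped, 8 B blocks, 4 lines); the arithmetic helper itself is my own illustration:

```c
#include <stdio.h>

/* Address split for the slide's example cache:
 * direct-mapped, 8-byte blocks, 4 lines. */
#define BLOCK_BYTES 8  /* -> 3 offset bits */
#define NUM_LINES   4  /* -> 2 index bits  */

int main(void) {
    unsigned addr   = 100;                 /* example address        */
    unsigned offset = addr % BLOCK_BYTES;  /* byte within the block  */
    unsigned block  = addr / BLOCK_BYTES;  /* memory block number 12 */
    unsigned index  = block % NUM_LINES;   /* which cache line       */
    unsigned tag    = block / NUM_LINES;   /* higher-order bits      */

    printf("addr %u -> tag %u, index %u, offset %u\n",
           addr, tag, index, offset);

    /* The slide's placement answers for memory block 12: set 0 in a
     * 4-set (2-way) cache, block 4 in an 8-block direct-mapped cache. */
    printf("block 12: 12 mod 4 = %u, 12 mod 8 = %u\n", 12u % 4, 12u % 8);
    return 0;
}
```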

Block Identification in an Associative Cache
The cache checks all potential blocks in the set with a parallel tag check. Example: a 2-way set-associative cache with 8 B blocks and 4 lines.

Block Replacement: Which Block to Replace?
There is no choice in a direct-mapped cache. In an associative cache, which block from the set should be evicted when the set becomes full?
- Random.
- Least Recently Used (LRU): the LRU state must be updated on every access. A true implementation is only feasible for small sets (2-way); a pseudo-LRU binary tree is often used for 4- to 8-way caches. (A minimal 2-way sketch appears after this section.)
- First In, First Out (FIFO), a.k.a. round-robin: used in highly associative caches.
- Not Most Recently Used (NMRU): FIFO with an exception for the most recently used block(s).

Write Strategy: How Are Writes Handled?
- Cache hit:
  - Write-through: write both the cache and memory; generally higher traffic, but simpler to design.
  - Write-back: write the cache only; memory is written when the block is evicted. A dirty bit per block avoids unnecessary write-backs, at the cost of a more complicated design.
- Cache miss:
  - No-write-allocate: write only to main memory.
  - Write-allocate: fetch the block into the cache, then write.
- Common combinations: write-through with no-write-allocate, and write-back with write-allocate.

Average Access Time
Average Access Time = Hit Time + (Miss Rate × Miss Penalty)
(A worked example appears after this section.)

Categorizing Misses: The Three C's
- Compulsory: the first reference to a block; these occur even with an infinite cache.
- Capacity: the cache is too small to hold all the data needed by the program; these occur even under a perfect replacement policy (loop over 5 cache lines).
- Conflict: misses that occur because of collisions due to less-than-full associativity (loop over 3 cache lines).
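The slide's point that true LRU is only feasible for small sets is easiest to see for 2 ways, where one bit per set suffices. A minimal sketch (the set count and helper names are my own, illustrative choices):

```c
#include <stdio.h>

/* True LRU for a 2-way set-associative cache: one bit per set
 * records which way was NOT used most recently. */
#define NUM_SETS 4

static unsigned lru_way[NUM_SETS];  /* way to evict next, per set */

/* Call on every access (hit or fill) to 'way' of 'set'. */
void touch(unsigned set, unsigned way) {
    lru_way[set] = 1u - way;  /* the other way becomes LRU */
}

unsigned victim(unsigned set) {
    return lru_way[set];
}

int main(void) {
    touch(0, 0);  /* access way 0 of set 0 */
    touch(0, 1);  /* then way 1            */
    printf("evict way %u of set 0\n", victim(0));  /* way 0 is LRU */
    return 0;
}
```

For 4 or more ways, tracking a full recency order per set grows quickly, which is why a pseudo-LRU binary tree is the common compromise.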
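Plugging illustrative numbers into the average-access-time formula (the hit time, miss rate, and miss penalty below are assumed values, not from the lecture):

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty (formula from the slide;
 * the three inputs below are assumed, illustrative values). */
int main(void) {
    double hit_time     = 1.0;    /* cycles, assumed L1 hit time  */
    double miss_rate    = 0.05;   /* assumed 5% miss rate         */
    double miss_penalty = 100.0;  /* cycles, assumed DRAM penalty */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* 1 + 0.05 * 100 = 6.0 */
    return 0;
}
```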

Reduce Hit Time: Small & Simple Caches

Reduce Miss Rate: Large Block Size
- Less tag overhead.
- Exploits fast burst transfers from DRAM.
- Exploits fast burst transfers over wide on-chip busses.
- Can waste bandwidth if the data is not used.
- Fewer blocks -> more conflicts.
[Plot from Hennessy and Patterson, 4th ed. Image copyright 2007-2012, Elsevier Inc. All rights reserved.]

Reduce Miss Rate: Large Cache Size
Empirical rule of thumb: if the cache size is doubled, the miss rate usually drops by about √2.

Reduce Miss Rate: High Associativity
Empirical rule of thumb: a direct-mapped cache of size N has about the same miss rate as a two-way set-associative cache of size N/2.
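Applying the doubling rule of thumb numerically (the starting miss rate and sizes below are my own assumed values, purely illustrative):

```c
#include <math.h>
#include <stdio.h>

/* Doubling cache size divides miss rate by roughly sqrt(2)
 * (rule of thumb from the slide; starting values are assumed). */
int main(void) {
    double miss_rate = 0.10;  /* assumed miss rate at 8 KB */
    double size_kb   = 8.0;

    for (int i = 0; i < 4; i++) {
        printf("%4.0f KB -> estimated miss rate %.3f\n", size_kb, miss_rate);
        size_kb   *= 2.0;
        miss_rate /= sqrt(2.0);
    }
    return 0;
}
```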