The memory gap. 1980: no cache in µproc; 1995: 2-level cache on Alpha 21164 µproc


The memory gap. 1980: no cache in µproc; 1995: 2-level cache on the Alpha 21164 µproc.

Memory Technology Review
DRAM: value is stored as a charge on a capacitor (must be refreshed; addressed via RAS/CAS). Very small, but slower than SRAM by a factor of 5 to 10. Cell: one pass transistor and a capacitor, selected by a word line and accessed through a bit line.
SRAM: value is stored on a pair of inverting gates (e.g. a D-latch). Very fast, but takes up more space than DRAM (4 to 6 transistors per cell).

General Principles
Locality:
- Temporal locality: an item referenced now is likely to be referenced again soon.
- Spatial locality: items near a referenced item are likely to be referenced soon.
Locality + "smaller memory is faster" = memory hierarchy.
Levels: each level is smaller, faster, and more expensive per byte than the level below it.
Inclusive: data found in the top level is also found in the bottom.
Definitions:
- Upper level: the level closer to the processor.
- Block: the minimum unit of data that is either present or not present in the upper level.
- Address = block frame address + block offset address.
- Hit time: time to access the upper level, including the time to determine hit or miss.
Why does code have locality?

Locality
Temporal locality: a recently used item is likely to be re-used in the near future.
Spatial locality: addresses close together physically tend to be referenced close together in time.
90/10 Locality Rule: a program executes 90% of its instructions in 10% of its code.
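Both kinds of locality show up in ordinary array code. A minimal sketch in Python (function and variable names are mine, not from the slides):

```python
# Summing a matrix row by row exhibits both kinds of locality.
def row_major_sum(matrix):
    total = 0
    for row in matrix:       # spatial locality: a row's elements sit next to
        for x in row:        # each other in a typical array layout
            total += x       # temporal locality: `total` is re-used on
    return total             # every iteration

matrix = [[1, 2, 3], [4, 5, 6]]
print(row_major_sum(matrix))  # 21
```

Traversing the same data column by column would touch memory with a large stride and lose most of the spatial locality.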

The memory hierarchy
Users want memories that are large, fast, and cheap: a conflict.
SRAM: trades capacity for speed; faster, lower power.
DRAM: trades speed for capacity; requires refreshing, less expensive.

The memory hierarchy
[Figure: the levels of the memory hierarchy, from the CPU through Level 1, Level 2, ... Level n; access time increases with distance from the CPU, and the size of the memory grows at each level.]

Who's in control?
Level      Managed by
Registers  compiler
Cache      hardware
Memory     OS
Storage    OS/user
(Cost per byte falls and size grows moving down the hierarchy.)

Q1: Where can a block be placed in the upper level?

Q2: How Is a Block Found If It Is in the Upper Level?
A tag on each block: there is no need to check the index or block offset bits.
Increasing associativity shrinks the index and expands the tag.
Given a block address, the block can only be found in the set specified by its index. All tags in that set must be compared against the block address (in parallel) to find a hit.
Block Frame Address = Tag + Index; the full address adds a Block Offset.
Fully associative (FA): no index; the block address is compared to the block frame address (tag) of every frame in the cache in parallel.
Direct mapped (DM): large index; the index selects a single frame, and one tag comparison determines the hit.
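The tag/index/offset split can be sketched directly. This is a hypothetical helper (the geometry numbers in the example call are invented), assuming power-of-two block size and set count:

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, index, offset) for a set-indexed cache.
    block_size and num_sets must be powers of two."""
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Direct mapped: many sets, so a large index.  Fully associative: num_sets == 1,
# so the index field disappears and the whole block address becomes the tag.
print(split_address(0x12345678, block_size=64, num_sets=256))
```

Note how passing a larger `num_sets` (lower associativity for the same total size) steals bits from the tag and moves them into the index, exactly as the slide states.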


Cache Performance
CPU execution time = (CPU clock cycles + memory stall cycles) × clock cycle time
Memory stall cycles = number of misses × miss penalty
  = IC × (misses / instruction) × miss penalty
  = IC × (memory accesses / instruction) × miss rate × miss penalty
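The equations above compose as follows; the machine parameters in the example call are invented for illustration:

```python
def cpu_time(ic, cpi_base, accesses_per_instr, miss_rate, miss_penalty, cycle_time):
    """CPU execution time = (CPU clock cycles + memory stall cycles) * cycle time."""
    cpu_cycles = ic * cpi_base
    # Memory stall cycles = IC * (accesses/instr) * miss rate * miss penalty
    stall_cycles = ic * accesses_per_instr * miss_rate * miss_penalty
    return (cpu_cycles + stall_cycles) * cycle_time

# Hypothetical machine: 10^6 instructions, base CPI of 1, 1.36 memory accesses
# per instruction, 2% miss rate, 100-cycle miss penalty, 1 ns clock.
t = cpu_time(1e6, 1.0, 1.36, 0.02, 100, 1e-9)
print(t)  # execution time in seconds
```

With these numbers the stall cycles (2.72M) dominate the base cycles (1M), which is the motivation for the miss-reduction techniques that follow.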

Example 1: split cache vs. unified cache
Instruction mix: 64% of instructions are non-data transfers and 36% are data transfers, so each instruction makes 1 instruction access plus 0.36 data accesses. In the resulting reference stream, 74% of references are instruction fetches and 26% are data references.

Misses per instruction:
Size   I-cache   D-cache   U-cache
16KB   0.00382   0.0409    0.0510
32KB   0.00136   0.0384    0.0433

miss rate = (misses per instruction) / (memory accesses per instruction)
Split cache (16KB instruction + 16KB data):
  I-cache miss rate = 0.00382 / 1 = 0.00382
  D-cache miss rate = 0.0409 / 0.36 = 0.1136
  Overall split miss rate = 0.74 × 0.00382 + 0.26 × 0.1136 = 0.0324
Unified cache (32KB, single ported):
  Miss rate = 0.0433 / 1.36 = 0.0318
Hit time: 1 cycle, except 2 cycles for a data access in the unified cache (the single port is busy with the instruction fetch). Miss time: 100 cycles.
Split cache miss rate (0.0324) > unified cache miss rate (0.0318).

Example 1, continued: average memory access time
AMAT = hit time + miss rate × miss penalty
AMAT = %instructions × AMAT_I + %data × AMAT_D
AMAT_split = 0.74 × (1 + 0.00382 × 100) + 0.26 × (1 + 0.1136 × 100) = 4.24
AMAT_unified = 0.74 × (1 + 0.0318 × 100) + 0.26 × (2 + 0.0318 × 100) = 4.44
Split cache AMAT < unified cache AMAT: the opposite of the miss-rate result (split > unified).
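The example's arithmetic can be checked directly; the sketch below re-derives the miss rates and AMATs from the table values in the example:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Per-instruction accesses: 1 instruction fetch + 0.36 data accesses.
i_miss_rate = 0.00382 / 1.0     # 16KB I-cache
d_miss_rate = 0.0409 / 0.36     # 16KB D-cache -> ~0.1136
u_miss_rate = 0.0433 / 1.36     # 32KB unified -> ~0.0318

# 74% of references are instruction fetches, 26% data; 100-cycle miss time.
split_rate = 0.74 * i_miss_rate + 0.26 * d_miss_rate
amat_split = 0.74 * amat(1, i_miss_rate, 100) + 0.26 * amat(1, d_miss_rate, 100)
# Unified cache: data hits take 2 cycles (the single port is busy fetching).
amat_unified = 0.74 * amat(1, u_miss_rate, 100) + 0.26 * amat(2, u_miss_rate, 100)

print(round(split_rate, 4), round(u_miss_rate, 4))   # 0.0324 0.0318
print(round(amat_split, 2), round(amat_unified, 2))  # 4.24 4.44
```

The extra cycle on unified-cache data hits is what flips the comparison: the unified cache wins on miss rate but loses on AMAT.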

A Closer Look at Misses: Classifying Misses, the 3 Cs
Compulsory: the first access to a block cannot hit, so the block must be brought into the cache. Also called cold-start misses or first-reference misses. (Misses that would occur even in an infinite cache.)
Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved. (Misses in a fully associative cache of size X.)
Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way associative cache of size X.)
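The 3 Cs can be counted mechanically on an address trace. A simplified LRU-based sketch (the convention follows the definitions above: first reference = compulsory, miss in an equal-size fully associative cache = capacity, otherwise conflict; the trace in the example is made up):

```python
from collections import OrderedDict

def classify_misses(trace, num_blocks, assoc, block_size=1):
    """Classify misses in an `assoc`-way LRU cache of `num_blocks` total
    blocks as compulsory, capacity, or conflict."""
    num_sets = num_blocks // assoc
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU state
    fa = OrderedDict()      # fully associative shadow cache, same total size
    seen = set()            # every block ever referenced
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0, "hits": 0}
    for addr in trace:
        block = addr // block_size
        s = sets[block % num_sets]
        hit = block in s
        fa_hit = block in fa
        # Maintain the fully associative shadow cache (LRU).
        if fa_hit:
            fa.move_to_end(block)
        else:
            fa[block] = True
            if len(fa) > num_blocks:
                fa.popitem(last=False)
        if hit:
            s.move_to_end(block)
            counts["hits"] += 1
            continue
        s[block] = True
        if len(s) > assoc:
            s.popitem(last=False)       # evict LRU block from the set
        if block not in seen:
            counts["compulsory"] += 1   # first reference ever
        elif not fa_hit:
            counts["capacity"] += 1     # would miss even fully associative
        else:
            counts["conflict"] += 1     # only the mapping caused this miss
        seen.add(block)
    return counts

# Two blocks that map to the same set of a 4-block direct-mapped cache
# ping-pong: every miss after the first two is a conflict miss.
print(classify_misses([0, 4, 0, 4, 0, 4], num_blocks=4, assoc=1))
```

Raising `assoc` to 2 on the same trace makes the conflict misses disappear, which is the behavior the next figure quantifies.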

3Cs Absolute Miss Rate
[Figure: absolute miss rate vs. cache size (1 KB to 128 KB) for 1-way, 2-way, 4-way, and 8-way associativity, with the miss rate (0 to 0.14) decomposed into compulsory, capacity, and conflict components.]

Block Size vs. Cache Measures
Increasing block size generally increases the miss penalty and decreases the miss rate; their product, miss penalty × miss rate, determines the contribution to average memory access time.
[Figure: miss penalty, miss rate, and average memory access time, each plotted against block size.]
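The trade-off can be illustrated with a toy model; every constant below is invented, since real miss-rate curves come from simulation:

```python
def toy_amat(block_size):
    """Toy model of AMAT vs. block size.  Miss rate falls with block size
    (more spatial locality captured) but rises again once the cache holds
    too few blocks; miss penalty grows with the data transferred.
    All constants are made up for illustration."""
    miss_rate = 0.2 / block_size + 0.0002 * block_size
    miss_penalty = 40 + 2 * block_size   # cycles
    return 1 + miss_rate * miss_penalty  # assumes a 1-cycle hit time

for b in (4, 16, 64, 256):
    print(b, round(toy_amat(b), 2))
```

Under this model AMAT is minimized at an intermediate block size: small blocks waste the miss penalty on too little data, while very large blocks inflate both the miss penalty and the miss rate.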