Computer Architecture: A Quantitative Approach, Sixth Edition
Chapter 2: Memory Hierarchy Design

Introduction
- Programmers want unlimited amounts of memory with low latency.
- Fast memory technology is more expensive per bit than slower memory.
- Solution: organize the memory system into a hierarchy.
  - The entire addressable memory space is available in the largest, slowest memory.
  - Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor.
- Temporal and spatial locality ensure that nearly all references can be found in the smaller memories.
- The hierarchy gives the illusion of a large, fast memory being presented to the processor.

Memory Hierarchy Design
- Memory hierarchy design becomes more crucial with recent multi-core processors, because aggregate peak bandwidth grows with the number of cores:
  - An Intel Core i7 can generate two references per core per clock.
  - With four cores and a 3.2 GHz clock: 25.6 billion 64-bit data references/second plus 12.8 billion 128-bit instruction references/second = 409.6 GB/s (a worked check of this arithmetic appears below).
  - DRAM bandwidth is only 8% of this (34.1 GB/s).
- This requires:
  - Multi-port, pipelined caches
  - Two levels of cache per core
  - A shared third-level cache on chip

Dynamic RAM
- One transistor per bit; data are stored as charge on a capacitor.
- The charge leaks, so the cell needs refreshing.
[Figure: one-bit DRAM cell, with the word line gating a switching element (access transistor) that connects the storage element (capacitor) to the bit line.]
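As a quick sanity check of the 409.6 GB/s figure quoted above, the short C program below redoes the arithmetic. The core count, clock rate, and reference widths come from the slide; the program itself is only an illustrative sketch.

```c
#include <stdio.h>

int main(void) {
    const double cores    = 4.0;
    const double clock_hz = 3.2e9;                      /* 3.2 GHz clock */
    const double data_refs = 2.0 * cores * clock_hz;    /* 25.6e9 64-bit data refs/s */
    const double inst_refs = 1.0 * cores * clock_hz;    /* 12.8e9 128-bit instruction refs/s */

    /* 64 bits = 8 bytes, 128 bits = 16 bytes */
    const double bytes_per_s = data_refs * 8.0 + inst_refs * 16.0;
    printf("Peak bandwidth demand: %.1f GB/s\n", bytes_per_s / 1e9);  /* prints 409.6 */
    return 0;
}
```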

Dynamic RAM (array organization)
[Figure: an (m+n)-bit address feeds a row decoder that selects one row of a 2^n x 2^m cell array; the selected row is read out through sense amplifiers and a column multiplexer.]

Static RAM
- Cell: two cross-coupled inverters (two transistors each) plus two access transistors, connected through the row select line to a pair of complementary bit lines.

Memory
- We would like a memory that is fast, big, and cheap.
- Answer: a hierarchy of memories (multi-level caches, main memory, disk).
- How do we manage the data? Where do we put it, and who is responsible for moving it?
  - Manual: the programmer does it.
  - Automatic: the system does it.

Memory Hierarchy
[Figure: the hierarchy from the core through split L1 instruction and data caches ($I, $D), then L2, L3, RAM, and disk; the latencies shown (1-4 cycles, ~10 cycles, ~30 cycles, 50-100 cycles) grow from L1 down to RAM, with disk far slower still.]

Performance and Power
- High-end microprocessors have more than 10 MB of on-chip cache, which consumes a large amount of the area and power budget.

Locality
- Temporal locality: when you access a specific address, you will probably access the same address again soon.
- Spatial locality: when you access a specific address, nearby addresses will be accessed soon (more so for instructions than for data).
A small example contrasting the two kinds of locality follows.
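The C sketch below (my own illustration, not from the slides) shows both kinds of locality in a row-major array traversal.

```c
#include <stdio.h>

#define N 1024
static double a[N][N];    /* C stores this row-major: a[i][0..N-1] are contiguous */

int main(void) {
    double sum = 0.0;     /* reused on every iteration: temporal locality */

    /* Row-major traversal: consecutive accesses touch consecutive addresses,
       so every word of each fetched cache block gets used (spatial locality). */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Swapping the loops to a[j][i] would stride N*8 bytes between accesses,
       touching a new cache block almost every time and wasting spatial locality. */
    printf("%f\n", sum);
    return 0;
}
```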

Memory Hierarchy Basics
- Block: the smallest unit of information transferred between two levels.
- Hit: the item is found in some block in the upper level (example: block X).
- Miss: the item must be retrieved from a block in the lower level (example: block Y).
- Miss rate = 1 - hit rate.
- Miss penalty: the time to replace a block in the upper level + the time to deliver the block to the processor.

When a word is not found in the cache, a miss occurs:
- Fetch the word from a lower level in the hierarchy, requiring a higher-latency reference; the lower level may be another cache or the main memory.
- Also fetch the other words contained within the block ("when you move a word, get the nearby ones"); this takes advantage of spatial locality.
- Place the block into the cache in any location within its set, determined by the address:
  set index = (block address) MOD (number of sets in cache)
  (a sketch of this computation appears below).
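A minimal sketch of the index computation above; the block size and set count are assumed geometries chosen for illustration, not values from the slides.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64u    /* bytes per block (assumed geometry) */
#define NUM_SETS   256u   /* sets in the cache (assumed geometry) */

/* block address = byte address / block size;
   set index    = (block address) MOD (number of sets) */
static unsigned set_index(uint32_t addr) {
    uint32_t block_addr = addr / BLOCK_SIZE;
    return block_addr % NUM_SETS;
}

int main(void) {
    uint32_t addr = 0x12345678u;
    printf("address 0x%08x -> set %u\n", (unsigned)addr, set_index(addr));
    return 0;
}
```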

n sets => n-way set associative Direct-mapped cache => one block per set Fully associative => one set Writing to cache: two strategies Write-through Immediately update lower levels of hierarchy Write-back Only update lower levels of hierarchy when an updated block is replaced Both strategies use write buffer to make writes asynchronous 16 Miss rate Fraction of cache access that result in a miss Causes of misses Compulsory First reference to a block Capacity Blocks discarded and later retrieved Conflict Program makes repeated references to multiple addresses from different blocks that map to the same location in the cache 17 Speculative and multithreaded processors may execute other instructions during a miss Reduces performance impact of misses 18 Chapter 2 Instructions: Language of the Computer 6

Six basic cache optimizations:
1. Larger block size: reduces compulsory misses; increases capacity and conflict misses and increases the miss penalty.
2. Larger total cache capacity: reduces the miss rate; increases hit time and power consumption.
3. Higher associativity: reduces conflict misses; increases hit time and power consumption.
4. Higher number of cache levels: reduces overall memory access time.
5. Giving priority to read misses over writes: reduces the miss penalty.
6. Avoiding address translation during cache indexing: reduces hit time.

Memory Architecture: Dynamic RAM
- Data are stored by charging/discharging a capacitor: one access transistor and one capacitor per bit.
- Charges leak, so data would be lost within about a second; the cell must be refreshed.
- Cheap.
[Figure: one-transistor DRAM cell, with the row-enable line gating the capacitor onto the bitline.]

Memory Architecture: Static RAM
- Cell: two cross-coupled inverters (4 transistors) plus 2 access transistors.
[Figure: six-transistor SRAM cell, with the row-select line gating the cell onto the complementary bitline and _bitline.]

Memory Technology and Optimizations
- Performance metrics:
  - Latency is the concern of the cache.
  - Bandwidth is the concern of multiprocessors and I/O.
  - Access time: the time between a read request and when the desired word arrives.
  - Cycle time: the minimum time between unrelated requests to memory.
- SRAM has low latency: use it for caches.
- Organize DRAM chips into many banks for high bandwidth: use them for main memory.

Memory Technology
- SRAM:
  - Requires low power to retain a bit.
  - Requires 6 transistors/bit.
- DRAM:
  - Must be re-written after being read.
  - Must also be periodically refreshed, every ~8 ms (roughly 5% of the time); an entire row can be refreshed simultaneously.
  - One transistor/bit.
  - Address lines are multiplexed (a sketch of this split appears below):
    - Upper half of the address: row access strobe (RAS).
    - Lower half of the address: column access strobe (CAS).

Cache Memory Organization: Placement
[Figure: a direct-mapped cache with eight blocks (indices 000-111) and a larger memory; each memory block maps to the cache block given by the low-order three bits of its address, e.g. blocks 00001, 01001, 10001, 11001 all map to index 001, and blocks 00101, 01101, 10101, 11101 map to index 101.]
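A small sketch of the multiplexed RAS/CAS address split described above; the row and column widths are assumed geometries, not values from the slides.

```c
#include <stdint.h>
#include <stdio.h>

#define ROW_BITS 12u   /* assumed geometry: 4096 rows    */
#define COL_BITS 12u   /* assumed geometry: 4096 columns */

int main(void) {
    uint32_t cell_addr = 0x00A1B2Cu;   /* a 24-bit DRAM cell address */

    /* Upper half of the address, sent with the row access strobe (RAS). */
    uint32_t row = (cell_addr >> COL_BITS) & ((1u << ROW_BITS) - 1u);
    /* Lower half of the address, sent with the column access strobe (CAS). */
    uint32_t col = cell_addr & ((1u << COL_BITS) - 1u);

    printf("RAS row = 0x%03x, CAS column = 0x%03x\n", (unsigned)row, (unsigned)col);
    return 0;
}
```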

Placement: Direct-Mapped Cache (4 KB, one-word blocks)
- 1K = 1024 blocks, each block = one word; can cache up to 2^32 bytes = 4 GB of memory.
- Mapping function: cache block frame number = (block address) MOD 1024, i.e. the index field, the 10 low bits of the block address.
- Address breakdown: block address = 30 bits (tag = 20 bits, index = 10 bits), block offset = 2 bits.
[Figure: the 32-bit byte address splits into a 20-bit tag, a 10-bit index selecting one of 1024 (valid, tag, data) entries, and a 2-bit byte offset; a hit is signaled when the valid bit is set and the stored tag matches.]

Placement: Direct-Mapped Cache (four-word blocks)
- 4K entries, each block = four words (128 bits).
- Address breakdown: block address = 28 bits (tag = 16 bits, index = 12 bits), block offset = 4 bits.
[Figure: the 12-bit index selects one of 4K (valid, tag, 128-bit data) entries; on a tag match the block offset drives a 4-to-1 multiplexor that selects one 32-bit word.]

Address Fields
- Each block frame in the cache has an address tag. The tags of every cache block that might contain the required data are checked in parallel.
- A valid bit is added to the tag to indicate whether the entry contains a valid address.
- The address from the CPU to the cache is divided into:
  - A block address, further divided into:
    - An index field, to choose a block set in the cache (there is no index field when fully associative).
    - A tag field, to search and match addresses in the selected set.
  - A block offset, to select the desired data from the block.

  Address = [ Tag | Index | Block Offset ]

Address Field Sizes
- Physical memory address generated by the CPU: [ Tag | Index | Block Offset ]
- Block offset size = log2(block size).
- Index size = log2(total number of blocks / associativity).
- Tag size = address size - index size - offset size.
- Mapping function: cache set (or block frame) number = index = (block address) MOD (number of sets).
(A sketch computing these fields appears after the example below.)

Set-Associative Placement Example (4 KB, 4-way)
- 1024 block frames, each block = one word; 4-way set associative, so 1024 / 4 = 256 sets.
- Can cache up to 2^32 bytes = 4 GB of memory.
- Address breakdown: block address = 30 bits (tag = 22 bits, index = 8 bits), block offset = 2 bits.
- Mapping function: cache set number = index = (block address) MOD 256.
[Figure: the 8-bit index selects one of 256 sets; the four (valid, tag, data) entries in the set are compared against the 22-bit tag in parallel, and a 4-to-1 multiplexor selects the hitting way's data.]
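The sketch below applies the field-size formulas above and then decomposes an address. The cache geometry (64-byte blocks, 512 blocks, 8-way) is an assumption chosen for illustration, not a configuration from the slides.

```c
#include <stdint.h>
#include <stdio.h>

#define ADDR_BITS     32u
#define BLOCK_SIZE    64u    /* bytes (assumed) */
#define NUM_BLOCKS    512u   /* (assumed) */
#define ASSOCIATIVITY 8u     /* (assumed) */

/* log2 for exact powers of two */
static unsigned ilog2(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    unsigned offset_bits = ilog2(BLOCK_SIZE);                    /* log2(block size)            */
    unsigned sets        = NUM_BLOCKS / ASSOCIATIVITY;
    unsigned index_bits  = ilog2(sets);                          /* log2(blocks / associativity) */
    unsigned tag_bits    = ADDR_BITS - index_bits - offset_bits; /* remainder of the address     */

    uint32_t addr   = 0xDEADBEEFu;
    uint32_t offset = addr & ((1u << offset_bits) - 1u);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1u);
    uint32_t tag    = addr >> (offset_bits + index_bits);

    printf("tag = %u bits, index = %u bits, offset = %u bits\n",
           tag_bits, index_bits, offset_bits);
    printf("addr 0x%08x -> tag 0x%x, set %u, offset %u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}
```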