UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668


UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering
Computer Architecture ECE 568/668
Part: Cache Hierarchy - I. Israel Koren
ECE568/Koren. Copyright Koren, UMass

[Figure: main-memory blocks mapped onto a small set of cache block frames]

Purpose of a cache: bridge the CPU-memory speed gap. A cache access takes only a few CPU cycles, while a main-memory access takes one to two orders of magnitude longer. Block size = line size, typically 8 to 128 bytes.

An address is divided into a Block Number and a Block Offset; the block number determines the Block Frame Number of the frame holding the block. The cache must locate the data and deliver it in one cycle, so it uses some variation of Content Addressable Memory (CAM).

Fully Associative Cache

Place a block anywhere in the cache: most flexible, but most expensive.

[Figure: tag memory (one tag plus valid bit per line) alongside the data memory; any of main memory's 8192 blocks can occupy any of the cache's 256 lines of 8 bytes each]

The 16-bit main-memory address is split into a 13-bit tag (the full block number) and a 3-bit offset within the 8-byte line.

Fully Associative Cache Operation: the address tag is compared against every stored tag in parallel (a CAM lookup). A match on a valid line raises cache_hit, and the offset selects the byte within the matching line.

(Adapted from Heuring & Jordan, Computer Systems Design & Architecture, AW, 1997, and from UCB and other sources. Copyright UCB & Morgan Kaufmann.)
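As a minimal sketch (not from the slides), the parallel tag compare of a fully associative lookup can be modeled in software; the tag values and line contents below are illustrative assumptions.

```python
# Hypothetical model of a fully associative lookup: every stored tag is
# compared against the address tag "in parallel" (modeled here as a scan).
BLOCK_BYTES = 8  # 8-byte cache lines, as in the example above

def fa_lookup(tags, data, addr):
    """Return the addressed byte on a hit, else None (a miss)."""
    tag, offset = addr // BLOCK_BYTES, addr % BLOCK_BYTES
    for i, t in enumerate(tags):       # the CAM does all compares at once
        if t == tag:
            return data[i][offset]
    return None

# Two valid lines holding main-memory blocks 429 and 17 (illustrative):
tags = [429, 17]
data = [bytes(range(8)), bytes(range(8, 16))]
print(fa_lookup(tags, data, 429 * 8 + 5))  # hit in line 0 -> byte value 5
print(fa_lookup(tags, data, 99 * 8))       # block 99 not cached -> None
```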

Simplest Cache: Direct Mapped

[Figure: 4-byte direct-mapped cache, block size = 1 byte]

Cache location 0 can be occupied by data from memory location 0, 4, 8, ... — in general, by any memory location whose two LSBs are 00. Address<1:0> is the cache index. Which one should we place in the cache, and how can we tell which one is currently there? The remaining high-order address bits are stored alongside the data as a tag.

Direct Mapped Cache: the address is split into a tag (t bits), an index (k bits), and an offset (b bits); the tag stored at the indexed line is compared with the address tag to generate Hit.

Direct-Mapped Cache (Block size = 8 Bytes)

[Figure: 256-line direct-mapped cache with a valid bit and tag per line; the main-memory block numbers competing for each line are those equal modulo 256 — line 0 holds one of blocks 0, 256, 512, ..., 7936, and line 255 holds one of blocks 255, 511, 767, ..., 8191]

The 16-bit main-memory address is split into a 5-bit tag, an 8-bit index selecting one of the 256 lines, and a 3-bit offset within the 8-byte line.

Direct-Mapped Cache Operation: the index selects a single line, whose stored tag is compared (only one comparator is needed) with the address tag; the offset then selects the byte within the line.
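The tag/index/offset split above can be sketched as a small helper; the field widths follow the 256-line, 8-byte-line example, and the function name is an illustrative assumption.

```python
# Sketch of the direct-mapped address split: tag | index | offset.
OFFSET_BITS = 3   # 8-byte lines -> 3 offset bits
INDEX_BITS = 8    # 256 lines   -> 8 index bits

def split_address(addr):
    """Return the (tag, index, offset) fields of a 16-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Addresses 8 bytes apart land in adjacent lines; addresses 2 KB apart
# (one tag increment) compete for the same line:
print(split_address(0x0008))  # (0, 1, 0)
print(split_address(0x0808))  # (1, 1, 0) -- same index, different tag
```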

Compromise: n-way Set Associative Cache

n entries for each index: a block can be placed in any one of n block frames (n a power of 2). Direct mapping is 1-way set associative; a fully associative cache is j-way set associative, where j is the number of block frames in the cache.

Example: Two-way set associative cache.

[Figure: the 256 lines organized as 128 two-line sets; main-memory blocks equal modulo the number of sets compete for the same set]

The index now names a set rather than a single line: address = tag | index (set number) | offset.

Two-way Set Associative Cache Operation: two direct-mapped caches operate in parallel. The index selects a set (2 blocks); the two tags in the set are compared in parallel; data is selected by a mux according to which tag matched; Hit is the OR of the two comparators.

Disadvantage of Set Associative Cache

N-way set associative vs. direct mapped: N comparators instead of 1, extra MUX delay for the data, and the data arrives AFTER Hit/Miss is known. In a direct-mapped cache, the block is available BEFORE Hit/Miss: it is possible to assume a hit and continue, and recover later on a miss.

Q1: Where can a block be placed in the upper level? Block 12 placed in a cache with 8 block frames:
- Fully associative: in any of the 8 frames
- Direct mapped: only in frame (12 mod 8) = 4
- 2-way set associative: in either frame of set (12 mod 4) = 0

Set-associative mapping: set = (Block_Number) modulo (Number_of_Sets).
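The modulo placement rule above can be expressed directly; the helper name and the convention that a set's frames are contiguous are illustrative assumptions.

```python
# Sketch: which frames of an 8-frame cache may hold block 12, per associativity.
def candidate_frames(block, n_frames, assoc):
    """Frames where `block` may be placed; set = block mod number_of_sets."""
    n_sets = n_frames // assoc
    s = block % n_sets
    return list(range(s * assoc, (s + 1) * assoc))  # the frames of set s

print(candidate_frames(12, 8, 1))  # direct mapped: [4] (12 mod 8 = 4)
print(candidate_frames(12, 8, 2))  # 2-way: set 0 -> frames [0, 1]
print(candidate_frames(12, 8, 8))  # fully associative: any of the 8 frames
```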

Q2: How is a block found if it is in the upper level?

A tag is stored for each block. Increasing associativity shrinks the index and expands the tag:

Block Address = Tag | Index (Set No.) | Block Offset

Q3: Which block should be replaced on a miss?

Easy for direct mapped: there is only one candidate. For set associative or fully associative caches: Random, or LRU (Least Recently Used) — in practice LRU is only approximated. Measured miss rates:

Assoc:     2-way          4-way          8-way
Size       LRU    Ran     LRU    Ran     LRU    Ran
16 KB      5.2%   5.7%    4.7%   5.3%    4.4%   5.0%
64 KB      1.9%   2.0%    1.5%   1.7%    1.4%   1.5%
256 KB     1.15%  1.17%   1.13%  1.13%   1.12%  1.12%
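As a point of reference for the table above, exact LRU can be sketched for a single set (real hardware only approximates it); the class below is a hypothetical model, not the slides' design.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with exact LRU replacement."""
    def __init__(self, assoc):
        self.assoc = assoc
        self.tags = OrderedDict()              # least recently used first

    def access(self, tag):
        hit = tag in self.tags
        if hit:
            self.tags.move_to_end(tag)         # mark most recently used
        else:
            if len(self.tags) == self.assoc:
                self.tags.popitem(last=False)  # evict the LRU tag
            self.tags[tag] = True
        return hit

s = LRUSet(2)
trace = [1, 2, 1, 3, 1]       # tag 2 is the LRU victim when 3 arrives
hits = sum(s.access(t) for t in trace)
print(hits)  # 2 (the repeated accesses to tag 1)
```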

Q4: What happens on a write? (Hit)

A write to the cache modifies only part of a block. Two policies:

Write through: the information is written both to the block in the cache and to the block in main memory.

Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced.

Pros and cons of each:
- WT: cached data can always be discarded, since up-to-date data is in memory; WB: a dirty block can't simply be discarded — it may have to be written back.
- WT: needs only a valid bit; WB: needs valid and dirty bits.
- WT: read misses (in the cache) do not result in writes (to memory); memory (and other processors) always has the latest data; simpler cache management.
- WB: much lower memory bandwidth, since data is often overwritten multiple times before leaving the cache; better tolerance of long-latency memory.

Write Buffer for Write Through

Processor -> Cache -> Write Buffer -> DRAM

A write buffer is added between the cache and memory. The processor writes data into the cache and the write buffer and may continue without stalls; the memory controller writes the contents of the buffer to memory. The write buffer is just a FIFO (typical number of entries: 4). It works fine as long as store frequency << 1 / DRAM_write_cycle.
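The bandwidth argument above (write back wins when data is overwritten repeatedly) can be illustrated with a one-line toy model; the function and the store trace are illustrative assumptions.

```python
# Toy model: count memory writes for the same store trace under
# write-through vs. write-back, using a cache of a single line.
def memory_writes(trace, write_back):
    line, dirty, writes = None, False, 0
    for blk in trace:
        if blk != line:                    # the store misses: replace the line
            if write_back and dirty:
                writes += 1                # write the dirty victim back once
            line, dirty = blk, False
        if write_back:
            dirty = True                   # defer the memory update
        else:
            writes += 1                    # write through on every store
    if write_back and dirty:
        writes += 1                        # final flush of the dirty line
    return writes

trace = ['A', 'A', 'A', 'B']               # block A is overwritten three times
print(memory_writes(trace, write_back=False))  # 4 memory writes
print(memory_writes(trace, write_back=True))   # 2 (A once, B once)
```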

What happens on a write miss?

Write allocate: allocate a new cache line for the written block. This usually means doing a read from memory to fill in the rest of the cache line.

Write non-allocate (or "write around"): simply send the write data through to the underlying memory (or next-level cache) without allocating a new cache line.

Three Levels of Hierarchy

CPU --VA--> Translation --PA--> Cache --(miss)--> Main Memory --> Disk

The CPU generates virtual rather than physical addresses: VA (disk) -> PA (memory) -> cache. The VA-to-PA translation is commonly done by a TLB (Translation Lookaside Buffer), which is really just a cache on the page-table mappings. TLB access time is comparable to cache access time (much less than main-memory access time).
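The "TLB is just a cache on the page-table mappings" idea above can be sketched as follows; the page-table contents are illustrative, and the 4 KB page size matches the examples in these notes.

```python
# Sketch: VA -> PA translation through a tiny TLB backed by a page table.
PAGE_BITS = 12                       # 4 KB pages
page_table = {0: 7, 1: 3, 2: 9}      # virtual page number -> physical frame
tlb = {}                             # cached subset of the page table

def translate(va):
    """Translate a virtual address, filling the TLB on a miss."""
    vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    if vpn not in tlb:               # TLB miss: walk the page table
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_BITS) | offset

print(hex(translate(0x1234)))  # VPN 1 -> frame 3: 0x3234 (miss, then fill)
print(hex(translate(0x1238)))  # same page: 0x3238, now a TLB hit
```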

Translation Look-Aside Buffers

Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped. TLBs are usually small — typically no more than 8-256 entries even on high-end machines — which permits a fully associative lookup. Most mid-range machines use small n-way set-associative organizations.

Translation with a TLB: on a TLB hit, the PA goes directly to the cache; on a TLB miss, the page table in main memory supplies the translation (and, on a page fault, the disk).

TLB in the IBM 3033: 2-way set associative, 128 entries (= 64 × 2); page size = 4 KB; main-memory size = 32 MB = 2^25 bytes.

(Adapted from J. Feldman & C. Retter, Computer Architecture, McGraw-Hill, 1994, and from UCB and other sources.)

Virtual Addresses and the Cache

CPU --VA--> TLB --PA--> Cache --(miss)--> Main Memory

It takes an extra TLB access to translate a VA to a PA, making cache access expensive.

Solution 1: access the cache with the VA directly. This creates the synonym/alias problem: two different VAs can map to the same PA, so two different cache entries may hold data for the same physical address. On an update, all cache entries with the same PA must be updated or memory becomes inconsistent — and determining which entries those are requires significant hardware. Conversely, two processes may map the same VA to two different PAs with only one entry in the cache; the usual solution is to flush the cache on a context switch.

Solution 2: overlapped cache and TLB access. The page-offset bits are not changed by translation, so the high-order bits of the VA (the virtual page number) can be sent to the TLB while the low-order bits index the cache in parallel. The TLB delivers the PPN (physical page number); IF (V = 1) AND (cache tag = PPN) THEN deliver the data to the CPU.

Drawback of Overlapped TLB Access

The cache must be small, since its index is limited by the page size. Either increase the page size, or use an n-way set-associative cache:

Cache_size = n × Page_size

Example: use a 2-way set-associative cache for an 8 KB total cache size: Cache_size = 2 × 4 KB = 8 KB.

Cache & TLB (modified) in the IBM 3033: page = 4 KB; one cache way = 256 sets × 16 bytes = 4 KB; the cache is 16-way set associative, 64 KB total, with 16-byte blocks; main memory = 32 MB = 2^25 bytes.
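The Cache_size = n × Page_size constraint above can be checked programmatically; the helper below is a hypothetical sketch, assuming power-of-two sizes.

```python
# Sketch: overlapped cache/TLB access works only if the index and block-offset
# bits fit inside the page offset, i.e. cache_size <= assoc * page_size.
def overlap_ok(page_size, cache_size, assoc, block_size):
    """True if the cache can be indexed with untranslated (page-offset) bits."""
    n_sets = cache_size // (assoc * block_size)
    index_plus_offset_bits = (n_sets * block_size).bit_length() - 1
    return index_plus_offset_bits <= page_size.bit_length() - 1

print(overlap_ok(4096, 4096, 1, 8))  # True: direct mapped, one page worth
print(overlap_ok(4096, 8192, 1, 8))  # False: the index spills into the VPN
print(overlap_ok(4096, 8192, 2, 8))  # True: 2-way keeps the index in the offset
```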