
Taking advantage of spatial locality

Use a block size larger than one word. Example: two words per block.

Alternate representations:

[Figure: the same eight data words viewed two ways. Word-index cache: each word (block 0 word 0, block 0 word 1, ..., block 3 word 0, block 3 word 1) carries its own tag. Block-index cache: one tag per two-word block.]

Word Index = (memory word address) mod (cache size in words)
Block Index = (memory block address) mod (cache size in blocks), where
memory block address = (word address) / (words per block)

One tag per block ==> improves efficiency, but write misses have to be handled carefully (they should preserve block integrity):
- copy the block from memory to the cache,
- write the word within the block in the cache.

Example: block size = 2 words. When the CPU issues a memory address, it is decomposed into a tag, a block index and a block offset (the word index within the block), followed by the byte offset.
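The decomposition above can be sketched in a few lines of Python. The cache size of 4 blocks is an assumption for illustration; only the two-words-per-block figure comes from the example.

```python
# Sketch of the address decomposition described above, for a toy
# direct-mapped cache (assumed: 4 blocks, 2 words per block,
# 4-byte words). The field widths follow from the sizes.

BLOCKS_IN_CACHE = 4      # assumed cache size in blocks
WORDS_PER_BLOCK = 2      # block size from the example above
BYTES_PER_WORD = 4

def decompose(byte_address):
    word_address = byte_address // BYTES_PER_WORD
    block_address = word_address // WORDS_PER_BLOCK
    block_offset = word_address % WORDS_PER_BLOCK   # word within the block
    block_index = block_address % BLOCKS_IN_CACHE   # which cache block
    tag = block_address // BLOCKS_IN_CACHE          # identifies the memory block
    return tag, block_index, block_offset

# Word address 13 -> memory block 6 -> tag 1, cache block 2, word 1
print(decompose(13 * BYTES_PER_WORD))  # -> (1, 2, 1)
```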

Taking advantage of spatial locality (block size = 4)

[Figure: direct-mapped cache with 4-word blocks and a 32-bit address. Tag = 16 bits (31-16); index = 12 bits (15-4), selecting one of 4K entries; block offset = bits 3-2; byte offset = bits 1-0. Each entry holds a valid bit, a 16-bit tag and 128 bits of data; a 4-to-1 multiplexor picks one 32-bit word, and Hit = valid AND tag match.]

What is the total number of bits needed for a 64KB cache, assuming a 32-bit address? What about an 8KB cache?

Hardware Issues

Make reading multiple words faster by using banks of memory. Example: 1 cycle to send a word or an address, and 15 cycles to access a word in memory -- what is the miss penalty (block = 4 words)?

a. one-word-wide memory: 4 * (1 + 15 + 1) = 68 cycles
b. 4-word-wide organization: 1 + 15 + 1 = 17 cycles
c. interleaved memory (Banks 0 to 3): (1 + 15 + 1) + 3 = 20 cycles
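The total-bits question above can be worked out mechanically; a sketch, counting 128 data bits, a tag and a valid bit per entry:

```python
# Worked answer to the question above: total storage bits for a
# direct-mapped cache with 4-word blocks and 32-bit addresses.

def cache_total_bits(data_kb, words_per_block=4, addr_bits=32):
    words = data_kb * 1024 // 4           # 4-byte words of data
    blocks = words // words_per_block
    index_bits = blocks.bit_length() - 1  # log2(number of entries)
    offset_bits = 2 + 2                   # block offset + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    # per entry: data + tag + valid bit
    return blocks * (words_per_block * 32 + tag_bits + 1)

print(cache_total_bits(64))  # 593920 bits (~580 Kbits) for the 64KB cache
print(cache_total_bits(8))   # 75776 bits for the 8KB cache
```

For the 64KB cache: 4K entries, each 128 + 16 + 1 = 145 bits, giving 593,920 bits; the 8KB cache has 512 entries of 128 + 19 + 1 = 148 bits.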

Interleaved memory

T = 1: send request to bank 0
T = 2: send request to bank 1
T = 3: send request to bank 2
T = 4: send request to bank 3
T = 16: word ready at Bank 0, received at cache at T = 17
T = 17: word ready at Bank 1, received at cache at T = 18
T = 18: word ready at Bank 2, received at cache at T = 19
T = 19: word ready at Bank 3, received at cache at T = 20

c. interleaved memory: (1 + 15 + 1) + 3 = 20 cycles

Cache control

In the architecture of Chapter 4, replace the memory modules by caches -- for example, in the pipelined architecture, it takes one cycle to read/write a word from/into the Instruction cache or the Data cache.

IF REG EX MEM WB

The hit/miss signal from the cache goes to the control. On a read miss, the pipeline is stalled, and the cache controller sends the address and a read command to main memory, makes room for the new block if necessary, puts the new block in the cache, and returns the missed word to the CPU, which continues execution. On a write hit, the controller writes the word into the cache, and if WRITE THROUGH, writes the word also into main memory.
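The three miss penalties can be captured directly from the timing parameters used above (1 cycle to send an address or word, 15 cycles per memory access):

```python
# Miss penalty for a 4-word block under the three memory organizations,
# using the example's numbers: 1 cycle to send an address or a word,
# 15 cycles to access a word in memory.

SEND, ACCESS, WORDS = 1, 15, 4

one_word_wide = WORDS * (SEND + ACCESS + SEND)      # four full sequential accesses
four_word_wide = SEND + ACCESS + SEND               # whole block in one access
interleaved = (SEND + ACCESS + SEND) + (WORDS - 1)  # accesses overlap in the banks;
                                                    # only the transfers serialize
print(one_word_wide, four_word_wide, interleaved)   # 68 17 20
```

The interleaved figure matches the timeline above: the request to bank 0 completes at T = 17, and each remaining bank delivers its word one cycle later, finishing at T = 20.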

Cache control (cont.)

IF REG EX MEM WB

On a write miss, the controller:
- If WRITE BACK: brings the block to the cache (by sending the address and a read command to memory -- needed if a block is larger than one word), then writes the word into the cache.
- If WRITE THROUGH: writes the word in main memory, and
  - if No Write Allocate, does not allocate the block in the cache;
  - if Write Allocate, brings the block to the cache and writes the word into the cache.

Handling writes

The WRITE BACK scheme:
- Does not write into memory on every write hit.
- Writes the block back into memory when done with it (when is that??).
- Writes the block back only if it is dirty (has been written on).
- For some time, the cache and memory are incoherent (any problems??).

Write buffers are used to avoid stalling the pipeline while a word or a block is written into memory. What if another write or read occurs before memory is done with the first write? -- a typical speed-mismatch problem. Do we ever have to stall execution on a write miss? What kind of locality have we taken advantage of?
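A toy model makes the memory-traffic difference between the two write-hit policies concrete. This is an illustration only (a hypothetical one-block cache servicing writes that all hit), not the slides' hardware:

```python
# Toy model of the write-hit policies above: a one-block cache taking
# a stream of write hits. Write-through sends every word to memory;
# write-back just sets a dirty bit and writes the block to memory once,
# when the block is evicted (and only if it is dirty).

def simulate(policy, write_hits, evict_at_end=True):
    mem_writes = 0
    dirty = False
    for _ in range(write_hits):
        if policy == "write-through":
            mem_writes += 1          # word goes to memory immediately
        else:                        # write-back: only mark the block dirty
            dirty = True
    if evict_at_end and dirty:
        mem_writes += 1              # one write-back of the dirty block
    return mem_writes

print(simulate("write-through", 100))  # 100 memory writes
print(simulate("write-back", 100))     # 1 memory write
```

This is the temporal locality the final question points at: repeated writes to the same block cost write-back only one memory transfer.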

Decreasing miss ratio with associativity

Example (2-way set associative, block size = 1):

[Figure: each set holds two entries (v, tag, data); a referenced block can be placed in either entry of its set. Equivalent to having two direct-mapped caches, with the possibility of putting the block in either of the two caches.]

[Figure: an eight-block cache configured four ways: one-way set associative (direct mapped, blocks 0-7), two-way set associative (4 sets), four-way set associative (2 sets), and eight-way set associative (fully associative).]

- Need a block replacement policy: LRU (complex) or an access bit (simple).
- Give a series of references for which direct mapped results in a lower miss ratio than a 2-way set associative cache (with LRU). What about a higher miss ratio?
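One answer to the exercise above can be checked with a small simulator. With 4-block caches and block-address references, the series 0, 2, 4 repeated thrashes a single 2-way set while the direct-mapped cache keeps block 2 resident (the 4-block size and the particular series are choices for illustration, not from the slides):

```python
# Compare miss counts: direct mapped vs. 2-way set associative with LRU,
# both with 4 blocks total. References are block addresses.

def direct_mapped_misses(refs, blocks=4):
    cache = [None] * blocks
    misses = 0
    for a in refs:
        idx = a % blocks
        if cache[idx] != a:
            misses += 1
            cache[idx] = a
    return misses

def two_way_lru_misses(refs, blocks=4):
    sets = [[] for _ in range(blocks // 2)]  # each set: MRU-first list
    misses = 0
    for a in refs:
        s = sets[a % (blocks // 2)]
        if a in s:
            s.remove(a)
        else:
            misses += 1
            if len(s) == 2:
                s.pop()                      # evict the LRU entry
        s.insert(0, a)                       # reinsert as MRU
    return misses

refs = [0, 2, 4] * 5
print(direct_mapped_misses(refs), two_way_lru_misses(refs))  # 11 15
```

For the second half of the question, the series [0, 4] repeated gives the opposite outcome: both blocks conflict in the direct-mapped cache (every reference misses) but coexist in one 2-way set (only the first two references miss).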

An implementation (4KB, 4-way associative, block size = 1)

[Figure: the 32-bit address splits into a 22-bit tag (bits 31-10), an 8-bit index (bits 9-2) selecting one of 256 sets (0-255), and a 2-bit byte offset. The four (V, Tag, Data) ways are read in parallel; four comparators and a 4-to-1 multiplexor select the data from the way that hits.]

Associativity implies more hardware cost. Compare the overhead (for tags) with that in direct mapping. May use a block size larger than one word.

An implementation (4KB, 2-way associative, block size = 2)

[Figure: the address splits into a tag, an 8-bit block index and a word-within-block offset; the two (V, Tag, Data) ways are read in parallel and a multiplexor selects the hitting way.]

Equivalent to two caches, each with block size = 2. Each cache has 256 blocks (indices 0-255); that is, word index = 9 bits: 8 for the block index and one for the offset within a block.
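The tag-overhead comparison asked for above can be sketched as follows, for the 4KB, one-word-block, 32-bit-address configuration: higher associativity means fewer index bits, hence a wider tag stored with every one of the 1024 blocks.

```python
# Tag + valid-bit overhead for a 4KB cache with one-word (4-byte)
# blocks and 32-bit addresses, at different associativities.

def tag_overhead_bits(ways, total_blocks=1024, addr_bits=32):
    sets = total_blocks // ways
    index_bits = sets.bit_length() - 1     # log2(number of sets)
    tag_bits = addr_bits - index_bits - 2  # minus index and byte offset
    return total_blocks * (tag_bits + 1)   # +1 valid bit per block

print(tag_overhead_bits(1))  # direct mapped: 1024 * (20 + 1) = 21504 bits
print(tag_overhead_bits(4))  # 4-way:         1024 * (22 + 1) = 23552 bits
```

The 4-way cache pays about 2K extra bits of tag storage, on top of the comparators and multiplexor, for its lower miss ratio.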