EECS150 - Digital Design, Lecture 11: SRAM (II), Caches

September 29, 2011
Elad Alon
Electrical Engineering and Computer Sciences, University of California, Berkeley
http://www-inst.eecs.berkeley.edu/~cs150

Announcements
- Homework #4 due today
- Homework #5 out today; due next Thurs.
- Project checkpoints #1-3 released
  - Checkpoint #1 due next Wed. evening
  - Attend discussion session, start early!
- Next Tues. lecture will be given by Daiwei (Elad will be out of town on Tues., so office hours will be cancelled)
  - Will schedule additional TA office hours if needed

First-In-First-Out (FIFO) Memory
- Used to implement queues; commonly used in computers and communication devices
- Generally used to decouple the actions of a producer and a consumer

[Figure: FIFO contents over time - starting state (c, b, a); after a write (d, c, b, a); after a read (d, c, b)]

- The producer can perform many writes without the consumer performing any reads (or vice versa). The buffer size is finite, so on average there must be an equal number of reads and writes.
- Typical use: interfacing I/O devices, e.g. a network interface. Data bursts in from the network, then the processor bursts it to a memory buffer (or reads one word at a time from the interface). The operations are not synchronized.
- Example: audio output. The processor produces output samples in bursts (during process swap-in time); the audio DAC clocks them out at a constant sample rate.

FIFO Interfaces
- Address pointers are used internally to keep the next write position and next read position into a dual-port memory
- If the pointers are equal after a write: FULL
- If the pointers are equal after a read: EMPTY
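The pointer scheme above can be modeled in a few lines of Python (a sketch; the `Fifo` class and its method names are illustrative, not from the lecture). The one subtlety is that equal pointers can mean either FULL or EMPTY, so a flag records which operation made them equal:

```python
class Fifo:
    """Software model of a pointer-based FIFO over a dual-port memory."""

    def __init__(self, depth):
        self.mem = [None] * depth   # stands in for the dual-port RAM
        self.depth = depth
        self.wr = 0                 # next write position
        self.rd = 0                 # next read position
        self.full = False

    def empty(self):
        return self.wr == self.rd and not self.full

    def write(self, data):          # producer side
        if self.full:
            raise OverflowError("FIFO full")
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % self.depth
        self.full = (self.wr == self.rd)   # pointers equal after a write -> FULL

    def read(self):                 # consumer side
        if self.empty():
            raise IndexError("FIFO empty")
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.full = False           # a read always leaves room
        return data
```

In real hardware the producer and consumer sides run on independent clocks, which is exactly what the dedicated Virtex-5 FIFO circuits described next take care of.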

Xilinx Virtex-5 FIFOs
- Virtex-5 BlockRAMs include dedicated circuits for FIFOs; details in the User Guide (ug190)
- Takes advantage of the separate dual ports and independent port clocks

Processor Design Considerations (1/2)
- Register file: consider distributed RAM (LUT RAM)
  - Size is close to what is needed: distributed RAM primitive configurations are 32 or 64 bits deep; extra width is easily achieved by parallel arrangements
  - LUT-RAM configurations offer multi-porting options, useful for register files
  - Asynchronous read might be useful, since it provides flexibility on where to put the register read in the pipeline
- Instruction / data caches: consider Block RAM
  - Higher density, lower cost for a large number of bits: a single 36 kbit Block RAM implements 1K 32-bit words (see the sanity check below)
  - Configuration-stream-based initialization permits a simple bootstrap procedure
- Other memories in the project? Video?
- Main memory will be in external DRAM
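A quick sanity check on the 36 kbit figure, assuming the common 1K x 36 BlockRAM organization in which each 32-bit word carries 4 parity bits (the parity-bit interpretation is my assumption, not stated on the slide):

```python
words = 1024            # "1K 32-bit words" per Block RAM
data_bits = 32
parity_bits = 4         # assumed 1K x 36 organization: 32 data + 4 parity bits per entry
total_bits = words * (data_bits + parity_bits)
print(total_bits)               # 36864
print(total_bits / 1024)        # 36.0 -> a 36 kbit Block RAM
```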

XUP Board External DRAM
- 256 MByte DDR2 DRAM with a 400 MHz data rate
- More generally, how does software interface to I/O devices?
- SO-DIMM stands for small-outline dual in-line memory module; SO-DIMMs are often used in systems with space restrictions, such as notebooks
- DDR2 stands for second-generation double data rate; DDR transfers data on both the rising and falling edges of the clock signal

Why Different Types of Memory?

Example: Direct-Mapped Cache
- Reminder: the entire address space is larger than the size of the cache (by definition)
- Need a way to map memory addresses into cache locations
- And to keep track of which memory address is actually stored there: cache tags
- M-byte cache with N-byte blocks

Ex.: 1KB Direct-Mapped Cache with 32B Blocks
- For an M-byte cache:
  - The uppermost (32 - log2(M)) bits are always the Cache Tag
  - The lowest log2(N) bits are the Byte Select
- Address breakdown: bits 31-10 are the Cache Tag (ex: 0x50), bits 9-5 the Cache Index (ex: 0x01), bits 4-0 the Byte Select (ex: 0x00)
- The Cache Tag is stored as part of the cache state

[Figure: cache array with a Valid Bit and Cache Tag per entry; entry 0 holds Byte 31 ... Byte 1, Byte 0; entry 1 holds Byte 63 ... Byte 33, Byte 32 (tag 0x50); ...; entry 31 holds Byte 1023 ... Byte 992]
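A small Python helper (illustrative, not from the slides) that performs this split for the 1KB cache with 32B blocks, reproducing the 0x50 / 0x01 / 0x00 example above:

```python
CACHE_BYTES = 1024      # M = 1 KB cache
BLOCK_BYTES = 32        # N = 32 B blocks

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1                   # log2(N) = 5
INDEX_BITS = (CACHE_BYTES // BLOCK_BYTES).bit_length() - 1   # log2(M/N) = 5

def split(addr):
    byte_select = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_select

# An address whose tag is 0x50, index 0x01, byte select 0x00:
print([hex(f) for f in split(0x50 << 10 | 0x01 << 5)])   # ['0x50', '0x1', '0x0']
```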

Block Size Tradeoff
- In general, larger block sizes take advantage of spatial locality, BUT:
  - A larger block size means a larger miss penalty: it takes longer to fill up the block
  - If the block size is too big relative to the cache size, the miss rate goes up: too few cache blocks
- In general,
  Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
  (see the worked example after this slide pair)

[Figure: three plots vs. block size - Miss Penalty rises with block size; Miss Rate first falls (exploits spatial locality), then rises (too few blocks compromises temporal locality); Average Access Time is U-shaped due to the increased miss penalty and miss rate]

Extreme Example: Single-Line Cache
- Cache size = 4 bytes, block size = 4 bytes: only ONE entry in the cache (Valid Bit, Tag, Byte 3 ... Byte 0)
- If an item is accessed, it is likely to be accessed again soon
  - But it is unlikely to be accessed again immediately!
  - The next access will likely be a miss: we continually load data into the cache but discard it before it is used again
  - A cache designer's worst nightmare: the ping-pong effect
- Conflict misses are misses caused by different memory locations mapping to the same cache index
  - Solution 1: make the cache size bigger
  - Solution 2: multiple entries for the same cache index
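Plugging hypothetical numbers into the slide's access-time formula shows how a larger block can lower the miss rate yet still hurt overall; all cycle counts below are made up for illustration:

```python
def avg_access_time(hit_time, miss_rate, miss_penalty):
    # Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

# 1-cycle hit. Doubling the block size cuts the miss rate (spatial locality)
# but raises the miss penalty (longer refill) -- here the net effect is worse:
print(avg_access_time(1, 0.05, 50))   # 3.45 cycles
print(avg_access_time(1, 0.04, 80))   # 4.16 cycles
```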

Set-Associative Cache
- N-way set associative: N entries for each Cache Index
  - N direct-mapped caches operating in parallel
- Example: two-way set-associative cache
  - The Cache Index selects a set from the cache
  - The two tags in the set are compared to the input address tag in parallel
  - Data is selected based on the tag comparison result

[Figure: two-way set-associative lookup - the Cache Index reads a Valid bit, Cache Tag, and Cache Block from each way; two comparators match the Adr Tag against both ways, their outputs are ORed into Hit and drive the Sel1/Sel0 mux that picks the Cache Block]

Extreme Example #2: Fully Associative Cache
- Forget about the Cache Index: compare the Cache Tags of all cache entries in parallel
- Example: with 32 B blocks and an M-byte cache, we need M/32 27-bit comparators
- Usually implemented with a Content-Addressable Memory (CAM)
- By definition: Conflict Misses = 0 for a fully associative cache

[Figure: fully associative lookup - bits 31-5 form the 27-bit Cache Tag, bits 4-0 the Byte Select (ex: 0x01); every entry's Valid bit and tag feed a dedicated comparator]
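A behavioral Python sketch of the tag lookup (illustrative names; a sequential loop stands in for the parallel comparators). With index_bits = 0 the same code degenerates to the fully associative case, where every entry lives in one big set:

```python
def lookup(cache, addr, offset_bits, index_bits):
    """cache[index] is a list of (valid, tag, block) tuples, one per way."""
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    adr_tag = addr >> (offset_bits + index_bits)
    for valid, tag, block in cache[index]:   # hardware checks all ways at once
        if valid and tag == adr_tag:
            return True, block               # hit: the mux selects this way's data
    return False, None                       # miss in every way

# Two sets, two ways each; address 96 = 0b1_1_00000 has tag 1, index 1:
cache = [[(0, 0, None), (0, 0, None)],
         [(1, 1, "block A"), (1, 7, "block B")]]
print(lookup(cache, 96, offset_bits=5, index_bits=1))   # (True, 'block A')
```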

Disadvantage of Set-Associative Caches
- N-way set-associative cache versus direct-mapped cache:
  - N comparators vs. 1
  - Extra MUX delay for the data
  - Data comes AFTER the Hit/Miss decision and set selection
- In a direct-mapped cache, the Cache Block is available BEFORE Hit/Miss
  - Possible to assume a hit and continue; recover later if it was a miss

[Figure: the same two-way set-associative datapath as above, highlighting that the data mux waits on the tag comparison before Hit is known]

Summary: Sources of Cache Misses
- Compulsory (cold start or process migration; first reference): first access to a block
  - A cold fact of life: not a whole lot you can do about it
  - Note: if you are going to run billions of instructions, compulsory misses are insignificant
- Conflict (collision): multiple memory locations mapped to the same cache location
  - Solution 1: increase the cache size
  - Solution 2: increase associativity
- Capacity: the cache cannot contain all the blocks accessed by the program
  - Solution: increase the cache size
- Invalidation: another process (e.g., I/O) updates memory

Cache Misses

Which Block to Replace on a Miss?
- Easy for direct mapped: there is no choice
- Set associative or fully associative: Random, or LRU (Least Recently Used)

Miss rates, LRU vs. random replacement:

  Cache size    2-way            4-way            8-way
                LRU     Rand.    LRU     Rand.    LRU     Rand.
  16 KB         5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
  64 KB         1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
  256 KB        1.15%   1.17%    1.13%   1.13%    1.12%   1.12%
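One way to model LRU replacement for a single set (a Python sketch using OrderedDict; real hardware tracks recency with a few status bits per set rather than a linked structure):

```python
from collections import OrderedDict

class LruSet:
    """One cache set with `ways` entries, evicting the least recently used tag."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()           # tag -> block, oldest access first

    def access(self, tag):
        if tag in self.blocks:                # hit: promote to most recently used
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:     # miss with a full set: evict the LRU
            self.blocks.popitem(last=False)
        self.blocks[tag] = "block data"       # fill with the fetched block
        return False

s = LruSet(ways=2)
print([s.access(t) for t in [0x50, 0x60, 0x50, 0x70, 0x60]])
# [False, False, True, False, False]: 0x70 evicts 0x60, so the final 0x60 misses
```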

What Happens on a Write?
- Write through: the information is written both to the block in the cache and to the block in the lower-level memory
- Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced
  - Is the block clean or dirty?
- Pros and cons of each? (contrasted in the sketch after this slide pair)
  - WT: read misses cannot result in writes
  - WB: no repeated writes to the same location
- WT is always combined with write buffers so the processor doesn't wait for the lower-level memory

Write Buffer for Write Through
- Datapath: Processor -> Cache -> Write Buffer -> DRAM
- A write buffer is needed between the cache and memory
  - The processor writes data into the cache and the write buffer
  - The memory controller writes the contents of the buffer to memory
- The write buffer is just a FIFO; typical number of entries: 4
- Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
- The memory system designer's nightmare: store frequency (w.r.t. time) > 1 / DRAM write cycle
  - Write buffer saturation
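A sketch contrasting the two policies on a store (Python; the dicts and function names are illustrative stand-ins for the real arrays and control logic):

```python
def store_write_through(cache, write_buffer, addr, data):
    cache[addr] = data                   # update the cache...
    write_buffer.append((addr, data))    # ...and queue the write to DRAM (a FIFO)

def store_write_back(cache, dirty, addr, data):
    cache[addr] = data                   # update only the cache
    dirty[addr] = True                   # block no longer matches memory

def evict_write_back(cache, dirty, memory, addr):
    if dirty.get(addr):                  # dirty block: write it back once, on eviction
        memory[addr] = cache[addr]
        dirty[addr] = False
    del cache[addr]

cache, dirty, memory = {}, {}, {}
store_write_back(cache, dirty, 0x40, "v1")
store_write_back(cache, dirty, 0x40, "v2")    # repeated write, still no DRAM traffic
evict_write_back(cache, dirty, memory, 0x40)  # one DRAM write covers both stores
```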

Write Buffer Saturation
- Store frequency (w.r.t. time) > 1 / DRAM write cycle
- If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row), the store buffer will overflow no matter how big you make it, because CPU cycle time <= DRAM write cycle time
- Solutions for write buffer saturation:
  - Use a write-back cache
  - Install a second-level (L2) cache with write-back: Processor -> Cache -> Write Buffer -> L2 Cache -> DRAM

Write-Miss Policy: Write Allocate vs. Not Allocate
- Assume a 16-bit write to memory location 0x0 causes a miss
- Do we read in the block?
  - Yes: Write Allocate
  - No: Write Not Allocate

[Figure: the same 1KB direct-mapped cache as before - bits 31-10 Cache Tag (ex: 0x00), bits 9-5 Cache Index (ex: 0x00), bits 4-0 Byte Select (ex: 0x00); entry 1 holds tag 0x50]
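The difference between the two write-miss policies is simply whether the missing block is fetched into the cache first. A Python sketch with illustrative names:

```python
BLOCK = 32                                   # bytes per cache block

def write_miss(cache, memory, addr, data, allocate):
    """Handle a store that misses. cache maps block base address -> bytearray."""
    base, offset = addr - addr % BLOCK, addr % BLOCK
    if allocate:
        # Write allocate: read the whole block in, then update it in the cache.
        cache[base] = memory[base:base + BLOCK]
        cache[base][offset] = data
    else:
        # Write not allocate: send the write straight to memory, cache untouched.
        memory[addr] = data

memory = bytearray(1024)
cache = {}
write_miss(cache, memory, 0x41, 0xAB, allocate=True)   # block at 0x40 now cached
write_miss(cache, memory, 0x81, 0xCD, allocate=False)  # bypasses the cache
```

Write not allocate pairs naturally with a write-through cache plus a write buffer, while write allocate pairs naturally with write back, since the freshly fetched block can absorb subsequent stores.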