Computer Systems Architecture I. CSE 560M Lecture 15 Prof. Patrick Crowley

Plan for Today
- Announcements
  - PM2 due today
  - Design presentations due Nov 4
- Questions
- Today's discussion: Memory Hierarchy Basics

Design Presentations, Nov 4
- Your presentation should clearly describe your processor and the design choices you made in PM1 and PM2.
- Presentation logistics:
  - Limit your presentation to 10 minutes, and expect 5 minutes of questions.
  - Be sure that all group members contribute to the presentation.
  - Send a copy of your PPT presentation to Shakir before class.
  - You are free to use your own laptop for the presentation. If you prefer to use Prof. Crowley's laptop, you must submit your slides via email by 3pm or bring a copy on a USB memory stick.
  - Also, before class, upload a 10-minute screencast version (see next slide).

Project Design Screencast
For at least two reasons, you will create 10-minute screencasts of your design presentations:
1. Use SnagIt (or another tool). My demo:
2. Upload the video to youtube.com.
3. Post the link on the course discussion page.

Project Logistics
- Groups of three
- Three major deadlines: design due Oct 19, project demo Nov 23, final report due Dec 7
- This week: PM2 due, design presentations

Project Timeline
Task                  Due     Description
Project Milestone 1   Oct 19  Processor design
Project Milestone 2   Nov 2   Detailed design & VHDL
Design Presentations  Nov 4   Presentation of detailed design
Project Demo          Nov 23  Demo working VHDL model
Final Report          Dec 7   Final report due

Project Groups
- Group 1: David Lu, Michael Schultz, Austin Abrams
- Group 2: Yong Fu, Greg Galloway, Christopher Thomas
- Group 3: Timothy York, Haowei Yuan, Stephen Schuh
- Group 4: Yu-Ying Liang, Abu Sayeed Saifullah, Raphael Njuguna
- Group 5: Cory Flanagin, Jessica Schupp

Importance of Memory Performance

Memory Hierarchy
- Registers: visible to the ISA and renamed by hardware
- (Hierarchy of) caches, plus their enhancements: write buffers, victim caches, prefetch/stream buffers, etc.
- TLBs and their management
- Virtual memory system (OS level) and hardware assists
- Main memory
- Disks
- Remote memory
The hierarchy is based on the principle of locality.

Illustrated Hierarchy

Questions that arise at each level
- What is the unit of information transferred from level to level? Word, block, page table entry, page.
- Where is that unit of information placed? Directed by the ISA, restricted mapping, or general mapping.
- How do we find out whether that unit of information is present? It depends on the mapping.
- What happens if there is no room for it? Structural hazard; replacement algorithm.
- What happens when we change the contents of the unit of information? That is, what happens on a write?

Caches (on-chip, off-chip)
Caches consist of a set of entries, where each entry has:
- A block (or line) of data: some subset of memory content
- A tag: allows us to recognize whether the desired block is present
- Status bits: valid, dirty, status for multiprocessors, etc.
Capacity (or size) of a cache = number of blocks * block size

Cache Organization
Most restricted mapping: direct-mapped cache
- A given memory location (block) can be mapped to only a single place in the cache.
- Generally this place is given by: (block address) mod (number of blocks in the cache)
- The number of blocks is usually a power of two.
Most general mapping: fully associative cache
- A given memory location (block) can be mapped anywhere in the cache.
- No cache of decent size is implemented this way, but this is the (general) mapping for pages (disk to main memory) and for small TLBs.
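A minimal sketch of the direct-mapped placement rule above. The block size (64 bytes), the number of blocks (256), and the helper name `direct_mapped_slot` are illustrative assumptions, not values from the lecture. It also shows why conflicts arise: two addresses exactly one cache size apart land in the same slot.

```python
# Direct-mapped placement: a block can live in exactly one cache slot,
# chosen as (block address) mod (number of blocks in the cache).
# BLOCK_SIZE and NUM_BLOCKS are assumed values for illustration.

BLOCK_SIZE = 64    # bytes per block (assumed)
NUM_BLOCKS = 256   # blocks in the cache (assumed; a power of two)


def direct_mapped_slot(byte_addr: int) -> int:
    """Return the only slot that may hold the block containing byte_addr (hypothetical helper)."""
    block_addr = byte_addr // BLOCK_SIZE
    return block_addr % NUM_BLOCKS


# Two addresses NUM_BLOCKS * BLOCK_SIZE bytes apart map to the same slot,
# so alternating between them evicts the other block every time.
a = 0x1000
b = a + NUM_BLOCKS * BLOCK_SIZE
print(direct_mapped_slot(a), direct_mapped_slot(b))  # 64 64
```

Because the number of blocks is a power of two, the `mod` reduces to keeping the low bits of the block address (`block_addr & (NUM_BLOCKS - 1)`), which is why hardware prefers that choice.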

Cache Organization (cont'd)
Less restricted mapping: set-associative cache
- Blocks in the cache are grouped into sets, and a given memory location (block) maps into one set. Within the set, the block can be placed anywhere.
- Sets with 2 (2-way set-associative), 4, 8, and 16 blocks have been implemented.
- The set is usually chosen as (block address) mod (number of sets in the cache).
- Direct-mapped = 1-way set-associative.
- Fully associative with m entries = m-way set-associative (a single set).
Capacity = number of sets * set-associativity * block size
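The capacity formula and the set-selection rule can be checked with a short sketch. The cache geometry (512 blocks of 64 bytes, i.e. 32 KB) and the helper names `capacity` and `set_index` are assumptions chosen for illustration. Holding the number of blocks fixed while varying the associativity shows that direct-mapped (1-way) and fully associative (one set) are the two endpoints of the same scheme.

```python
# Set-associative placement: a block maps to one set, chosen as
# (block address) mod (number of sets), and may occupy any way in that set.
# Capacity = number of sets * associativity * block size.


def capacity(num_sets: int, ways: int, block_size: int) -> int:
    """Total data capacity in bytes (hypothetical helper)."""
    return num_sets * ways * block_size


def set_index(byte_addr: int, num_sets: int, block_size: int) -> int:
    """Set that the block containing byte_addr maps into (hypothetical helper)."""
    return (byte_addr // block_size) % num_sets


# Assumed geometry: 512 blocks of 64 bytes (32 KB), associativity varied.
BLOCK_SIZE = 64
NUM_BLOCKS = 512
for ways in (1, 2, 8, NUM_BLOCKS):  # 1-way = direct-mapped, 512-way = fully associative
    sets = NUM_BLOCKS // ways
    print(f"{ways:>3}-way: {sets:>3} sets, capacity {capacity(sets, ways, BLOCK_SIZE)} bytes")
```

Every row reports the same 32768-byte capacity; only the split between sets and ways changes.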

Organization Illustrated

Cache hit or miss?
How do we detect whether a memory address (a byte address) has a valid image in the cache? The address is decomposed into 3 fields:
- Block offset (depends on the block size)
- Index (depends on the number of sets)
- Tag (the remainder of the address)
The index selects the set, and the tag is compared against the tags stored in that set to check for a match. The status bits must also be checked for validity.
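A minimal sketch of the three-field decomposition, assuming the block size and the number of sets are both powers of two (so each field is a contiguous run of bits). The helper names `log2_exact` and `decompose` are hypothetical.

```python
# Split a byte address into (tag, index, offset) for a cache whose
# block size and number of sets are both powers of two (assumed).


def log2_exact(n: int) -> int:
    """Return log2(n), requiring n to be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def decompose(addr: int, block_size: int, num_sets: int) -> tuple[int, int, int]:
    offset_bits = log2_exact(block_size)
    index_bits = log2_exact(num_sets)
    offset = addr & (block_size - 1)                 # low bits: byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)   # middle bits: which set
    tag = addr >> (offset_bits + index_bits)         # remaining high bits
    return tag, index, offset


# 64-byte blocks (6 offset bits) and 2048 sets (11 index bits), both assumed.
print(decompose(0x12345678, block_size=64, num_sets=2048))  # (2330, 345, 56)
```

Only the tag needs to be stored with each block: the index is implied by which set the block sits in, and the offset only selects bytes within the block.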

Detecting a Cache Hit: 32-bit addresses, 2 blocks per set, 2K sets
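The hit check for the configuration above (32-bit addresses, 2 ways, 2K sets) can be modeled in a few lines. The block size is not stated in the text, so 64 bytes is an assumption here; with it, the address splits into 6 offset bits, 11 index bits, and 15 tag bits. The names `Entry`, `lookup`, and `fill` are hypothetical, and `fill` installs a block into a caller-chosen way with no replacement policy.

```python
# Hit detection for a 2-way set-associative cache with 2K sets and 32-bit addresses.
# BLOCK_SIZE = 64 is an assumption; the slide's figure is not reproduced here.
from dataclasses import dataclass

ADDR_BITS = 32
NUM_SETS = 2048                                   # 2K sets -> 11 index bits
WAYS = 2                                          # 2 blocks per set
BLOCK_SIZE = 64                                   # assumed -> 6 offset bits
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # 15 with these parameters


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0


# cache[set][way]; every entry starts out invalid
cache = [[Entry() for _ in range(WAYS)] for _ in range(NUM_SETS)]


def lookup(addr: int) -> bool:
    """Hit if some way in the indexed set is valid and holds a matching tag."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    # Hardware compares both ways' tags in parallel; here we just check each way.
    return any(e.valid and e.tag == tag for e in cache[index])


def fill(addr: int, way: int) -> None:
    """Install the block containing addr into the given way of its set."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    cache[index][way] = Entry(valid=True, tag=addr >> (OFFSET_BITS + INDEX_BITS))


a = 0x12345678
b = a + NUM_SETS * BLOCK_SIZE   # same set, different tag
fill(a, way=0)
print(lookup(a), lookup(b))     # True False
fill(b, way=1)
print(lookup(a), lookup(b))     # True True
```

Note that `lookup(0)` misses even though the reset tag is 0: the valid bit is what prevents stale or uninitialized entries from producing false hits.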

Why Set-Associative Caches?
Pros
- Better hit ratio: a great improvement from 1-way to 2-way, less from 2 to 4, minimal after that (according to conventional wisdom)
Cons
- The higher the associativity, the more tag comparisons must be made in parallel for high performance; this can affect the cycle time of on-chip caches
- Higher associativity requires a wider tag array
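Both costs listed above can be quantified with a small calculation. Assuming a 64 KB cache with 64-byte blocks and 32-bit addresses (illustrative values, not from the lecture), each doubling of associativity halves the number of sets, moving one bit from the index into the tag, while the number of parallel tag comparisons equals the number of ways. The helper name `tag_bits` is hypothetical.

```python
# For a fixed capacity and block size, doubling the associativity halves the
# number of sets, so one address bit moves from the index into the tag.
# All parameters below are assumptions for illustration.

ADDR_BITS = 32
CAPACITY = 64 * 1024    # 64 KB (assumed)
BLOCK_SIZE = 64         # bytes (assumed)


def tag_bits(ways: int) -> int:
    """Tag width per entry for the assumed cache at the given associativity."""
    num_sets = CAPACITY // (BLOCK_SIZE * ways)
    offset_bits = BLOCK_SIZE.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return ADDR_BITS - index_bits - offset_bits


for ways in (1, 2, 4, 8, 16):
    print(f"{ways:>2}-way: {tag_bits(ways)} tag bits per entry, "
          f"{ways} tag comparisons in parallel")
```

With these numbers the tag grows from 16 bits (direct-mapped) to 20 bits (16-way), and every one of the cache's 1024 entries pays that extra width.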

Replacement Algorithm
- None for direct-mapped caches
- Random, LRU, or pseudo-LRU for set-associative caches
- Not a very important factor for performance
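As an illustration of LRU (one of the policies named above), here is a sketch of a single cache set that tracks only tags. It uses Python's `collections.OrderedDict` as the recency list; real hardware typically approximates this with a few status bits per set (pseudo-LRU), which is not modeled here. The class name `LRUSet` is hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a set-associative cache with true LRU replacement (tags only)."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.tags = OrderedDict()  # tag -> None; least recently used first

    def access(self, tag: int) -> bool:
        """Access a block by tag and return True on a hit.

        On a miss, evict the least recently used block if the set is full.
        """
        if tag in self.tags:
            self.tags.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)   # evict the least recently used tag
        self.tags[tag] = None
        return False


s = LRUSet(ways=2)
print([s.access(t) for t in (1, 2, 1, 3, 2)])  # [False, False, True, False, False]
```

The access to tag 3 evicts tag 2 (not tag 1) because tag 1 was touched more recently; the final access to 2 then misses.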

Write Policies
Write-through
- On a write hit, write both in the cache and in memory
- Pro: consistent view of memory (better for I/O and coherence)
- Con: more memory traffic
Write-back
- On a write hit, write only in the cache (requires a dirty bit)
- Update memory only when an evicted block is dirty
- Pro/con: the reverse of write-through
Write miss
- Write allocate: usually used with write-back
- No-write allocate (non-allocate, write-around): usually used with write-through
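To make the traffic trade-off concrete, this sketch counts writes sent to memory under the two usual pairings above: write-through with no-write-allocate, and write-back with write-allocate. The tiny direct-mapped geometry, the traces, and the function name `simulate` are assumptions for illustration; reads only matter here because they allocate blocks, and remaining dirty blocks are flushed at the end so the write-back count is complete.

```python
# Count memory writes for a trace of (op, addr) pairs, op in {"R", "W"},
# on a tiny direct-mapped cache (geometry assumed for illustration).

BLOCK_SIZE = 16
NUM_BLOCKS = 4


def simulate(trace: list[tuple[str, int]], write_back: bool) -> int:
    """write_back=True: write-back + write-allocate;
    write_back=False: write-through + no-write-allocate."""
    tags = [None] * NUM_BLOCKS      # tag held in each slot (None = invalid)
    dirty = [False] * NUM_BLOCKS
    mem_writes = 0
    for op, addr in trace:
        block = addr // BLOCK_SIZE
        slot, tag = block % NUM_BLOCKS, block // NUM_BLOCKS
        hit = tags[slot] == tag
        if op == "R" or write_back:
            # Allocate on read misses (both policies) and write misses (write-allocate).
            if not hit:
                if dirty[slot]:
                    mem_writes += 1           # write back the dirty victim block
                tags[slot], dirty[slot] = tag, False
            if op == "W":
                dirty[slot] = True            # write-back: update only the cache
        else:
            # Write-through + no-write-allocate: every store goes to memory;
            # a write miss bypasses the cache instead of allocating.
            mem_writes += 1
    if write_back:
        mem_writes += sum(dirty)              # flush dirty blocks left at the end
    return mem_writes


hot = [("W", 0x40)] * 10                      # ten stores to the same word
print(simulate(hot, write_back=False), simulate(hot, write_back=True))  # 10 1
```

Write-back wins when stores repeatedly hit the same block; with stores that keep evicting each other's dirty blocks (for example, alternating 0x00 and 0x40 here), both policies generate about the same number of memory writes.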

Classifying misses: The 3 C's
- Compulsory (cold start): the first time you touch a block. Reduced (for a given cache capacity and associativity) by having larger blocks.
- Capacity: the working set is too big for the ideal cache of the same capacity and block size (i.e., fully associative with an optimal replacement algorithm). Only remedy: a larger cache.
- Conflict (interference): two or more hot blocks map to the same location. Increasing associativity decreases this type of miss.
- There is a fourth C: coherence misses (for multiprocessors).
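The 3 C's can be measured by simulating the real cache alongside a fully associative reference cache with the same number of blocks. A miss on a never-seen block is compulsory; a miss that the reference cache also suffers is a capacity miss; a miss that only the real cache suffers is a conflict miss. One labeled substitution: the definition above calls for optimal replacement in the ideal cache, but this sketch uses LRU for both caches, a common approximation. The function names `touch` and `classify_misses` are hypothetical, and the trace holds block addresses rather than byte addresses.

```python
from collections import OrderedDict


def touch(lru: OrderedDict, block: int, limit: int) -> bool:
    """Access block in an LRU-ordered dict holding at most `limit` blocks; True on hit."""
    if block in lru:
        lru.move_to_end(block)
        return True
    if len(lru) == limit:
        lru.popitem(last=False)     # evict least recently used
    lru[block] = None
    return False


def classify_misses(block_trace, num_blocks: int, ways: int) -> dict:
    """Classify misses of a set-associative LRU cache into the 3 C's.

    Capacity misses are judged against a fully associative LRU cache with the
    same number of blocks (an approximation of the optimal-replacement ideal).
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # the real cache
    full = OrderedDict()                             # fully associative reference
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_trace:
        hit = touch(sets[block % num_sets], block, ways)
        full_hit = touch(full, block, num_blocks)
        if not hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts


# Blocks 0 and 4 collide in a 4-block direct-mapped cache.
print(classify_misses([0, 4, 0, 4], num_blocks=4, ways=1))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

Raising `ways` to 2 on the same trace removes the two conflict misses, matching the slide's claim that associativity attacks this category; cycling through 5 blocks in a 4-block cache instead produces capacity misses at any associativity.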

Example Cache Hierarchies
                AMD Athlon     Intel P3          Intel P4              IBM PPC 405CR
Application     Desktop        Desktop, server   Desktop               Embedded
ICache (L1)     64KB, 2-way    16KB, 2-way       12K uop trace cache   16KB, 2-way
DCache (L1)     64KB, 2-way    16KB, 2-way       8KB, 4-way            8KB, 2-way
L2 (on-chip)    256KB, 16-way  KB, 8-way         256KB, 8-way          None

Assignment
Readings
- For Wednesday: none
- For Monday: H&P, read C.3-C.4, 5.2
- For Wednesday (Nov 11): H&P, sections

Performance metrics for caches

Basic performance metric: hit ratio h
h = (number of memory references that hit in the cache) / (total number of memory references)
Typically h = 0.90 to 0.97
Equivalent metric: