CPSC 330 Computer Organization


Timothy Spencer
6 years ago

CPSC 330 Computer Organization
Lecture 7c: Memory
Adapted from CS52, CS 6C and notes by Kevin Peterson and Morgan Kaufmann Publishers, Copyright 24.
CNU Fall 26 CPSC33 CompOrg: Dr. Gerousis Lec7c Memory

Improving cache performance

Two ways of improving performance:
- Decrease the miss ratio: more flexible placement of blocks (associativity, discussed in Lec7c).
- Decrease the miss penalty: multi-level caching. Multi-level caching was once found only in expensive high-end computers; today it is common in desktop computers.
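The two levers above (miss ratio and miss penalty) can be related through the standard average memory access time formula, hit time + miss rate x miss penalty. The following is a minimal sketch, not taken from the slides; the function names and every cycle count and miss rate in it are hypothetical values chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time (in cycles) for one cache level."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """AMAT with an L2 cache: an L1 miss costs one L2 access on average."""
    l1_miss_penalty = amat(l2_hit, l2_local_miss_rate, mem_penalty)
    return amat(l1_hit, l1_miss_rate, l1_miss_penalty)


# Hypothetical numbers: 1-cycle L1 hit, 5% L1 miss rate,
# 10-cycle L2 hit, 20% L2 local miss rate, 100-cycle main memory access.
single_level = amat(1, 0.05, 100)
two_level = amat_two_level(1, 0.05, 10, 0.2, 100)
print(single_level, two_level)
```

With these assumed numbers the second-level cache lowers the average access time because most L1 misses are served by L2 instead of main memory.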

Reducing cache misses by more flexible placement of blocks

- Direct-mapped cache: there is a direct mapping from any block address in memory to a single location in the upper level of the hierarchy.
- Fully associative cache: a block in memory may be associated with any entry in the cache.
- Set-associative cache: there is a fixed number of locations (at least two) where each block can be placed. This combines direct-mapped and fully associative placement.

Decreasing miss ratio with associativity

Increasing the associativity increases the number of blocks per set. Note that the cache size in blocks = number of sets x associativity.

[Figure: an eight-block cache organized as one-way set associative (direct mapped), two-way set associative, four-way set associative, and eight-way set associative (fully associative); each entry holds a tag and data.]
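The relation cache size in blocks = number of sets x associativity can be sketched as follows. This is an illustrative example only; the function name `candidate_slots` and the choice of the set index as block address modulo number of sets are assumptions, not something stated on the slides.

```python
def candidate_slots(block_address, num_blocks, associativity):
    """Return the cache slots where a memory block may be placed.

    Assumes the set is chosen as block_address modulo the number of sets,
    and that the slots of a set are stored next to each other.
    """
    num_sets = num_blocks // associativity
    set_index = block_address % num_sets
    first_slot = set_index * associativity
    return list(range(first_slot, first_slot + associativity))


# An eight-block cache, memory block 13, at each associativity.
for ways in (1, 2, 4, 8):
    print(ways, candidate_slots(13, 8, ways))
```

As associativity grows, the same block has more candidate slots, which is what gives the cache more freedom to avoid conflict misses.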

Cache Alternatives: 4-way set associative cache

[Figure: 4-way set associative cache. The address supplies a 22-bit tag and an 8-bit index; each of the four ways holds a valid bit (V), a tag, and data; four comparators feed a 4-to-1 multiplexor that produces the Hit and Data outputs.]

- What is the size of this associative cache?
- What do the comparators determine?
- What is the output of the comparators used for?

4-Way Set Associative Cache (continued)

- With 4 blocks per set, four simultaneous compares are needed to search the set in parallel.
- Although larger sets increase the probability of a hit, they do so at the expense of more hardware and, consequently, longer access time.
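The lookup in the 4-way cache above can be sketched in software. This is a rough model under stated assumptions: a 32-bit address split into a 22-bit tag, an 8-bit index, and a 2-bit byte offset (one-word blocks), which is one reading of the figure labels. The class and function names are hypothetical, and the loop stands in for comparators that work in parallel in hardware.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 22, 8, 2  # assumed field widths
WAYS = 4


def split_address(addr):
    """Split a 32-bit address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class FourWayCache:
    def __init__(self):
        # One list per set; each entry is (valid, tag, data).
        self.sets = [[(False, 0, None)] * WAYS for _ in range(1 << INDEX_BITS)]

    def fill(self, addr, data, way):
        tag, index, _ = split_address(addr)
        self.sets[index][way] = (True, tag, data)

    def lookup(self, addr):
        """Compare the tag against every way of the indexed set."""
        tag, index, _ = split_address(addr)
        for valid, stored_tag, data in self.sets[index]:
            if valid and stored_tag == tag:  # one comparator per way
                return True, data            # the matching way drives the mux
        return False, None


def data_capacity_bytes():
    """Data storage only: sets x ways x bytes per block."""
    return (1 << INDEX_BITS) * WAYS * (1 << OFFSET_BITS)
```

Under these assumptions the data capacity is 256 sets x 4 ways x 4 bytes; tag and valid bits add to the total storage.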

Cache Alternatives: multilevel caches

- It is common to use multi-level caches, which increase in size but decrease in speed as distance from the processor increases.
- Most PCs have L1 (on-chip) and L2 (off-chip) caches.
- The L1 cache is fixed for a given processor, but the L2 cache may be expandable.
- The key point is that cache memories are faster than main memory.

[Figure: AMD processor die (93 mm^2); L2: 42% of die.]

Virtual Memory

A technique that uses main memory as a cache for secondary storage.

Motivation:
- Allow efficient and safe sharing of memory among multiple programs.
- Translate each program's addresses to physical addresses, which enforces protection of its address space.

Virtual Memory

- Programs live in their own large virtual space on disk.
- As programs are activated, they are loaded into memory in blocks called pages.
- Virtual address: an address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.
- Address translation (mapping): the process by which a virtual address is mapped to an address used to access memory.

[Figure: virtual addresses pass through address translation to either physical addresses or disk addresses.]

Virtual Memory terminology

- Virtual memory is organized into blocks called pages, ranging in size from 4 KB to 64 KB.
- A virtual address consists of a virtual page number (high-order address bits) and a page offset (low-order address bits).
- Physical memory is organized into block holders called page frames, which are the same size as pages.
- A physical address consists of a physical page number (high-order address bits) and a page offset (low-order address bits).
- Address translation is done using a page table that maps virtual pages to physical page frames.
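The split of a virtual address into a virtual page number and a page offset can be sketched as below. This is a minimal illustration; the function name is hypothetical, and it assumes the page size is a power of two, as in the 4 KB to 64 KB range mentioned above.

```python
def split_virtual_address(vaddr, page_size=4096):
    """Return (virtual page number, page offset) for a power-of-two page size."""
    offset_bits = page_size.bit_length() - 1  # log2 of the page size
    vpn = vaddr >> offset_bits                # high-order address bits
    offset = vaddr & (page_size - 1)          # low-order address bits
    return vpn, offset


print(split_virtual_address(0x00403ABC))             # 4 KB pages
print(split_virtual_address(0x00403ABC, 64 * 1024))  # 64 KB pages
```

A larger page size moves bits from the page number into the offset, so the same address falls in a lower-numbered page with a larger offset.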

Virtual Memory address translation

- To locate a virtual page in memory, its virtual page number must be translated to a physical page number using page tables.
- Page tables provide the virtual-to-physical mapping.

[Figure: the virtual address is split into a virtual page number and a page offset; translation replaces the virtual page number with a physical page number, and the page offset is carried into the physical address.]

[Figure: the page table register points to the page table; the virtual page number indexes the table, and each entry holds a valid bit and a physical page number. If the valid bit is 0, the page is not present in memory.]

- What is the number of entries in the page table?
- What is the size in bytes of the virtual address space? 2^32 = 4 GB
- What is the size in bytes of the physical address space? 2^30 = 1 GB
- What is the size of the allowable main memory? 2^30 = 1 GB
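The translation through a page table can be sketched as follows. This is only a sketch under assumptions not confirmed by the slide text: 32-bit virtual addresses, 4 KB pages (a 12-bit page offset), and a page table modeled as a Python dict from virtual page number to an entry with `valid` and `ppn` fields. All names here are hypothetical.

```python
VIRTUAL_ADDRESS_BITS = 32   # gives the 2^32 = 4 GB virtual address space
PAGE_OFFSET_BITS = 12       # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the valid bit is 0: the page is not present in memory."""


def page_table_entries():
    """One entry per virtual page under the assumptions above."""
    return 2 ** (VIRTUAL_ADDRESS_BITS - PAGE_OFFSET_BITS)


def translate(vaddr, page_table):
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(hex(vaddr))
    # The physical page number replaces the virtual page number;
    # the page offset is copied through unchanged.
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset


page_table = {
    0x403: {"valid": True, "ppn": 0x2A},
    0x404: {"valid": False, "ppn": 0},
}
print(hex(translate(0x00403ABC, page_table)))
```

Under these assumptions the table needs 2^20 entries; a different page size would change that count.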

Page Tables

[Figure: the virtual page number indexes the page table; each entry holds a valid bit and either a physical page number (pointing into physical memory) or a disk address (pointing into disk storage).]

Making Address Translation Fast

- A cache for address translations: the translation lookaside buffer (TLB).
- TLB: a cache that keeps track of recently used address mappings to avoid an access to the page table.
- Typical values for a TLB: 16-512 entries; miss rate: 0.01%-1%; miss penalty: 10-100 cycles.

[Figure: the TLB holds a subset of the page table mappings. Each TLB entry has valid, dirty, and reference bits, a tag, and a physical page address; each page table entry has valid, dirty, and reference bits and a physical page or disk address. TLB mappings are shown in blue.]
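The role of the TLB as a small cache in front of the page table can be sketched as below. This is a simplified model, not the hardware design from the slides: it assumes 4 KB pages, a page table given as a dict from virtual page number to physical page number, and least-recently-used replacement, none of which the slides specify. The class name and counters are hypothetical.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = vaddr >> 12, vaddr & 0xFFF
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
        else:
            self.misses += 1                   # must access the page table
            self.entries[vpn] = page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the LRU mapping
        return (self.entries[vpn] << 12) | offset
```

Repeated accesses to the same page hit in the TLB and skip the page table lookup, which is where the benefit comes from.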

Virtual Memory performance implications

- A page fault occurs when the page for the referenced address is not yet in memory. This is analogous to a cache miss.
- However, a page fault is far more expensive: it requires reading a page from disk into memory, which is orders of magnitude slower than a cache miss.
- Every virtual address translation (page table lookup) also costs an extra memory cycle to get the physical page frame number before the data can even be fetched from memory or cache.
- To reduce the time spent on translation, a special cache holds recently translated addresses, exploiting temporal and spatial locality. This cache is the translation lookaside buffer, or TLB.

Modern Systems

Things are getting complicated!

And costly!

- A Precision Workstation 36 with 3.2 GHz P4 costs ~ $9
- Upgrading to the Extreme edition processor (with 2 MB of L3 cache) costs ~ $24
- Dell Precision Workstation 45 (allows dual processing) for 3.2 GHz Xeon (with MB of L3 cache) costs ~ $2
- Adding a second processor costs ~ $35!!

Issue yet to be resolved

- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times.
- Design challenge: dealing with this growing disparity. Third-level caches and more? Memory design?

[Figure: performance versus year for CPU and memory.]

Exercise 7-2

Compute the total number of bits required to implement this cache for the Intrinsity FastMATH fast embedded microprocessor.

Exercise 7-7
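For a total-bits question like Exercise 7-2, the general bookkeeping can be sketched as follows. This is not a solution to the exercise: the cache parameters come from a figure that is not reproduced here, so the example uses hypothetical values. It assumes a direct-mapped cache, 32-bit byte addresses, 4-byte words, and one valid bit per block; the function name is also an assumption.

```python
def total_cache_bits(num_blocks, words_per_block, address_bits=32, word_bytes=4):
    """Total storage bits (data + tag + valid) for a direct-mapped cache.

    num_blocks, words_per_block, and word_bytes must be powers of two.
    """
    index_bits = num_blocks.bit_length() - 1
    block_offset_bits = words_per_block.bit_length() - 1
    byte_offset_bits = word_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits
    data_bits = words_per_block * word_bytes * 8
    return num_blocks * (data_bits + tag_bits + 1)  # +1 for the valid bit


# Hypothetical cache: 1024 one-word blocks.
# Each block needs 32 data bits + 20 tag bits + 1 valid bit.
print(total_cache_bits(1024, 1))
```

The same function can be applied to the exercise once the block count and block size are read off the figure.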

Last Homework (due Monday /27)

Exercises 7-2, 7-6, 7-7, 7-2, 7-5, 7-8

Next Time

I/O Devices

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven: CACHE MEMORY AND VIRTUAL MEMORY

Memories: Review

SRAM: value is stored on a pair of inverting gates; very fast, but takes up more space than DRAM (4 to 6 transistors)
DRAM: value is stored