1 MEMORY
Reading: Chapter 6, except cache implementation details (6.4.1-6.4.6) and segmentation (6.5.5)
https://en.wikipedia.org/wiki/probability

2 Objectives
- Understand the concepts and terminology of hierarchical memory organization
- Understand how each level of memory contributes to system performance, and how that performance is measured
- Understand the concept behind cache memory
3 Reading
- Text 6.1-6.4
- Wikipedia: en.wikipedia.org/wiki/cpu_cache

4 Types of Memory
Types of main memory:
- Random Access Memory (RAM) - data can be accessed quickly in any random order
- Read-Only Memory (ROM) - cannot be easily modified
Non-random-access storage includes CD, DVD, hard disk, and magnetic tape
5 RAM
Types of RAM:
- Dynamic RAM (DRAM) - contains capacitors that slowly leak their charge over time; inexpensive, but must be refreshed every few milliseconds to prevent data loss
- Static RAM (SRAM) - very fast and does not need to be periodically refreshed, but still volatile (useful for cache memory)
Volatile memory is computer memory that requires power to maintain the stored information (Wikipedia)

6 ROM
- Does not need to be refreshed
- Needs very little charge to retain its contents
- Used to store permanent or semi-permanent data that persists even while the system is turned off
7 Mobile DDR
- Double Data Rate synchronous DRAM for mobile computers
- Generations:
  - LPDDR - low-power version
  - LPDDR3 - current generation
  - LPDDR4 - 50% better performance than LPDDR3, consuming 40% less energy
- DDR SDRAM modules for desktop computers are commonly called DIMMs

8 The Memory Hierarchy
- In general, fast memory is more expensive than slow memory
- Light travels a little over a foot in a nanosecond
- Memory is organized in a hierarchy to provide good price/performance
- Small, fast storage elements are kept in the CPU; larger, slower main memory is accessed through the data bus
- Larger, (almost) permanent storage (e.g., disk drives) is located further from the CPU
- The goal is to minimize user involvement in determining the location of data
9 The Memory Hierarchy
- Storage organization can be thought of as a pyramid
- Trade-off of access time vs. storage size

10 Cache Memory
- Results in faster accesses by storing recently used data closer to the CPU
- Smaller than main memory, but faster
- The processor determines whether an address is in the cache by using the cache controller
- The cache is sometimes accessed by content; hence, it is often called content-addressable memory
- Where else do you see caching principles applied?
11 Levels of Cache
- L1 (Level 1) cache - fastest, but smallest
- L2 (Level 2) cache - larger than L1 cache, but not as fast
- L3 (Level 3) cache - even larger and slower
- Example: Apple ARM A8 - L1 64KB/64KB, L2 1MB, L3 4MB

12 Cache Overview
- To access a particular piece of data, the CPU first sends a request to its nearest memory (usually L1 cache)
- If the data is not in that cache, the next layer of memory (L2 cache, then L3 cache or main memory) is queried
- If the data is not in main memory, the request goes to virtual memory (e.g., disk)
- Once the data is located, a block (called a cache line) containing the data is fetched into cache memory
- Very large potential variation in access time for data
13 Locality of Reference
- Locality encourages movement of data in blocks
- Principle of locality - once a byte is accessed, it is likely that a nearby or recently used data element will soon be referenced
- There are three forms of locality:
  - Temporal locality - recently accessed data elements tend to be accessed again
  - Spatial locality - accesses tend to cluster in nearby addresses
  - Sequential locality - instructions tend to be accessed sequentially
- What are high-level-language examples of the forms of locality?

14 What is a Cache Block?
- A block is just a way of organizing memory into groups of bytes that can be moved between memory and cache
- Example: if memory is 16MB and the block size is 64 bytes:
  - How many bytes (as a power of 2)? 2^24
  - How many blocks (as a power of 2)? 2^24 / 2^6 = 2^18
  - Check: 2^18 blocks * 2^6 bytes/block = 2^24 bytes
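The block arithmetic in the example can be checked with a short Python sketch (sizes taken from the slide: 16MB memory, 64-byte blocks):

```python
# Block count for the slide's example: 16 MB memory, 64-byte blocks.
MEMORY_BYTES = 16 * 1024 * 1024   # 16 MB = 2^24 bytes
BLOCK_BYTES = 64                  # 2^6 bytes per block

num_blocks = MEMORY_BYTES // BLOCK_BYTES
print(MEMORY_BYTES == 2**24)                     # True
print(num_blocks == 2**18)                       # True
print(num_blocks * BLOCK_BYTES == MEMORY_BYTES)  # True: blocks * bytes/block
```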
15 Cache Data Movement
- Data moves a block (cache line) at a time between memory and cache
- A memory address is mapped into a cache address
- (Figure: blocks of data moving between memory and cache)
- Strategy question: which cache block corresponds to a memory block?

16 Offset Addressing
- A memory address can be calculated by adding the address of a memory block to the offset within that block
- Example: a 16-byte block would have offset values 0000, 0001, 0010, 0011, ..., 1111
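Offset addressing can be illustrated with a few lines of Python; the block number here is a hypothetical value chosen for the example, and the 16-byte block size comes from the slide:

```python
BLOCK_SIZE = 16        # 16-byte blocks, as in the slide's example
block_number = 3       # hypothetical block chosen for illustration
offset = 0b0101        # offset within the block (0..15)

# address = block base + offset within the block
address = block_number * BLOCK_SIZE + offset
print(address)                  # 53
print(format(address, '08b'))   # 00110101 -- block bits followed by offset bits
```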
17 Calculating a Block Number
- A memory address can be decomposed into a block number and an offset
- If a 4KB memory uses a block size of 16 bytes, memory contains 2^12 bytes and 2^8 blocks
- Example address: 1100 1110 0010 -> block number 1100 1110, offset 0010

18 Direct Mapped Cache
- A cache line contains a block of data (and more)
- Each cache line is associated with multiple blocks in memory
- The processor needs to know whether the address it has is associated with an active block in the cache
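The decomposition above is just a shift and a mask; a small Python sketch using the slide's 12-bit example address:

```python
OFFSET_BITS = 4                      # 16-byte blocks -> 4 offset bits
address = 0b1100_1110_0010           # the slide's 12-bit example address

block_number = address >> OFFSET_BITS            # drop the offset bits
offset = address & ((1 << OFFSET_BITS) - 1)      # keep only the offset bits
print(format(block_number, '08b'))   # 11001110
print(format(offset, '04b'))         # 0010
```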
19 Cache Access Process
- The CPU maps a memory address into a tag, cache line location, and offset
- A given cache line might hold any one of many memory blocks
- The processor matches the tag field of the memory address to the tag field of the cache line
- If the tag fields do not match (the requested line is not in cache), the line is moved from main memory to the cache
- Mapping schemes: direct, fully associative, and set associative
- The above applies to a direct-mapped cache

20 Memory Address Fields
- The computer converts the main memory address to a cache line address
- The tag identifies which memory block is in the cache
- Fields: tag | cache line address | offset (the tag and cache line address together form the block number)
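Splitting an address into tag, line, and offset fields can be sketched as below; the field widths (16 cache lines, 16-byte blocks) are hypothetical values chosen for illustration:

```python
# Splitting an address into tag / line / offset for a direct-mapped cache.
# Field widths here are hypothetical: 16 cache lines, 16-byte blocks.
LINE_BITS = 4
OFFSET_BITS = 4

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (OFFSET_BITS + LINE_BITS)
    return tag, line, offset

print(split_address(0b1100_1110_0010))  # (12, 14, 2)
```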
21 Cache Line Layout
- A cache line holds not only the data block but also a block identifier (the tag)
- Tag matching - comparing the tag of the memory address with the tag field of the cache line identifies a hit or miss
- Line layout: tag | data block | flag bits
- Flag bits (valid bit and dirty bit) tell the processor whether the cache line holds valid data and whether it has been modified

22 How the Tag Field Works
- The cache block address is determined from the block field of the memory address
- The tag field of the memory address is compared with the tag field of the cache line
- (Figure: example memory address 0011 split into tag, block number, and offset fields, with the address tag compared against the tag stored in the cache line)
23 Example Direct Mapped Cache
- Byte-addressable main memory with 4 bytes per block
- 4 blocks of main memory; 2 blocks of cache memory
- Address fields for address 0011:
  - Tag: 1 bit - identifies the block among those possible for that cache slot
  - Block: 1 bit (2^1 blocks of cache)
  - Offset: 2 bits (2^2 bytes per block)
- Memory addresses 0011 and 1011 compete for the same cache block

24 Are We on Track?
- A computer with a direct-mapped cache of 32 blocks
- Each cache block is 16 bytes
- 2^20 bytes of byte-addressable main memory
Questions:
1. How many blocks of main memory are there?
2. What is the format of a memory address as seen by the cache (i.e., what are the sizes of the tag, block, and offset fields)?
3. To which cache block will memory address 0DB63 map?
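The conflict between addresses 0011 and 1011 can be seen by splitting both with the slide's field widths (1-bit tag, 1-bit block, 2-bit offset):

```python
# Field split for the slide's tiny direct-mapped cache:
# 1-bit tag, 1-bit block (line) field, 2-bit offset.
OFFSET_BITS, LINE_BITS = 2, 1

def fields(addr):
    offset = addr & 0b11
    line = (addr >> OFFSET_BITS) & 0b1
    tag = addr >> (OFFSET_BITS + LINE_BITS)
    return tag, line, offset

a, b = 0b0011, 0b1011
print(fields(a))   # (0, 0, 3)
print(fields(b))   # (1, 0, 3)
# Same line field, different tags -> the two addresses compete for one slot
print(fields(a)[1] == fields(b)[1])  # True
```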
25 Were We on Track?
- How many blocks of main memory?
  - Main memory: 2^20 bytes; 2^4 bytes per block
  - 2^16 blocks of main memory (2^20 / 2^4)
- Size of fields:
  - Offset: 4 bits (2^4 bytes per block)
  - Block: 5 bits (2^5 cache blocks)
  - Tag: 11 bits (20 - 5 - 4)
- Memory address 0DB63 is 0000 1101 1011 0110 0011 (tag 11 bits | block 5 bits | offset 4 bits)
- It maps to cache block 10110

26 Definitions
- Hit - data is found at a given memory level
- Miss - data is not found at a given memory level
- Hit rate - percentage of time data is found at a given memory level
- Miss rate - percentage of time data is not found; miss rate = (1 - hit rate)
- Hit time - time required to access data at a given memory level
- Miss penalty - time required to process a miss, including the time to replace a block of memory plus the time to deliver the data to the processor
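The worked answer can be double-checked in Python using the field widths derived above:

```python
# Verifying the exercise: 2^20-byte memory, 16-byte blocks, 32 cache
# lines -> offset 4 bits, line 5 bits, tag 11 bits.
OFFSET_BITS, LINE_BITS = 4, 5
address = 0x0DB63

line = (address >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)
print(format(line, '05b'))   # 10110 -- matches the slide's answer
print(2**20 // 2**4)         # 65536 blocks of main memory (2^16)
```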
27 Probability Recap
- Probability - the likelihood that an event will occur, quantified as a number between 0 and 1
  - 0 - not possible
  - 1 - certain
- Example: the probability that a coin flip will result in heads is 0.5
- Expected value - the probability-weighted average of all possible values
- Example: the expected value of a roll of a die is 3.5 = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6)

28 Effective Access Time (EAT)
- A measure of hierarchical memory performance
- Assumes cache and memory accesses initiate simultaneously
- Expected (or average) time per access = cache hit probability * cache access time + cache miss probability * memory access time
- Example:
  - Access times: cache 10ns, memory 200ns
  - Probabilities: cache hit 0.99, cache miss 0.01
  - EAT = 0.99 * 10ns + 0.01 * 200ns = 9.9ns + 2ns = 11.9ns
- We will extend this when we consider multi-level cache and virtual memory
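The EAT example is a direct expected-value calculation, and can be reproduced in a few lines of Python:

```python
# Effective access time for the slide's example, assuming cache and
# memory accesses start simultaneously (a miss costs only memory time).
cache_time, memory_time = 10, 200     # ns
hit_rate, miss_rate = 0.99, 0.01

eat = hit_rate * cache_time + miss_rate * memory_time
print(eat)   # 11.9 (ns)
```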
31 Finding the Line in the Cache
- Look up in the cache to find a match (if any) with the address
- Various approaches:
  - Direct-mapped - each location in memory corresponds to only one entry in the cache
  - Content-addressable memory - every address in the cache is examined in parallel, returning the cache line or a no-hit response
  - 2-way set associative - each location in memory maps to 2 locations in the cache
- A direct-mapped cache is simpler to implement, but an associative cache performs better

32 Fully Associative Cache
- A main memory block can be placed anywhere in the cache
- Cache lookup is much more complex, since all cache lines are matched against the memory address
- Address fields: tag | offset
- The tag field is equivalent to tag + block in direct mapping
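A fully associative lookup can be sketched with a dictionary standing in for the parallel tag comparison done in hardware; the 16-byte block size and the cache contents are illustrative assumptions:

```python
# Sketch of a fully associative lookup: any block can sit in any line,
# so the tag is the whole block number. A dict stands in for the
# hardware's parallel tag comparison. Block size is an assumption.
OFFSET_BITS = 4
cache = {}                       # tag -> data block

def access(addr, memory):
    tag = addr >> OFFSET_BITS    # tag = full block number
    if tag in cache:
        return "hit", cache[tag]
    block_start = tag << OFFSET_BITS
    cache[tag] = memory[block_start:block_start + (1 << OFFSET_BITS)]
    return "miss", cache[tag]

memory = bytes(range(256))
print(access(0x12, memory)[0])   # miss (block not yet cached)
print(access(0x1F, memory)[0])   # hit (same 16-byte block as 0x12)
```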
33 Cache Write Policies
- Cache replacement policies must take into account dirty blocks (blocks that have been updated while they were in the cache)
- Dirty blocks must be written back to memory
- A write policy determines how and when to write back to memory
- Write policies:
  - Write-through - updates cache and main memory simultaneously on every write
  - Write-back (also called copyback) - updates memory only when the block is selected for replacement

34 Cache Write Policy Pros & Cons
- Write-through
  - Advantage - cache coherence
  - Disadvantage - memory must be updated with each cache write (the slowdown is usually negligible, because the majority of accesses tend to be reads, not writes)
- Write-back
  - Advantage - memory traffic is minimized
  - Disadvantage - memory does not always agree with the value in cache, causing potential problems in multi-core systems
- Cache coherence becomes important with multi-core systems
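The write-back behavior described above can be sketched as follows; the single-line "cache" and its field names are illustrative, not an implementation from the text:

```python
# Minimal write-back sketch: a write only marks the line dirty; memory
# is updated when the line is evicted. Names are illustrative.
memory = {0: 10, 1: 20}          # toy main memory, one value per block
line = {"tag": 0, "data": memory[0], "dirty": False}

def write(value):
    line["data"] = value
    line["dirty"] = True          # defer the memory update

def evict(new_tag):
    if line["dirty"]:
        memory[line["tag"]] = line["data"]   # write back now
    line.update(tag=new_tag, data=memory[new_tag], dirty=False)

write(99)
print(memory[0])   # 10 -- memory is stale until eviction
evict(1)
print(memory[0])   # 99 -- written back on replacement
```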
35 Cache Coherence
- Cache coherence - the consistency of data stored in the local caches of a shared resource
- The clients are usually separate cores in a multi-core processor

36 Separate Caches
- Unified (or integrated) cache - both instructions and data are cached together
- Many modern systems employ separate caches for data and instructions (called a Harvard cache)
- Separating data from instructions provides better locality, at the cost of greater complexity
- Why do separate caches for instructions and data work well?
37 Multi-Level Cache Memory
- Most of today's small systems employ multilevel cache hierarchies
- The levels of cache form their own small memory hierarchy
- Level 1 cache (8KB to 64KB) - on the processor; access time is typically about 4ns
- Level 2 cache (64KB to 2MB) - may be on the die, motherboard, or an expansion card; access time is usually around 15-20ns
- Cache size estimates run from the text's figures (low range) to higher
- L1 cache size increases as chip real estate becomes available

38 Example: AMD K8
- 64-byte cache lines
- Source: Wikipedia
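The EAT formula from slide 28 extends naturally to a two-level hierarchy; the hit rates below are hypothetical, and the access times are chosen near the slide's ranges:

```python
# Extending effective access time to two cache levels. Hit rates are
# hypothetical; access times are near the slide's stated ranges.
l1_time, l2_time, mem_time = 4, 15, 200     # ns
l1_hit, l2_hit = 0.95, 0.90                 # assumed hit rates per level

# On an L1 miss, the access falls through to L2, then to main memory.
eat = (l1_hit * l1_time
       + (1 - l1_hit) * (l2_hit * l2_time
                         + (1 - l2_hit) * mem_time))
print(round(eat, 3))   # expected access time in ns
```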
39 3-Level Cache Memory
- Level 2 cache - on the same die as the CPU (reducing access time to about 10ns)
- Level 3 cache (2MB to 256MB) - cache that is either:
  - Situated between the processor and main memory, or
  - On the die

40 Have You Met The Objectives?
- Understand the concepts and terminology of hierarchical memory organization
- Understand how each level of memory contributes to system performance, and how that performance is measured
- Understand the concept behind cache memory