Memory Hierarchy. Mehran Rezaei


1 Memory Hierarchy Mehran Rezaei

2 What types of memory do we have? Registers Cache (Static RAM) Main Memory (Dynamic RAM) Disk (Magnetic Disk)

3 Option 1: Build It Out of Fast SRAM About 5-10 ns access Decoders are big Arrays are big It will cost LOTS of money SRAM costs $70 per megabyte $286,720 for the MIPS (4 GB address space)

4 Option 2: Build It Out of DRAM About 100 ns access Why build a fast processor that stalls for dozens of cycles on each memory load? Still costs lots of money for new machines DRAM costs $2 per megabyte $8,192 for the MIPS (4 GB address space)

5 Option 3: Build It Using Disks About 10,000,000 ns access (snore!) We could have stopped with the Intel 4004 Costs are pretty reasonable Disk storage costs $0.05 per megabyte about $200 for the MIPS

6 Our Requirements We want a memory system that runs at processor clock speed (about 10 ns access) We want a memory system that we can afford (maybe 25% to 33% of the total system cost). Options 2-3 are too slow Options 1-2 (or 1-3) are too expensive Time for option 4!

7 Option 4: Use a Little of Everything (Wisely) Use a small array of SRAM Small means fast! Small means cheap! Use a larger amount of DRAM And hope that you rarely have to use it Use a really big amount of Disk storage Disks are getting cheaper at a faster rate than we fill them up with data (for most people).

8 Option 4: The Memory Hierarchy Use a small array of SRAM For the CACHE (hopefully for most accesses) Use a bigger amount of DRAM For the Main memory Use a really big amount of Disk storage For the Virtual memory

9 Famous Picture of Food [pyramid figure: the memory hierarchy drawn like the food pyramid, with Cache at the top, Main Memory below it, and Disk Storage at the base; cost, latency, and how often each level is accessed change as you move down]

10 Memory Hierarchy The Principle of Locality suggests: make the common case fast. [figure: Processor (Control Unit, Datapath) backed by successively larger, slower levels; Speed (ns) runs from about 0.25 ns up to millions of ns, Size (bytes) from 100s up to Gs]

11 The Big Picture [block diagram: CPU with ITLB/I-$ and DTLB/D-$, a shared L2-Cache, the bus interface unit (BIU) with the physical-to-memory address map, read data buffers and memory request scheduling, and DRAM]

12 5 components of computer Computer = Processor (Control, the "brain"; Datapath, the "brawn"), Memory (where programs and data live when running), Input devices (keyboard, mouse), and Output devices (display, printer). The memory hierarchy lives in the Memory component.

13 Memory Hierarchy [figure: control and datapath talk to the cache, which is backed by Memory and then Disk; input and output devices sit alongside]

14 Memory Hierarchy (Analogy) You're writing a term paper about programming languages (C++ in particular) at a table in a library The library is equivalent to disk: limitless capacity (in principle), but very slow to retrieve a book The table is memory: smaller capacity, which means you must return books when the table fills up, but it is easier and faster to find a book on the table once you have already retrieved it

15 Memory Hierarchy Analogy (Cont'd) Open books on the table are the cache: smaller capacity still, since only a few open books fit on the table; again, when the table is full you must close a book But, if you think about it, much, much faster to retrieve data Illusion created: as if the whole library were on the table, already open Keep as many recently used books open on the table as possible, since they are likely to be used again Also keep as many books on the table as possible, since that is faster than going to the bookshelves

16 Class Problem Given the following: Cache: 1-cycle access time Main memory: 10-cycle access time Disk: 10,000-cycle access time What is the average access time if you measure that 90% of the cache accesses are hits and 80% of the accesses to main memory are hits?
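
A minimal sketch in C of the arithmetic for this problem, weighting each level's latency by the fraction of references it ends up servicing; the 1 / 10 / 10,000-cycle latencies and the 90% / 80% hit rates are the ones stated above.

```c
#include <stdio.h>

int main(void) {
    double t_cache = 1.0, t_mem = 10.0, t_disk = 10000.0;   /* cycles */
    double h_cache = 0.90, h_mem = 0.80;

    /* 90% of references are serviced by the cache; of the remaining 10%,
       80% are serviced by main memory and 20% go all the way to disk. */
    double amat = h_cache * t_cache
                + (1.0 - h_cache) * (h_mem * t_mem + (1.0 - h_mem) * t_disk);
    printf("Average access time = %.1f cycles\n", amat);    /* 0.9 + 0.1*(8 + 2000) = 201.7 */
    return 0;
}
```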

17 A Very Simple Memory System Processor Cache Memory Ld R1 M[ 1 ] Ld R2 M[ 5 ] Ld R3 M[ 1 ] Ld R3 M[ 7 ] Ld R2 M[ 7 ] R0 R1 R2 R3 2 cache lines, 4-bit tag field, 1-byte block V tag data

18 A Very Simple Memory System Ld R1 M[ 1 ]: Is it in the cache? No valid tags, so this is a cache miss. Allocate a line: set its tag to the address and fill its block with Mem[ 1 ].

19-28 A Very Simple Memory System (cont'd) [animation: the remaining accesses stepped through the 2-line cache with LRU replacement. Ld R2 M[ 5 ] misses and fills the second line; Ld R3 M[ 1 ] checks both tags and hits; Ld R3 M[ 7 ] misses and evicts the least recently used line (the one holding 5); Ld R2 M[ 7 ] then hits. Final tally: Misses: 3, Hits: 2.]
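
A tiny C simulation of the example above, assuming a fully associative cache with two 1-byte lines and LRU replacement; run on the sequence M[1], M[5], M[1], M[7], M[7] it reproduces the final tally of 3 misses and 2 hits. The data types and names are illustrative, not the slide's hardware.

```c
#include <stdbool.h>
#include <stdio.h>

struct line { bool valid; unsigned tag; unsigned last_used; };

int main(void) {
    struct line cache[2] = {{false, 0, 0}, {false, 0, 0}};
    unsigned seq[] = {1, 5, 1, 7, 7};                 /* the slide's load addresses */
    unsigned hits = 0, misses = 0;

    for (unsigned now = 0; now < 5; now++) {
        unsigned addr = seq[now];
        int hit_way = -1;
        for (int w = 0; w < 2; w++)                   /* compare against every tag */
            if (cache[w].valid && cache[w].tag == addr) hit_way = w;

        if (hit_way >= 0) {
            hits++;
            cache[hit_way].last_used = now;
        } else {
            misses++;
            int victim = 0;                           /* invalid line first, else LRU */
            for (int w = 0; w < 2; w++) {
                if (!cache[w].valid) { victim = w; break; }
                if (cache[w].last_used < cache[victim].last_used) victim = w;
            }
            cache[victim] = (struct line){true, addr, now};
        }
    }
    printf("Misses: %u  Hits: %u\n", misses, hits);   /* Misses: 3  Hits: 2 */
    return 0;
}
```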

29 Basic Cache Design Cache memory can copy data from any part of main memory It has 2 parts: The TAG (CAM) holds the memory address The BLOCK (SRAM) holds the memory data Accessing the cache: Compare the reference address with the tag If they match, get the data from the cache block If they don't match, get the data from main memory

30 CAMs: Content Addressable Memories Instead of thinking of memory as an array of data indexed by a memory address Think of memory as a set of data matching a query. Instead of an address, we send data to the memory, asking if any location contains that data.

31 Operations on CAMs Search: the primary way to access a CAM Send data to CAM memory Return found or not found Alternatively, return the address of where it was found Write: Send data for CAM to remember Where should it be stored if CAM is full? Replacement policy» Replace oldest data in the CAM» Replace least recently searched data

32 CAM Array [figure: a CAM array of 5 storage elements of 9 bits each, with a Found output]

33 Previous Use of CAMs You have seen a simple CAM used before, where?

34 Cache Organization A cache memory consists of multiple tag/block pairs (called cache lines) Searches can be done in parallel (within reason) At most one tag will match If there is a tag match, it is a cache HIT If there is no tag match, it is a cache MISS Our goal is to keep the data we think will be accessed in the near future in the cache

35 Cache Operation Every cache miss will get the data from memory and ALLOCATE a cache line to put the data in. Just like any CAM write Which line should be allocated? Random? OK, but hard to grade test questions Better than random? How?

36 Picking the Most Likely Addresses What is the probability of accessing a random memory location? With no information, it is just as likely as any other address But programs are not random They tend to use the same memory locations over and over. We can use this to pick the most referenced locations to put into the cache

37 Temporal Locality The principle of temporal locality in program references says that if you access a memory location you will be more likely to re-access that location than you will be to reference some other random location.

38 Using Locality in the Cache Temporal locality says any miss data should be placed into the cache It is the most recently referenced location Temporal locality says that the least recently referenced (or least recently used, LRU) cache line should be evicted to make room for the new line. Because the re-access probability falls over time as a cache line isn't referenced, the LRU line is the least likely to be re-referenced.

39 Calculating Average Access Latency Avg latency = cache latency × hit rate + memory latency × miss rate Avg latency = 1 cycle × (2/5) + 15 cycles × (3/5) = 9.4 cycles per reference To improve average latency: Improve memory access latency, or Improve cache access latency, or Improve cache hit rate

40 Class Problem 2 For application X running on the MIPS, you measure that 40% of the operations access memory. You also measure that 90% of the memory references hit in the cache. Whenever a cache miss occurs, the processor is stalled for 20 cycles to transfer the block from memory into the cache. What is the CPI for application X on this processor? What would the CPI be if there were no cache?
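
A sketch of the CPI arithmetic, assuming a base CPI of 1 when every access hits (the slides do not state the base CPI); the 40% memory-operation fraction, 90% hit rate and 20-cycle miss penalty are taken from the problem statement above.

```c
#include <stdio.h>

int main(void) {
    double base_cpi = 1.0;     /* assumption: one cycle per instruction with no stalls */
    double mem_ops  = 0.40;    /* fraction of operations that access memory */
    double hit_rate = 0.90;
    double penalty  = 20.0;    /* stall cycles per cache miss */

    double cpi_with_cache = base_cpi + mem_ops * (1.0 - hit_rate) * penalty;
    double cpi_no_cache   = base_cpi + mem_ops * penalty;  /* every memory access stalls */
    printf("CPI with cache: %.2f, without cache: %.2f\n",
           cpi_with_cache, cpi_no_cache);                  /* 1.80 and 9.00 */
    return 0;
}
```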

41 Calculating Cost How much does a cache cost? Calculate storage requirements Example cache: 2 bytes of SRAM Calculate overhead to support access (tags) Example cache: 2 4-bit tags The cost of the tags is often forgotten for caches, but this cost drives the design of real caches What is the cost if a 32 bit address is used?
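
A back-of-the-envelope sketch of the storage-versus-tag accounting. The 2-line, 1-byte-block, 4-bit-tag figures mirror the example cache above; the 32-bit case assumes a fully associative placement, so the entire 32-bit address becomes the tag (that assumption is mine, not the slide's).

```c
#include <stdio.h>

int main(void) {
    int lines = 2, block_bytes = 1;
    int data_bits = lines * block_bytes * 8;        /* storage requirement: 16 bits */
    int tag_bits_per_line = 4;
    printf("data: %d bits, tag overhead: %d bits\n",
           data_bits, lines * tag_bits_per_line);

    /* With 32-bit addresses and 1-byte blocks, each line needs a 32-bit tag,
       so the tags cost several times more storage than the data they describe. */
    tag_bits_per_line = 32;
    printf("32-bit tags: %d bits of tag for %d bits of data\n",
           lines * tag_bits_per_line, data_bits);
    return 0;
}
```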

42 How do we reduce the overhead? Have a small address Impractical Cache bigger units than bytes Each block has a single tag, and blocks can be whatever size we choose.

43 Block size for caches Processor Cache Memory Ld R1 M[ 1 ] Ld R2 M[ 5 ] Ld R3 M[ 1 ] Ld R3 M[ 4 ] Ld R2 M[ 0 ] R0 R1 R2 R3 2 cache lines, 3-bit tag field, 2-byte block V tag data

44-53 Block size for caches (cont'd) [animation: the same cache stepped through the sequence. Ld R1 M[ 1 ] misses and brings in the 2-byte block holding addresses 0 and 1; Ld R2 M[ 5 ] misses and brings in the block holding 4 and 5; Ld R3 M[ 1 ], Ld R3 M[ 4 ] and Ld R2 M[ 0 ] then all hit in blocks already present. Final tally: Misses: 2, Hits: 3.]

54 Spatial Locality Notice that when we accessed address 1, we also brought in address 0. This turned out to be a good thing since we later referenced address 0 and found it in the cache. Spatial locality in a program says that if we reference a memory location (e.g., 1000), we are more likely to reference a location near it (e.g., 1001) than some random location.

55 Basic Cache Organization Decide on the block size How? Simulate lots of different block sizes and see which one gives the best performance Most systems use a block size between 32 bytes and 128 bytes Larger block sizes reduce the overhead by: Reducing the number of CAMs Reducing the size of each CAM Tag Block offset

56 Class Problem 2 Processor Cache Memory What is the cache hit rate when the following code is executed? Ld R1 M[ 7 ] Ld R2 M[ 6 ] Ld R3 M[ 9 ] Ld R3 M[ ] Ld R2 M[ ] R0 R1 R2 R3 2 cache lines, 3-bit tag field, 2-byte block V tag data

57 The Main Issue: fully associative and 1-byte line size [figure: the address is compared in parallel against every line's TAG (CAM); a matching, valid entry drives hit and its Data is returned]

58 What seems to be the problem? 1. How large is the CAM if we need an 8 Kbyte cache? 2. What does each of those comparators look like, in terms of actual hardware? 3. How many comparators are needed for an 8 Kbyte cache? 4. What should we do? 1. Increase cache line size 2. Decrease the level of associativity 3. Increase cache line size and decrease set associativity

59 Increase cache line size [figure: each line now holds 32 bytes (Byte 31 ... Byte 1 Byte 0), selected by a 5-to-32 decoder; one tag comparator per line drives hit] How many comparators do we need now? Is the size of each comparator reduced?

60 Other alternatives Ultimately only one comparator How many cache lines do we have? Using a portion of the address, can we address the cache lines?

61 Direct Mapped Cache [figure: an 8-to-256 decoder uses the index field to select one of 256 lines; a 5-to-32 decoder uses the offset to select one of the line's 32 bytes (Byte 31 ... Byte 1 Byte 0); a single tag comparator drives hit]

62 2-way set-associative cache [figure: a 7-to-128 decoder uses the index to select one of 128 sets; each set holds two 32-byte lines (Byte 31 ... Byte 1 Byte 0), each with its own tag comparator; the two comparator outputs combine to drive hit]

63 The three fields All fields are read as unsigned integers. Offset: specifies which byte within the block we want Index: specifies the cache index (which row of the cache we should look in) Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same cache line (cache block)

64 Class Problem Determine the length of offset, index, and tag for the following cache configurations: Direct mapped, 8 Kbytes, 32-byte line size, on a 32-bit machine Two-way set associative, 8 Kbytes, 64-byte line size, on a 32-bit machine Direct mapped, 32 Kbytes, 128-byte line size, on a 32-bit machine
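
A quick check of the three configurations, assuming the usual relations: offset = log2(line size), number of sets = cache size / (ways × line size), index = log2(number of sets), and tag = 32 − index − offset.

```c
#include <stdio.h>

static unsigned lg(unsigned x) {       /* log2 of a power of two */
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

static void fields(const char *name, unsigned size, unsigned line, unsigned ways) {
    unsigned offset = lg(line);
    unsigned index  = lg(size / (ways * line));
    printf("%-14s tag=%2u index=%u offset=%u\n", name, 32 - index - offset, index, offset);
}

int main(void) {
    fields("DM 8KB/32B",    8 * 1024,  32, 1);   /* tag=19 index=8 offset=5 */
    fields("2-way 8KB/64B", 8 * 1024,  64, 2);   /* tag=20 index=6 offset=6 */
    fields("DM 32KB/128B", 32 * 1024, 128, 1);   /* tag=17 index=8 offset=7 */
    return 0;
}
```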

65 Example Suppose we have a 16KB cache with 4-word blocks Determine the size of the tag, index and offset fields if we're using a 32-bit architecture Offset: need to specify the correct byte within a block; a block contains 4 words (each word is 32 bits) = 16 bytes = 2^4 bytes, so we need 4 bits to specify the correct byte Case Study: Cache simulator (Cont'd)

66 Example (Cont'd) Index: (index into an array of blocks) need to specify the correct row in the cache; the cache contains 16 KB = 2^14 bytes, and a block contains 2^4 bytes (4 words) # blocks/cache = bytes/cache ÷ bytes/block = 2^14 bytes/cache ÷ 2^4 bytes/block = 2^10 blocks/cache, so we need 10 bits to specify this many rows

67 Example (Cont'd) Tag: use remaining bits as tag tag length = addr length - offset - index = 32 - 4 - 10 = 18 bits so the tag is the leftmost 18 bits of the memory address 32-bit address = tag (18 bits) | index (10 bits) | offset (4 bits)

68 Accessing Data in Cache Ex.: 16KB cache with 4-word (16-byte) blocks Read 4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014 Memory values: 0x00000010-0x0000001C hold a, b, c, d; 0x00000030-0x0000003C hold e, f, g, h; 0x00008010-0x0000801C hold i, j, k, l

69 Accessing Data in Cache The 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields: 0x00000014 = tag 0, index 1, offset 0x4; 0x0000001C = tag 0, index 1, offset 0xC; 0x00000034 = tag 0, index 3, offset 0x4; 0x00008014 = tag 2, index 1, offset 0x4

70 Direct Mapped, 16 Kbytes, 16-Byte Blocks [figure: the cache drawn as a table with Valid, Tag and data columns 0x0-3, 0x4-7, 0x8-b, 0xc-f; every entry starts invalid]

71 Read 0x00000014: Tag field 0, Index field 1, Offset 0x4

72 So we read block 1 (the Index field is 1)

73 No valid data in block 1 (the valid bit is 0)

74 So load that data into cache, setting the tag to 0 and the valid bit to 1; block 1 now holds a, b, c, d

75 Read from cache at offset 0x4, return word b

76 Read 0x0000001C: Tag field 0, Index field 1, Offset 0xC

77 Index is 1

78 Index valid, Tag matches (0 = 0)

79 Index valid, Tag matches, return d

80 Read 0x00000034: Tag field 0, Index field 3, Offset 0x4

81 So read block 3

82 No valid data in block 3

83 Load that cache block (e, f, g, h), set its tag and valid bit, and return word f

84 Read 0x00008014: Tag field 2, Index field 1, Offset 0x4

85 So read cache block 1; the data is valid

86 Cache block 1's Tag does not match (0 != 2)

87 Miss, so replace block 1 with the new data (i, j, k, l) & tag 2

88 And return word j

89 And return word j
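
A small C sketch that replays the four reads above through a direct-mapped cache with the 18/10/4-bit tag/index/offset split derived earlier; it tracks only valid bits and tags (not the data words) and reports hit or miss for each address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4                   /* 16-byte blocks */
#define INDEX_BITS  10                  /* 1024 lines -> 16 KB of data */
#define NUM_LINES   (1u << INDEX_BITS)

static struct { bool valid; uint32_t tag; } line[NUM_LINES];

static bool cache_access(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    if (line[index].valid && line[index].tag == tag)
        return true;                    /* hit */
    line[index].valid = true;           /* miss: fetch the block and set the tag */
    line[index].tag   = tag;
    return false;
}

int main(void) {
    uint32_t addrs[] = {0x00000014, 0x0000001C, 0x00000034, 0x00008014};
    for (int i = 0; i < 4; i++)         /* prints miss, hit, miss, miss */
        printf("0x%08x: %s\n", addrs[i], cache_access(addrs[i]) ? "hit" : "miss");
    return 0;
}
```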

90 A Better look (Direct Mapped) [figure: a small worked example mapping a memory onto a direct-mapped cache using a 6-bit memory address split into a 2-bit tag, 2-bit index, and 2-bit offset]

91 A Better look (2 way set associative) [figure: the same memory mapped onto a 2-way set-associative cache; the 6-bit address is now split into a larger tag, a 1-bit set index, and a 2-bit offset, and each set holds two lines]

92 2-way set-associative cache (bigger cache) [figure: as before, a 7-to-128 decoder selects one of 128 sets, each holding two 32-byte lines (Byte 31 ... Byte 1 Byte 0) with its own tag comparator; the comparator outputs combine to drive hit]

93 Direct Mapped vs. Set Associative cache size Direct mapped: cache size = NL*LS (NL: number of lines, LS: line size) Set associative: cache size = NW*NS*LS (NW: number of ways, NS: number of sets, LS: line size)
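
A one-line check of the two capacity formulas; the 8 KB numbers below are just an illustration of trading lines for ways at the same total size.

```c
#include <stdio.h>

int main(void) {
    unsigned LS = 32;                   /* line size in bytes */
    unsigned NL = 256;                  /* direct mapped: number of lines */
    unsigned NW = 2, NS = 128;          /* set associative: ways and sets */
    printf("direct mapped:   %u bytes\n", NL * LS);       /* NL*LS    = 8192 */
    printf("set associative: %u bytes\n", NW * NS * LS);  /* NW*NS*LS = 8192 */
    return 0;
}
```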

94 Writing vs. Reading Writing: sw $r4, 0($r6) Is this a write, a read, or a read & a write? Hit: data is found in the cache Write through: write to the cache and to the memory; for speed we need write buffers Write back: write to the cache and not to the memory; there must be a dirty bit for each line of the cache Miss: data is not found in the cache Write allocate: first bring the data into the cache, then write to the cache Write no-allocate: do not bring the line into the cache; simply write to the memory and not to the cache. Does write back + write no-allocate make sense?
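
A schematic C sketch (not the slide's hardware design) of how the write-policy combinations behave, using a single cache line over a toy memory; write_through and write_allocate select between write-through/write-back and allocate/no-allocate.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LINE_BYTES 16
#define MEM_BYTES  4096

static unsigned char memory[MEM_BYTES];
static struct { bool valid, dirty; unsigned tag; unsigned char data[LINE_BYTES]; } line;

static void store_byte(unsigned addr, unsigned char value,
                       bool write_through, bool write_allocate) {
    unsigned tag = addr / LINE_BYTES, offset = addr % LINE_BYTES;
    bool hit = line.valid && line.tag == tag;

    if (!hit && write_allocate) {                /* write miss: bring the line in first */
        if (line.valid && line.dirty)            /* write-back eviction of the old line */
            memcpy(&memory[line.tag * LINE_BYTES], line.data, LINE_BYTES);
        memcpy(line.data, &memory[tag * LINE_BYTES], LINE_BYTES);
        line.valid = true; line.tag = tag; line.dirty = false;
        hit = true;
    }
    if (hit) {
        line.data[offset] = value;
        if (write_through) memory[addr] = value; /* update memory right away */
        else               line.dirty = true;    /* write back: defer until eviction */
    } else {
        memory[addr] = value;                    /* write no-allocate: bypass the cache */
    }
}

int main(void) {
    store_byte(0x20, 0xAB, false, true);   /* write back + write allocate */
    store_byte(0x40, 0xCD, true,  false);  /* write through + no-allocate */
    printf("mem[0x20]=%02x (still stale), mem[0x40]=%02x\n", memory[0x20], memory[0x40]);
    return 0;
}
```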

95 Writing vs. Reading (cont'd) Reading: lw $4, 0($6) Hit: data is found in the cache; simply read it from the cache and put it into the register Miss: bring the data from the memory into the cache and then load it into the register What if the write policy is write back and the line is dirty (its dirty bit has been set by a write)? As it turns out, write back is more complicated to implement, and therefore write through is more suitable for the first level.

96 Unified vs. Split Unified cache Both instructions and data share the cache About 75% of the accesses are instruction references Unified cache has slightly lower miss rate Split Instruction cache and data cache No structural hazard

97 Problems Comparing unified and split caches Consider a 32K unified cache and a 16K instruction + 16K data split cache If the miss rate for the unified cache is 1.99% If 75% of the memory references are instruction references If the instruction cache miss rate is 0.64% and the data cache miss rate is 6.47% for the split cache Which one of the caches has the higher miss rate?

98 Problems (Cont'd) For the previous problem, assume That a hit takes 1 cycle A miss takes 50 cycles A load or store hit takes 1 extra cycle on a unified cache (why?) Write-through cache with a write buffer, and ignore the penalty due to writes to the buffer What is the average memory access time for each case?
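
A sketch of the average-memory-access-time comparison, using the 1-cycle hit, 50-cycle miss penalty, 75%/25% reference mix and miss rates from the two slides above; the extra cycle charged to unified-cache data references (the structural hazard) is the point of the exercise.

```c
#include <stdio.h>

int main(void) {
    double hit = 1.0, penalty = 50.0;
    double frac_instr = 0.75, frac_data = 0.25;
    double miss_unified = 0.0199, miss_icache = 0.0064, miss_dcache = 0.0647;

    /* Split caches: instruction and data references each see their own cache. */
    double amat_split = frac_instr * (hit + miss_icache * penalty)
                      + frac_data  * (hit + miss_dcache * penalty);

    /* Unified cache: data references pay one extra cycle because the single
       port is busy with the instruction fetch. */
    double amat_unified = frac_instr * (hit + miss_unified * penalty)
                        + frac_data  * (hit + 1.0 + miss_unified * penalty);

    printf("split:   %.3f cycles\n", amat_split);     /* about 2.05 */
    printf("unified: %.3f cycles\n", amat_unified);   /* about 2.24 */
    return 0;
}
```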

99 Where are we? cache memory disk Talked about Today: Virtual Memory

100 Motivation Cache: Speed is the main motivation Take advantage of the principle of locality Spatial locality Temporal locality Virtual Memory: unlimited memory space Multitasking Allocation Protection A task needs more memory than exists

101 Tasks (processes) to co-exist What happens if gcc needs to expand? If netscape needs more memory than is on the machine? If acroread has an error and writes to address 0x2000? When does gcc have to know it will run at 0x3000? What if netscape isn't using its memory? [figure: physical memory map with the OS at 0x0000, gcc at 0x3000, acroread at 0x4000, and netscape at 0x6000]

102 Linux user virtual address space [figure: kernel virtual memory above 0xc0000000, invisible to user code; the user stack grows down from 0xc0000000 (stack pointer); shared libraries are mapped around 0x40000000; the heap grows up from the data segment (brk); data and text start at 0x08048000, with the region below unused]

103 Who takes care of address translation? [figure: the CPU issues a virtual address; the MMU translates it to a physical address, which goes to physical memory]

104 The advantage [figure: gcc's and netscape's virtual pages each map onto physical memory, with the overflow kept on the hard disk]

105 Virtual Memory (paging mechanism) Memory is divided into fixed size pages Linux: 4 Kbyte page size Tru64: 8 Kbyte page size Virtual addr = virtual page number | page offset; address translation replaces the page number and keeps the offset: physical addr = physical page number | page offset
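
A minimal sketch of the address split just described, using the 4 KB (Linux) page size; the 8 KB Tru64 size would only change the shift amount.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const unsigned PAGE_SHIFT = 12;                  /* 4 KB = 2^12 bytes */
    uint32_t va = 0x12345ABC;                        /* arbitrary example address */
    uint32_t vpn    = va >> PAGE_SHIFT;              /* virtual page number */
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1); /* page offset, kept unchanged */
    printf("vpn=0x%x offset=0x%x\n", vpn, offset);
    /* Translation replaces the VPN with a physical page number; the offset carries over. */
    return 0;
}
```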

106 Remarks: Paging mechanism (Cont'd) If we have virtual memory, the CPU generates virtual addresses Each block is called a page If a page is not found in main memory, we have a page fault Two main issues: Where to place a page when a page fault occurs (page replacement) How to tell whether a page is in main memory (the addressing mechanism)

107 Page table components [figure: the page table register points to the beginning of the page table; the virtual page number indexes the table; each entry holds a valid bit and a physical page number; the selected physical page number is concatenated with the unchanged page offset to form the physical address]
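
A minimal one-level lookup following the figure: the VPN indexes the table, a valid entry supplies the physical page number, and the offset carries over unchanged. The table size and its contents are made-up illustrations, not the slide's numbers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 13                    /* 8 KB pages, as on the earlier slide */
#define NUM_VPNS   256                   /* deliberately tiny table, for illustration */

struct pte { bool valid; uint32_t ppn; };
static struct pte page_table[NUM_VPNS];  /* the "page table register" would hold its base */

/* Returns true and fills *pa on success; false means a page fault. */
static bool translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn    = (va >> PAGE_SHIFT) % NUM_VPNS;
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);
    if (!page_table[vpn].valid)
        return false;                    /* page fault: the OS must bring the page in */
    *pa = (page_table[vpn].ppn << PAGE_SHIFT) | offset;
    return true;
}

int main(void) {
    page_table[2] = (struct pte){ true, 0x45 };      /* map VPN 2 -> PPN 0x45 */
    uint32_t va = (2u << PAGE_SHIFT) | 0x123, pa = 0;
    if (translate(va, &pa)) printf("VA 0x%x -> PA 0x%x\n", va, pa);
    else                    printf("VA 0x%x -> page fault\n", va);
    return 0;
}
```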

108 Page table components [worked figure: a concrete translation; the indexed entry is valid, so its physical page number (0x45fc...) is combined with the page offset to form the physical address]

109 Page table components [figure: a lookup that finds the valid bit clear] Exception: page fault 1. Stop this process 2. Pick a page to replace 3. Write back its data 4. Get the referenced page 5. Update the page table 6. Reschedule the process

110 Size of page table 32-bit address, 8 Kbyte pages, and 128 Mbytes of memory Virtual address: VPN 19 bits, offset 13 bits; physical address: PPN 14 bits, offset 13 bits Each table entry is about 4 bytes, so 2^19 * 4 bytes = 2 MBytes

111 Reduce the page table size Keep two registers Start address End address Allow hashing Multiple page tables Page the page table

112 Multiple page tables [figure: the virtual address is split into a virtual superpage number, a virtual page number, and a page offset; the page table register locates the superpage (first-level) table; a valid first-level entry points to a 2nd-level page table; the valid 2nd-level entry supplies the physical page number, which is combined with the page offset to form the physical address]

113 Putting it all together Loading your program in memory Ask operating system to create a new process Construct a page table for this process Mark all page table entries as invalid with a pointer to the disk image of the program That is, point to the executable file containing the binary. Run the program and get an immediate page fault on the first instruction.

114 Page replacement Page table indirection enables a fully associative mapping between virtual and physical pages How do we implement LRU? True LRU is expensive, but LRU is a heuristic anyway, so approximating LRU is fine Reference bit on page, cleared occasionally by operating system. Then pick any unreferenced page to evict.
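
A sketch of the reference-bit approximation mentioned above; this is essentially the clock (second-chance) algorithm, which the slide does not prescribe in detail, so the sweep order and data layout here are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_FRAMES 4

static struct { int vpn; bool ref; } frame[NUM_FRAMES] = {
    {10, true}, {11, false}, {12, true}, {13, true}  /* ref set by recent accesses */
};
static int hand = 0;                      /* where the previous sweep stopped */

static int pick_victim(void) {
    for (;;) {
        if (!frame[hand].ref) {           /* not referenced since last sweep: evict it */
            int victim = hand;
            hand = (hand + 1) % NUM_FRAMES;
            return victim;
        }
        frame[hand].ref = false;          /* referenced: clear the bit, give a second chance */
        hand = (hand + 1) % NUM_FRAMES;
    }
}

int main(void) {
    int v = pick_victim();
    printf("evict frame %d (holding virtual page %d)\n", v, frame[v].vpn);  /* frame 1 */
    return 0;
}
```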

115 Performance issues Any memory access needs a page table access first; the page table is also in memory, isn't it? Load Store How does the cache fit in this picture? Physically addressed cache Virtually addressed cache

116 Translation Lookaside Buffer (TLB) [figure: a fully associative TLB; the VPN is compared against every entry's tag in parallel, and a matching entry supplies the PPN, which is concatenated with the unchanged offset]

117 Physically/Virtually addressed Cache [figure, left: CPU -> VA -> TLB -> PA -> Cache, with cache misses going to MEM (a physically addressed cache); right: CPU -> VA -> Cache, with cache misses translated by the TLB to a PA before going to MEM (a virtually addressed cache)]

118 Physically/Virtually addressed Cache [the same figure, continued]

119 Synonyms (Aliasing problem) [figure: two different virtual addresses in the virtual address space map to the same location in physical memory; with a virtually addressed cache the same data can therefore live in two cache lines at once]

120 The Big Picture [block diagram, repeated from slide 11: CPU with ITLB/I-$ and DTLB/D-$, a shared L2-Cache, the bus interface unit (BIU) with the physical-to-memory address map, read data buffers and memory request scheduling, and DRAM]
