Memory Hierarchy. Mehran Rezaei


1 Memory Hierarchy Mehran Rezaei

2 What types of memory do we have? Registers Cache (Static RAM) Main Memory (Dynamic RAM) Disk (Magnetic Disk)

3 Option 1: Build It Out of Fast SRAM. About 5 ns access. Decoders are big; arrays are big. It will cost LOTS of money: SRAM costs $7 per megabyte, $286,72 for the MIPS.

4 Option 2: Build It Out of DRAM. Access time on the order of tens of ns. Why build a fast processor that stalls for dozens of cycles on each memory load? Still costs lots of money for new machines: DRAM costs $2 per megabyte, $8,92 for the MIPS.

5 Option 3: Build It Using Disks. Millions of ns per access (snore!); we could have stopped with the Intel 4004. Costs are pretty reasonable: disk storage costs $.5 per megabyte, $2 for the MIPS.

6 Our Requirements. We want a memory system that runs at processor clock speed (about 1 ns access). We want a memory system that we can afford (maybe 25% to 33% of the total system cost). Options 2-3 are too slow; options 1-2 (or 1-3) are too expensive. Time for option 4!

7 Option 4: Use a Little of Everything (Wisely) Use a small array of SRAM Small means fast! Small means cheap! Use a larger amount of DRAM And hope that you rarely have to use it Use a really big amount of Disk storage Disks are getting cheaper at a faster rate than we fill them up with data (for most people).

8 Option 4: The Memory Hierarchy Use a small array of SRAM For the CACHE (hopefully for most accesses) Use a bigger amount of DRAM For the Main memory Use a really big amount of Disk storage For the Virtual memory

9 Famous Picture (of Food): the memory hierarchy drawn as a pyramid, like the food pyramid, with Cache at the top, Main Memory in the middle, and Disk Storage at the base; cost, latency, and access speed are the labeled axes.

10 Memory Hierarchy (diagram): the processor (control unit and datapath) sits beside the levels of the hierarchy; moving down the levels, speed grows from fractions of a nanosecond to millions of nanoseconds while size grows from hundreds of bytes through KBs and MBs to GBs. The principle of locality suggests: make the common case fast.

11 The Big Picture (block diagram): CPU with I-TLB and D-TLB, I-cache and D-cache, a unified L2 cache, bus interface unit (BIU), physical-to-memory address map, read data buffer, DRAM, and memory request scheduling.

12 Five components of a computer: Processor (Control, the "brain", and Datapath, the "brawn"), Memory (where programs and data live when running), and Devices (Input: keyboard, mouse; Output: display, printer). The memory hierarchy belongs to the Memory component.

13 Memory Hierarchy (diagram): input and output devices, the processor's control and datapath, the cache, main memory, and the disk.

14 Memory Hierarchy (Analogy). You're writing a term paper about programming languages (C++ in particular) at a table in a library. The library is equivalent to the disk: limitless capacity (in principle), but very slow to retrieve a book. The table is memory: smaller capacity, which means you must return books when the table fills up, but it is easier and faster to find a book on the table once you have already retrieved it.

15 Memory Hierarchy Analogy (Cont'd). Open books on the table are the cache: smaller capacity still, since only a few open books fit on the table, and again, when the table is full you must close a book. But if you think about it, this is much, much faster for retrieving data. The illusion created: as if the whole library were on the table, and open. Keep as many recently used books open on the table as possible, since they are likely to be used again; also keep as many books on the table as possible, since that is faster than going back to the bookshelves.

16 Class Problem. Given the following: Cache: 1 cycle access time; Main memory: cycles access time; Disk: cycles access time. What is the average access time if you measure that 90% of the cache accesses are hits and 80% of the accesses to main memory are hits?
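
A minimal sketch in C of how such an average is computed, following the form of the formula on slide 39. The 1-, 10-, and 1,000,000-cycle access times are assumed example values (not necessarily the slide's), while the 90% and 80% hit rates are the ones measured in the problem.

    #include <stdio.h>

    int main(void) {
        /* Assumed access times (cycles); the hit rates come from the problem. */
        double t_cache = 1.0, t_mem = 10.0, t_disk = 1000000.0;
        double h_cache = 0.90, h_mem = 0.80;

        /* Same form as slide 39: hits pay their level's latency,
           misses pay the next level's average. */
        double avg = t_cache * h_cache
                   + (1.0 - h_cache) * (t_mem * h_mem
                   + (1.0 - h_mem) * t_disk);

        printf("average access time = %.1f cycles\n", avg);
        return 0;
    }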

17 A Very Simple Memory System: a processor issues five loads (Ld R1, Ld R2, Ld R3, Ld R3, Ld R2) against a cache of 2 lines, each with a valid bit, a 4-bit tag field, and a 1-byte block, backed by a small memory.

18 First load: there are no valid tags, so this is a cache miss; allocate a line, storing the address as the tag and the memory byte as the block.

19-21 Second load: check the tags; the address does not match, another cache miss; allocate the other (LRU) line. Running total: Misses 2, Hits 0.

22-24 Third load repeats the first address: the tags are checked and one matches (HIT!); update the LRU state. Running total: Misses 2, Hits 1.

25-26 Fourth load: neither tag matches (MISS!); evict the LRU line and allocate it for the new address. Running total: Misses 3, Hits 1.

27-28 Fifth load repeats the fourth address: the tag matches (HIT!). Final total: Misses 3, Hits 2.

29 Basic Cache Design. Cache memory can copy data from any part of main memory. It has 2 parts: the TAG (CAM) holds the memory address, and the BLOCK (SRAM) holds the memory data. Accessing the cache: compare the reference address with the tag. If they match, get the data from the cache block; if they don't match, get the data from main memory.
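
A minimal C sketch of this lookup, modeled on the 2-line, 1-byte-block example above. The round-robin victim choice is a stand-in for a real replacement policy, and the toy memory contents and load trace are made up for the demo.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINES 2                 /* as in the simple example: 2 cache lines */

    typedef struct {
        bool     valid;                 /* is this line holding real data?   */
        uint32_t tag;                   /* the TAG (held in the CAM)         */
        uint8_t  block;                 /* the BLOCK (SRAM); 1-byte blocks   */
    } line_t;

    static uint8_t memory[16];          /* toy backing memory                */
    static line_t  cache[NUM_LINES];
    static int     next_victim;         /* stand-in for a real LRU policy    */

    uint8_t cache_read(uint32_t addr, int *hit) {
        uint32_t tag = addr;            /* 1-byte blocks: the whole address is the tag */
        for (int i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                *hit = 1;               /* tag match: cache HIT              */
                return cache[i].block;
            }
        }
        *hit = 0;                       /* no match: cache MISS, allocate a line */
        int v = next_victim;
        next_victim = (next_victim + 1) % NUM_LINES;
        cache[v].valid = true;
        cache[v].tag   = tag;
        cache[v].block = memory[addr];  /* fetch the data from main memory   */
        return cache[v].block;
    }

    int main(void) {
        for (int i = 0; i < 16; i++) memory[i] = (uint8_t)(10 * i);
        uint32_t trace[] = {1, 5, 1, 7, 7};   /* a load sequence in the spirit of slides 17-28 */
        for (int i = 0; i < 5; i++) {
            int hit;
            uint8_t v = cache_read(trace[i], &hit);
            printf("M[%u] = %u (%s)\n", trace[i], v, hit ? "hit" : "miss");
        }
        return 0;
    }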

30 CAMs: Content Addressable Memories Instead of thinking of memory as an array of data indexed by a memory address Think of memory as a set of data matching a query. Instead of an address, we send data to the memory, asking if any location contains that data.

31 Operations on CAMs Search: the primary way to access a CAM Send data to CAM memory Return found or not found Alternatively, return the address of where it was found Write: Send data for CAM to remember Where should it be stored if CAM is full? Replacement policy» Replace oldest data in the CAM» Replace least recently searched data

32 CAM Array (diagram): a 5-storage-element CAM array, 9 bits per entry, with a Found output raised when any entry matches the search data.

33 Previous Use of CAMs You have seen a simple CAM used before, where?

34 Cache Organization A cache memory consists of multiple tag/block pairs (called cache lines) Searches can be done in parallel (within reason) At most one tag will match If there is a tag match, it is a cache HIT If there is no tag match, it is a cache MISS Our goal is to keep the data we think will be accessed in the near future in the cache

35 Cache Operation Every cache miss will get the data from memory and ALLOCATE a cache line to put the data in. Just like any CAM write Which line should be allocated? Random? OK, but hard to grade test questions Better than random? How?

36 Picking the Most Likely Addresses What is the probability of accessing a random memory location? With no information, it is just as likely as any other address But programs are not random They tend to use the same memory locations over and over. We can use this to pick the most referenced locations to put into the cache

37 Temporal Locality. The principle of temporal locality in program references says that if you access a memory location, you will be more likely to re-access that location than you will be to reference some other random location.

38 Using Locality in the Cache. Temporal locality says any miss data should be placed into the cache: it is the most recently referenced location. Temporal locality also says that the least recently referenced (or least recently used, LRU) cache line should be evicted to make room for the new line: because the re-access probability falls over time as a cache line isn't referenced, the LRU line is the least likely to be re-referenced.

39 Calculating Average Access Latency. Avg latency = cache latency x hit rate + memory latency x miss rate. Avg latency = 1 cycle x (2/5) + 15 cycles x (3/5) = 9.4 cycles per reference. To improve average latency: improve memory access latency, or improve cache access latency, or improve cache hit rate.

40 Class Problem 2. For application X running on the MIPS, you measure that 40% of the operations access memory. You also measure that 90% of the memory references hit in the cache. Whenever a cache miss occurs, the processor is stalled for 20 cycles to transfer the block from memory into the cache. What is the CPI for application X on this processor? What would the CPI be if there were no cache?
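
A hedged sketch of the CPI arithmetic in C. The base CPI of 1 and the 40% / 90% / 20-cycle figures are assumed values standing in for the class-problem numbers, and instruction fetches are assumed to always hit.

    #include <stdio.h>

    int main(void) {
        double base_cpi     = 1.0;    /* assumed ideal CPI with a perfect memory          */
        double mem_per_inst = 0.40;   /* fraction of instructions that access memory      */
        double hit_rate     = 0.90;   /* cache hit rate                                   */
        double miss_penalty = 20.0;   /* stall cycles per cache miss                      */
        double mem_latency  = 20.0;   /* cycles per access with no cache (assumed equal)  */

        /* CPI = base CPI + memory accesses per instruction * miss rate * miss penalty */
        double cpi_with_cache = base_cpi + mem_per_inst * (1.0 - hit_rate) * miss_penalty;

        /* With no cache, every data access pays the full memory latency. */
        double cpi_no_cache = base_cpi + mem_per_inst * mem_latency;

        printf("CPI with cache: %.2f, CPI without cache: %.2f\n",
               cpi_with_cache, cpi_no_cache);
        return 0;
    }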

41 Calculating Cost How much does a cache cost? Calculate storage requirements Example cache: 2 bytes of SRAM Calculate overhead to support access (tags) Example cache: 2 4-bit tags The cost of the tags is often forgotten for caches, but this cost drives the design of real caches What is the cost if a 32 bit address is used?
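
A tiny sketch of the comparison this slide asks for, using the deck's 2-line, 1-byte-block example cache; the 32-bit case is the point of the exercise, since with 1-byte blocks the whole address becomes the tag.

    #include <stdio.h>

    int main(void) {
        int lines      = 2;    /* the deck's simple example: 2 cache lines */
        int block_bits = 8;    /* 1-byte blocks => 8 bits of data per line */

        /* With the toy 4-bit addresses: 4-bit tags. */
        printf("4-bit addresses:  data %d bits, tags %d bits\n",
               lines * block_bits, lines * 4);

        /* With 32-bit addresses and 1-byte blocks, the tags dwarf the data. */
        printf("32-bit addresses: data %d bits, tags %d bits\n",
               lines * block_bits, lines * 32);
        return 0;
    }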

42 How do we reduce the overhead? Have a small address Impractical Cache bigger units than bytes Each block has a single tag, and blocks can be whatever size we choose.

43 Block size for caches: the same style of exercise, but now the 2 cache lines each hold a 2-byte block, so the tag field shrinks to 3 bits; the processor again issues five loads (Ld R1, Ld R2, Ld R3, Ld R3, Ld R2).

44-46 The first load misses, and the whole 2-byte block (the addressed byte plus its neighbor) is brought in and tagged. Running total: Misses 1, Hits 0.

47 The second load also misses and fills the other line with its 2-byte block. Running total: Misses 2.

48-53 The remaining three loads all hit, either because they repeat an earlier address or because the earlier misses also brought in a neighboring byte. Final total: Misses 2, Hits 3.

54 Spatial Locality. Notice that when we accessed one address, we also brought in its neighbor. This turned out to be a good thing, since we later referenced that neighboring address and found it in the cache. Spatial locality in a program says that if we reference a memory location, we are more likely to reference a location near it than some random location.

55 Basic Cache Organization. Decide on the block size. How? Simulate lots of different block sizes and see which one gives the best performance. Most systems use a block size between 32 bytes and 128 bytes. Longer block sizes reduce the overhead by reducing the number of CAM entries and reducing the size of each CAM entry. (The address is split into Tag and Block offset.)

56 Class Problem 2 (diagram: processor, cache, memory). What is the cache hit rate when the following code is executed: Ld R1 M[7], Ld R2 M[6], Ld R3 M[9], Ld R3 M[ ], Ld R2 M[ ], on a cache with 2 cache lines, a 3-bit tag field, and 2-byte blocks?

57 The Main Issue: a fully associative cache with a 1-byte line size (diagram: the address's tag is compared in parallel against every valid TAG entry in the CAM; any match asserts hit and selects that entry's Data).

58 What seems to be the problem? 1. How large is the CAM if we need an 8 Kbyte cache? 2. What does each of those comparators look like, in terms of actual hardware? 3. How many comparators are needed for an 8 Kbyte cache? 4. What should we do? 1. Increase cache line size 2. Decrease the level of associativity 3. Increase cache line size and decrease set associativity

59 Increase cache line size (diagram: a byte decoder selects one of the 32 bytes within a wide line, Byte 31 ... Byte 1 Byte 0; one tag comparison per line drives hit). How many comparators do we need now? Is the size of each comparator reduced?

60 Other alternatives: ultimately only one comparator. How many cache lines do we have? Using a portion of the address, can we address the cache lines directly?

61 Direct Mapped Cache (diagram): an 8-to-256 row decoder selects one cache line from the index bits; a byte decoder selects Byte 31 ... Byte 1 Byte 0 within the line; a single tag comparator produces hit.

62 2-way set-associative cache (diagram): a 7-to-128 row decoder selects a set from the index bits; the two ways' tags are compared in parallel, the matching way's bytes (Byte 31 ... Byte 1 Byte 0) are selected, and hit is the OR of the two comparators.

63 The three fields All fields are read as unsigned integers. Offset: specifies which byte within the block we want Index: specifies the cache index (which row of the cache we should look in) Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same cache line (cache block)

64 Class Problem. Determine the length of offset, index, and tag for the following cache configurations: Direct mapped, 8 Kbytes, 32-byte line size, on a 32-bit machine; Two-way set associative, 8 Kbytes, 64-byte line size, on a 32-bit machine; Direct mapped, 32 Kbytes, 128-byte line size, on a 32-bit machine.

65 Example. Suppose we have a 16 KB cache with 4-word blocks. Determine the size of the tag, index, and offset fields if we're using a 32-bit architecture. Offset: we need to specify the correct byte within a block; a block contains 4 words (each word is 32 bits) = 16 bytes = 2^4 bytes, so we need 4 bits to specify the correct byte. (Case study: cache simulator, cont'd.)

66 Example (Cont'd). Index (an index into the array of blocks): we need to specify the correct row in the cache. The cache contains 16 KB = 2^14 bytes and a block contains 2^4 bytes (4 words), so # blocks/cache = bytes/cache / bytes/block = 2^14 bytes/cache / 2^4 bytes/block = 2^10 blocks/cache; we need 10 bits to specify this many rows.

67 Example (Cont'd). Tag: use the remaining bits as the tag. Tag length = address length - offset - index = 32 - 4 - 10 = 18 bits, so the tag is the leftmost 18 bits of the memory address. 32-bit address = tag (18 bits) | index (10 bits) | offset (4 bits).
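
A minimal C sketch of the address split just derived (4 offset bits, 10 index bits, 18 tag bits for the 16 KB direct-mapped cache with 16-byte blocks); the example address is arbitrary.

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 4                    /* 16-byte blocks            */
    #define INDEX_BITS  10                   /* 1024 blocks in the cache  */

    int main(void) {
        uint32_t addr = 0x12345678u;         /* an arbitrary example address */

        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

        printf("tag=0x%x index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }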

68 Accessing Data in Cache. Example: a 16 KB cache with 4-word blocks. Read 4 addresses: x4, xc, x34, x84. Memory holds the words a, b, c, d; e, f, g, h; and i, j, k, l at the corresponding block addresses.

69 Accessing Data in Cache. The 4 addresses are divided (for convenience) into Tag, Index, and Byte Offset fields.

70 Direct mapped, 16 Kbytes, 16-byte blocks: the cache is drawn as a table of Valid bit, Tag, and the four words of each block (byte offsets 0-3, 4-7, 8-b, c-f), one row per index.

71 Read x4: split the address into its Tag, Index, and Offset fields.

72 So we read the block at that index.

73 There is no valid data there (the valid bit is 0).

74 So load that block from memory into the cache, setting the tag and the valid bit; the block now holds a b c d.

75 Read from the cache at the offset, and return word b.

76 Read xc: split the address into Tag, Index, and Offset.

77 The index selects the same row as before, and it is valid.

78 Index valid, and the tag matches.

79 Index valid, tag matches: return word d.

80 Read x34: split the address into Tag, Index, and Offset.

81 So read block 3.

82 There is no valid data there.

83 Load that cache block from memory (e f g h) and return word f.

84 Read x84: split the address into Tag, Index, and Offset.

85 So read the selected cache block; its data is valid.

86 But the cache block's tag does not match (the stored tag is not 2).

87 Miss, so replace the block with the new data (i j k l) and the new tag.

88 And return word j.

89 And return word j.

90 A Better Look (Direct Mapped): diagram of a small direct-mapped cache against memory, with a 6-bit memory address split into tag (2 bits), index (2 bits), and offset (2 bits).

91 A Better Look (2-way set associative): the same memory and 6-bit address, now split into tag, set index, and offset bits, with two lines per set.

92 2-way set-associative cache (bigger cache): the same structure as slide 62, with a 7-to-128 set decoder, per-way tag comparators, and byte selection within the chosen way's block.

93 Direct Mapped vs. Set Associative cache size. Direct mapped: cache size = NL * LS (NL: number of lines, LS: line size). Set associative: cache size = NW * NS * LS (NW: number of ways, NS: number of sets, LS: line size).
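
A quick check of the two size formulas in C, using the 8 KB, 32-byte-line configuration from the class problem on slide 64: direct mapped it has 256 lines; organized 2-way set associative it has 128 sets of 2 lines. Either way the product is 8 KB.

    #include <stdio.h>

    int main(void) {
        int line_size = 32;                 /* LS, bytes                         */
        int nl        = 256;                /* NL for the direct-mapped case     */
        int nw = 2, ns = 128;               /* NW and NS for the 2-way case      */

        printf("direct mapped:   %d bytes\n", nl * line_size);        /* NL * LS      */
        printf("2-way set assoc: %d bytes\n", nw * ns * line_size);   /* NW * NS * LS */
        return 0;
    }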

94 Writing vs. Reading. Writing: sw $r4, 0($r6). Is this a write, a read, or a read and a write? Hit: the data is found in the cache. Write through: write to the cache and to the memory (for speed we need write buffers). Write back: write to the cache and not to the memory (there must be a dirty bit for each line of the cache). Miss: the data is not found in the cache. Write allocate: first bring the block into the cache, then write to the cache. Write no-allocate: do not bring the line into the cache; simply write to the memory and not to the cache. Does write back + write no-allocate make sense?
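
A sketch, in C, of the two write-hit policies on a single hypothetical cache line; the toy backing memory and line layout are made up for illustration. Write-through updates memory on every store, while write-back only sets the dirty bit and defers the memory write until eviction.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES 32

    static uint8_t main_memory[1024];               /* toy backing store                     */

    typedef struct {
        bool    valid, dirty;                       /* dirty bit is only needed for write-back */
        uint8_t data[LINE_BYTES];
    } line_t;

    /* Write hit, write-through: update the cache AND memory right away
       (a write buffer would hide the memory latency). */
    static void write_hit_through(line_t *line, uint32_t addr, uint8_t byte) {
        line->data[addr % LINE_BYTES] = byte;
        main_memory[addr] = byte;
    }

    /* Write hit, write-back: update only the cache and remember that the
       line now differs from memory; it is flushed when the line is evicted. */
    static void write_hit_back(line_t *line, uint32_t addr, uint8_t byte) {
        line->data[addr % LINE_BYTES] = byte;
        line->dirty = true;
    }

    /* Eviction under write-back: only dirty lines need to go back to memory. */
    static void evict(line_t *line, uint32_t base_addr) {
        if (line->dirty)
            for (int i = 0; i < LINE_BYTES; i++)
                main_memory[base_addr + i] = line->data[i];
        line->valid = line->dirty = false;
    }

    int main(void) {
        line_t line = { .valid = true };
        write_hit_through(&line, 5, 42);   /* memory updated immediately          */
        write_hit_back(&line, 6, 43);      /* memory updated only when evicted... */
        evict(&line, 0);                   /* ...which happens here               */
        printf("mem[5]=%u mem[6]=%u\n", main_memory[5], main_memory[6]);
        return 0;
    }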

95 Writing vs. Reading (cont'd). Reading: lw $4, 0($6). Hit: the data is found in the cache; simply read it from the cache and put it into the register. Miss: bring the data from the memory into the cache and then load it into the register. But what if the cache is write back and the victim line is dirty (its dirty bit has been set by a write)? As it turns out, write back is more complicated to implement, and therefore write through is more suitable for the first level.

96 Unified vs. Split Unified cache Both instructions and data share the cache About 75% of the accesses are instruction references Unified cache has slightly lower miss rate Split Instruction cache and data cache No structural hazard

97 Problems. Comparing unified and split caches: consider a 32K unified cache versus a 16K instruction + 16K data split cache. The miss rate for the unified cache is 1.99%; 75% of the memory references are instruction references; the instruction cache miss rate is 0.64% and the data cache miss rate is 6.47% for the split cache. Which of the caches has the higher miss rate?

98 Problems (Cont'd). For the previous problem, assume that a hit takes 1 cycle, a miss takes 50 cycles, and a load or store hit takes 1 extra cycle on a unified cache (why?); assume a write-through cache with a write buffer, and ignore the penalty of writes into the buffer. What is the average memory access time for each case?
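
A sketch of the requested computation in C, using the miss rates quoted above and the assumed timing figures (1-cycle hit, 50-cycle miss penalty, 75% instruction references, 1 extra cycle for a load/store hit on the unified cache).

    #include <stdio.h>

    int main(void) {
        double f_inst = 0.75, f_data = 0.25;   /* instruction/data reference mix        */
        double mr_i = 0.0064, mr_d = 0.0647;   /* split: I-cache and D-cache miss rates */
        double mr_u = 0.0199;                  /* unified cache miss rate               */
        double hit = 1.0, penalty = 50.0;      /* assumed hit time and miss penalty     */

        /* Overall miss rate of the split configuration. */
        double mr_split = f_inst * mr_i + f_data * mr_d;

        double amat_split   = hit + mr_split * penalty;
        /* Unified: a load/store hit costs one extra cycle (structural hazard). */
        double amat_unified = hit + f_data * 1.0 + mr_u * penalty;

        printf("split:   miss rate %.4f, AMAT %.3f cycles\n", mr_split, amat_split);
        printf("unified: miss rate %.4f, AMAT %.3f cycles\n", mr_u, amat_unified);
        return 0;
    }

Running the numbers this way shows the unified cache with the slightly lower miss rate but the split configuration with the lower average access time, which is the point of the exercise.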

99 Where are we? Cache and main memory: already talked about. Today: virtual memory (the disk level).

100 Motivation Cache: Speed is the main motivation Take advantage of the principle of locality Spatial locality Temporal locality Virtual Memory: unlimited memory space Multitasking Allocation Protection A task needs more memory than exists

101 Tasks (processes) have to co-exist. What happens if gcc needs to expand? If netscape needs more memory than is on the machine? If acroread has an error and writes to address x2? When does gcc have to know it will run at x3? What if netscape isn't using its memory? (Diagram: the OS, gcc, acroread, and netscape laid out at different addresses in one memory.)

102 Linux user virtual address space (diagram, 32-bit): kernel virtual memory at the top, above 0xc0000000 (memory invisible to user code); below it the user stack, growing down from the stack pointer; a region for shared libraries around 0x40000000; then the heap, growing up to brk; the data segment; the text segment starting at 0x08048000; and an unused region at the bottom.

103 Who takes care of address translation? The CPU issues a virtual address; the MMU translates it into a physical address, which goes to physical memory.

104 The advantage (diagram): the pages of several processes (e.g., gcc and netscape) are scattered across physical memory, with the overflow kept on the hard disk.

105 Virtual Memory (paging mechanism). Memory is divided into fixed-size pages: Linux uses a 4 Kbyte page size, Tru64 uses an 8 Kbyte page size. A virtual address = virtual page number + page offset; address translation maps it to a physical address = physical page number + page offset.
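
A minimal sketch of the virtual-address split for the 4 KB (Linux) and 8 KB (Tru64) page sizes mentioned above; the page size in bits is the only parameter, and the example address is arbitrary.

    #include <stdint.h>
    #include <stdio.h>

    static void split(uint64_t va, int page_bits) {
        uint64_t offset = va & ((1ull << page_bits) - 1);   /* page offset          */
        uint64_t vpn    = va >> page_bits;                  /* virtual page number  */
        printf("%d KB pages: VPN = 0x%llx, offset = 0x%llx\n",
               1 << (page_bits - 10),
               (unsigned long long)vpn, (unsigned long long)offset);
    }

    int main(void) {
        uint64_t va = 0x12345678ull;     /* arbitrary example virtual address */
        split(va, 12);                   /* Linux: 4 KB pages (2^12)          */
        split(va, 13);                   /* Tru64: 8 KB pages (2^13)          */
        return 0;
    }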

106 Remarks: Paging mechanism (Cont'd). With virtual memory, the CPU generates virtual addresses. Each block is called a page. If a page is not found in main memory, we say a page fault has occurred. Two main issues: where to place a page when a page fault occurs (page replacement), and how to determine that a page is in main memory (the addressing mechanism).

107 Page table components (diagram): the page table register points to the beginning of the page table; the virtual page number indexes into the table; each entry holds a valid bit and a physical page number; the selected physical page number is concatenated with the page offset to form the physical address.

108 Page table components (worked example): the virtual page number selects a valid entry whose physical page number (x45fc in the transcription) is concatenated with the page offset (xff) to form the physical address.

109 Page table components (page fault case): the selected entry's valid bit is clear. Exception: page fault. 1. Stop this process 2. Pick a page to replace 3. Write back its data 4. Get the referenced page 5. Update the page table 6. Reschedule the process
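
A sketch of this single-level translation in C. The table size, frame numbers, and stub fault handler are made-up stand-ins for the real OS machinery described in the six steps above.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 13                      /* 8 KB pages, as in the deck           */
    #define NUM_PAGES 8                       /* toy-sized virtual address space      */

    typedef struct {
        bool     valid;                       /* is the page resident in main memory? */
        uint32_t ppn;                         /* physical page number                 */
    } pte_t;

    static pte_t page_table[NUM_PAGES];       /* the page table register would point here */

    /* Stand-in for the OS fault handler (the six steps on slide 109). */
    static uint32_t handle_page_fault(uint32_t vpn) {
        printf("page fault on VPN %u\n", vpn);
        page_table[vpn].valid = true;         /* pretend the page was brought in      */
        page_table[vpn].ppn   = vpn + 100;    /* at some arbitrary physical frame     */
        return page_table[vpn].ppn;
    }

    static uint32_t translate(uint32_t va) {
        uint32_t vpn    = va >> PAGE_BITS;                 /* virtual page number  */
        uint32_t offset = va & ((1u << PAGE_BITS) - 1);    /* page offset          */

        pte_t *pte = &page_table[vpn];                     /* VPN indexes the table */
        uint32_t ppn = pte->valid ? pte->ppn : handle_page_fault(vpn);

        return (ppn << PAGE_BITS) | offset;                /* physical page number + offset */
    }

    int main(void) {
        uint32_t va = (3u << PAGE_BITS) | 0x7f;            /* VPN 3, offset 0x7f        */
        printf("PA = 0x%x\n", (unsigned)translate(va));    /* faults, then translates   */
        printf("PA = 0x%x\n", (unsigned)translate(va));    /* now a clean translation   */
        return 0;
    }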

110 Size of page table. 32-bit address, 8 Kbyte pages, and 128 Mbytes of memory: the virtual address is VPN (19 bits) + offset (13 bits), and the physical address is PPN (14 bits) + offset (13 bits). Each page table entry is about 4 bytes, so the table takes 2^19 * 4 bytes = 2 MBytes.

111 Reduce the page table size Keep two registers Start address End address Allow hashing Multiple page tables Page the page table

112 Multiple page tables (diagram): the page table register points to the first-level (superpage) table; the virtual address is split into a virtual superpage number, a virtual page number, and a page offset; a valid superpage entry points to a second-level page table, whose valid entry supplies the physical page number that is concatenated with the page offset to form the physical address.
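
A sketch of the two-level walk in C, assuming 4 KB pages and 10-bit indices at each level (made-up field widths for illustration); each level's valid bit is checked before it is followed.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 12                          /* assumed 4 KB pages            */
    #define L2_BITS     10                          /* virtual page index bits       */
    #define L1_BITS     10                          /* virtual superpage index bits  */

    typedef struct { bool valid; uint32_t next; } entry_t;  /* next: table number or frame number */

    static entry_t level1[1 << L1_BITS];            /* superpage table               */
    static entry_t level2[4][1 << L2_BITS];         /* a few second-level tables     */

    static int translate2(uint32_t va, uint32_t *pa) {
        uint32_t sp   = va >> (OFFSET_BITS + L2_BITS);           /* virtual superpage */
        uint32_t page = (va >> OFFSET_BITS) & ((1u << L2_BITS) - 1);
        uint32_t off  = va & ((1u << OFFSET_BITS) - 1);

        if (!level1[sp].valid) return -1;           /* whole second-level table absent: page fault */
        entry_t *pte = &level2[level1[sp].next][page];
        if (!pte->valid) return -1;                 /* the page itself absent: page fault          */

        *pa = (pte->next << OFFSET_BITS) | off;     /* physical page number + page offset          */
        return 0;
    }

    int main(void) {
        level1[1] = (entry_t){ true, 0 };           /* superpage 1 -> second-level table 0 */
        level2[0][5] = (entry_t){ true, 777 };      /* page 5 of that table -> frame 777   */

        uint32_t pa, va = (1u << 22) | (5u << 12) | 0xabc;   /* superpage 1, page 5, offset 0xabc */
        if (translate2(va, &pa) == 0) printf("PA = 0x%x\n", (unsigned)pa);
        else                          printf("page fault\n");
        return 0;
    }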

113 Putting it all together Loading your program in memory Ask operating system to create a new process Construct a page table for this process Mark all page table entries as invalid with a pointer to the disk image of the program That is, point to the executable file containing the binary. Run the program and get an immediate page fault on the first instruction.

114 Page replacement Page table indirection enables a fully associative mapping between virtual and physical pages How do we implement LRU? True LRU is expensive, but LRU is a heuristic anyway, so approximating LRU is fine Reference bit on page, cleared occasionally by operating system. Then pick any unreferenced page to evict.
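
A tiny sketch of this reference-bit approximation in C, with a made-up frame count: the OS clears the bits occasionally, and any frame whose bit is still clear is a candidate victim.

    #include <stdbool.h>
    #include <stdio.h>

    #define FRAMES 8

    static bool referenced[FRAMES];   /* set (by hardware or on a TLB refill) when a page is touched */

    /* Called occasionally by the operating system. */
    static void clear_reference_bits(void) {
        for (int i = 0; i < FRAMES; i++) referenced[i] = false;
    }

    /* Pick any unreferenced frame to evict; fall back to frame 0 if every frame was touched. */
    static int pick_victim(void) {
        for (int i = 0; i < FRAMES; i++)
            if (!referenced[i]) return i;
        return 0;
    }

    int main(void) {
        clear_reference_bits();
        referenced[0] = referenced[1] = true;       /* pages 0 and 1 were used recently   */
        printf("evict frame %d\n", pick_victim());  /* prints 2: not referenced recently  */
        return 0;
    }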

115 Performance issues. Any memory access needs a page table access first, and the page table is itself in memory; so does every load or store take an extra memory access? How does the cache fit into this picture: a physically addressed cache or a virtually addressed cache?

116 Translation Lookaside Buffer (TLB) (diagram): the VPN of the virtual address is compared in parallel against every TLB entry; on a match, the stored PPN is concatenated with the page offset.
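
A small C sketch of the parallel compare in the TLB diagram, assuming an 8-entry fully associative TLB and a stub page-table walk for the miss path; the replacement choice is deliberately trivial.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS   13
    #define TLB_ENTRIES 8

    typedef struct { bool valid; uint32_t vpn, ppn; } tlb_entry_t;
    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Stand-in for the full page-table walk (the slow path). */
    static uint32_t walk_page_table(uint32_t vpn) { return vpn + 100; }

    static uint32_t tlb_translate(uint32_t va, int *tlb_hit) {
        uint32_t vpn = va >> PAGE_BITS;
        uint32_t off = va & ((1u << PAGE_BITS) - 1);

        for (int i = 0; i < TLB_ENTRIES; i++)          /* done in parallel in hardware */
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *tlb_hit = 1;
                return (tlb[i].ppn << PAGE_BITS) | off;
            }

        *tlb_hit = 0;                                  /* TLB miss: walk the page table, then refill */
        uint32_t ppn = walk_page_table(vpn);
        tlb[0] = (tlb_entry_t){ true, vpn, ppn };      /* trivial replacement, just for the sketch   */
        return (ppn << PAGE_BITS) | off;
    }

    int main(void) {
        int hit;
        uint32_t va = (7u << PAGE_BITS) | 0x123;
        printf("PA=0x%x (%s)\n", (unsigned)tlb_translate(va, &hit), hit ? "TLB hit" : "TLB miss");
        printf("PA=0x%x (%s)\n", (unsigned)tlb_translate(va, &hit), hit ? "TLB hit" : "TLB miss");
        return 0;
    }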

117 Physically vs. virtually addressed cache (diagram): with a physically addressed cache, the CPU's virtual address goes through the TLB first and the resulting physical address indexes the cache, with misses going on to memory.

118 With a virtually addressed cache, the virtual address indexes the cache directly, and the TLB translation to a physical address is needed only on a cache miss, on the way to memory.

119 Synonyms (aliasing problem) (diagram): two different virtual addresses can map to the same physical memory location, so in a virtually addressed cache the same data can end up in two different cache lines.

120 The Big Picture (block diagram, revisited): CPU with I-TLB and D-TLB, I-cache and D-cache, a unified L2 cache, bus interface unit (BIU), physical-to-memory address map, read data buffer, DRAM, and memory request scheduling.
