Computer Systems Architecture I (CSE 560M), Lecture 18. Guest Lecturer: Shakir James

Plan for Today
- Announcements
  - No class meeting on Monday; meet in your project groups
  - Project demos in less than two weeks, on Nov 23rd
- Questions
- Review of project logistics
- Today's discussion: main memory and virtual memory

Project Logistics
- Five major deadlines
  - Project demo: Nov 23
  - Report due: Dec 7
- Project timeline

The CPU-DRAM Gap
[Figure: the levels of the memory hierarchy and their speeds: CPU registers (0.25 ns), cache (1 ns), memory bus, DRAM main memory (100 ns), I/O bus, and I/O devices (5 ms). © 2003 Elsevier Science (USA). All rights reserved.]

Main Memory
Recall that on a cache miss, we need to access main memory. This process has three components:
- Send the address
- Read the contents of the memory location
- Transfer the contents back
Parameters for dynamic random access memory (DRAM):
- Access time: the time between when a read is requested and when the desired word arrives
- Cycle time: the minimum time between requests to memory
  - Cycle time > access time, to let the address lines stabilize and to write the data back (reads are destructive)
- In 2002, cycle times were about 80 ns (compared to 250 ns in 1980)
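
To make the access-time/cycle-time distinction concrete, here is a toy C model of back-to-back DRAM reads. The 60 ns access time is an illustrative assumption; the slide gives only the 80 ns cycle time.

```c
#include <stdio.h>

#define ACCESS_NS 60.0 /* request -> data available (assumed) */
#define CYCLE_NS  80.0 /* minimum spacing between requests (2002 figure) */

int main(void) {
    double issue = 0.0;
    for (int i = 0; i < 4; i++) {
        printf("read %d: issued at %3.0f ns, data at %3.0f ns\n",
               i, issue, issue + ACCESS_NS);
        issue += CYCLE_NS; /* the next request must wait a full cycle time */
    }
    return 0;
}
```

Even though each read's data arrives after 60 ns, sustained throughput is set by the 80 ns cycle time.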

DRAM
- Each bit is stored using a single transistor
- Reads are destructive
- The transistors leak charge (hence "dynamic"), so cells must be refreshed periodically
- Address lines are split into row and column addresses; a read consists of
  - RAS: row access strobe
  - CAS: column access strobe
- Access time is typically RAS + CAS, but periodic refreshing can stall requests
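
As a rough illustration of the row/column split, the sketch below carves a chip-internal address into a row number (sent with RAS) and a column number (sent with CAS). The 12-bit row and 10-bit column widths are assumptions for a hypothetical device, not figures from the lecture.

```c
#include <stdio.h>

#define COL_BITS 10 /* assumed column-address width */
#define ROW_BITS 12 /* assumed row-address width */

int main(void) {
    unsigned addr = 0x2A5F3; /* example cell address within the chip */
    unsigned col  = addr & ((1u << COL_BITS) - 1);               /* sent with CAS */
    unsigned row  = (addr >> COL_BITS) & ((1u << ROW_BITS) - 1); /* sent with RAS */
    printf("addr 0x%05X -> row %u (RAS), col %u (CAS)\n", addr, row, col);
    return 0;
}
```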

DRAM Structure [figure]

DRAM and SRAM
- Both are memory organizations built from transistors
- The "S" stands for static
  - Traditionally 6 transistors per bit (some designs use 4 or fewer)
  - No refresh
  - No need to write back after a read
- Main memory uses DRAM; on-chip caches use SRAM; board-level caches have been built with both
- Semiconductor fabs use DRAM-specific processes to create DRAM chips
  - Different (and fewer) layers
  - Different physical properties

Improving Main Memory
- Sending the address: can't really be improved
- Bandwidth
  - Use a split-transaction bus to allow overlap between requests
  - Make memory wider: send one address, get back more than one word
    - A wider main memory increases the memory increment size
    - It also complicates error-correcting codes when writing a single word into a multi-word memory
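
A back-of-the-envelope comparison of the wider-memory idea, for the miss penalty of a 4-word block. The latency numbers (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per word on the bus) are textbook-style assumptions, not figures from this lecture.

```c
#include <stdio.h>

int main(void) {
    int addr = 1, access = 15, xfer = 1, words = 4;

    int narrow = words * (addr + access + xfer); /* one word at a time */
    int wide   = addr + access + xfer;           /* 4-word-wide memory and bus */

    printf("narrow memory: %d cycles per miss\n", narrow); /* 68 */
    printf("4-word-wide:   %d cycles per miss\n", wide);   /* 17 */
    return 0;
}
```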

Interleaving
- Memory is organized into banks
  - With N banks, bank i stores all words whose address j satisfies j mod N = i
- Requests to different banks can be issued in consecutive clock cycles
- All banks can read a word in parallel
- The number of banks should match the L2 block size
- The bus does not need to be wider
- Writes to different banks can proceed without waiting for the preceding write to finish
- Problem: the number of banks is limited by increasing chip capacity
  - With 1M x 1-bit chips, it takes 64 x 8 = 512 chips to build 64 MB (easy to organize as 16 banks of 32 chips)
  - With 64M x 1-bit chips, it takes only 8 chips (only one bank)
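
A minimal sketch of the bank-mapping rule: consecutive word addresses rotate through the banks, so a sequential block fill can keep every bank busy. The bank count of 8 is an assumption.

```c
#include <stdio.h>

#define NBANKS 8 /* assumed bank count */

int main(void) {
    for (unsigned word_addr = 0; word_addr < 16; word_addr++) {
        unsigned bank   = word_addr % NBANKS; /* which bank holds this word */
        unsigned offset = word_addr / NBANKS; /* position within that bank */
        printf("word %2u -> bank %u, offset %u\n", word_addr, bank, offset);
    }
    return 0;
}
```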

Memory Width Illustration [figure]

Page-Mode, Synchronous, and Double Data Rate DRAMs
- Fast page mode: add a row buffer to the DRAM
  - Keep the most recently accessed row (i.e., page) open in the buffer
  - Hit: read out of the buffer, skipping the RAS
  - Miss: cost of the miss + RAS + CAS
- Synchronous: add a clock
  - Traditional DRAMs were asynchronous; adding a clock eliminates the synchronization overhead at each request
  - Also adds a width register to control the number of words returned per request
- DDR: transfer data on both the rising and falling edges of the clock
  - Doubles the data rate
All three add a small amount of logic to exploit the DRAM's internal bandwidth (small cost, large improvement).
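
The row-buffer behavior can be sketched in a few lines of C. The RAS/CAS latencies and the access pattern below are assumptions chosen to show the hit/miss asymmetry, not figures from the lecture.

```c
#include <stdio.h>

#define RAS_NS 30 /* assumed row-access latency */
#define CAS_NS 30 /* assumed column-access latency */

int main(void) {
    int open_row = -1;              /* no row open initially */
    int rows[] = {5, 5, 5, 9, 9};   /* rows touched by successive reads */
    int total = 0;

    for (int i = 0; i < 5; i++) {
        int hit = (rows[i] == open_row);
        int t = CAS_NS + (hit ? 0 : RAS_NS); /* a row hit skips the RAS */
        open_row = rows[i];                  /* the row stays open */
        total += t;
        printf("read row %d: %2d ns %s\n", rows[i], t,
               hit ? "(row hit)" : "(row miss)");
    }
    printf("total: %d ns\n", total);
    return 0;
}
```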

Virtual Memory
- Purpose: give the program(mer) the illusion of a full-sized, unshared main memory
- Hardware assists with address translation and memory protection
- Largely invisible to the programmer
- Mechanisms that predate virtual memory: overlays, relocation registers
- Virtual memory approaches: paging, segmentation
- Paging systems predate caches, but the same questions arise (mapping, replacement, and write policies)
- One enormous difference: the miss penalty

Paged Virtual Memory Illustrated [figure]

Two Extremes in the Memory Hierarchy

Parameter           L1 Cache                      Paging System
Block (page) size   16-64 bytes                   4K-8K bytes
Miss (fault) time   10-100 cycles (20-1000 ns)    Millions of cycles (3-20 ms)
Miss (fault) rate   1-10%                         0.00001-0.001%
Memory size         4K-64K bytes                  Gigabytes

Other Extreme Differences
- Mapping: restricted (L1) vs. general (paging)
  - Hardware assist for virtual address translation (the TLB)
- Miss handler
  - Hardware only for caches
  - Software only for paging systems (a context switch)
  - Hardware or software for the TLB
- Replacement algorithm
  - Not very important for caches
  - Critical for paging systems
- Write policy
  - Always write back for paging systems

Paging and Segmentation
- Paging: fixed block sizes
  - Easy to map, translate, replace, and transfer to/from disk
  - Does not correspond to semantic objects, hence internal fragmentation
- Segmentation: variable block sizes
  - Requires two addresses per word: segment and offset
  - Harder to translate (the segment length must be checked), to place and replace (external fragmentation), and to transfer to/from disk
  - Allocation can be problematic (needs contiguous memory)
  - Corresponds to semantic objects
- Segmentation with paging (a segment holds N pages)
  - Good for large objects
  - Eases the allocation problem

Page Table Illustration [figure]

Page Tables
- Page tables contain page table entries (PTEs): the physical page number, plus valid, protection, dirty, and use bits
  - The virtual page number is often known implicitly from the entry's position, but it can be stored explicitly (as in inverted tables, below)
- A hardware register points to the page table of the running process
- Page table structures
  - The traditional structure needs one entry per virtual page
    - 32-bit addresses and 4 KB pages give 2^20 = 1M entries
    - Assuming 4 bytes per entry, the table is 4 MB
  - Some systems use inverted page tables
    - Goal: reduce the size of the page table
    - One entry for each physical page
    - Hash the virtual page number to form the table index
    - The size now tracks the physical page count rather than the virtual one: 512 MB of main memory needs a 1 MB table, assuming an additional 4 bytes per entry to store the virtual page number
- In all modern systems, page table entries are cached in a translation lookaside buffer (TLB)
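
The sizing arithmetic above can be reproduced directly. The sketch below uses exactly the slide's parameters: 32-bit virtual addresses, 4 KB pages, 4-byte PTEs, and 512 MB of DRAM.

```c
#include <stdio.h>

int main(void) {
    unsigned long long page   = 4096;
    unsigned long long vspace = 1ULL << 32;   /* 32-bit virtual address space */
    unsigned long long dram   = 512ULL << 20; /* 512 MB of physical memory */

    unsigned long long vpages   = vspace / page; /* 2^20 = 1M virtual pages */
    unsigned long long trad     = vpages * 4;    /* 4-byte PTEs -> 4 MB */

    unsigned long long ppages   = dram / page;      /* 128K physical pages */
    unsigned long long inverted = ppages * (4 + 4); /* PTE + stored VPN -> 1 MB */

    printf("traditional table: %llu MB\n", trad >> 20);
    printf("inverted table:    %llu MB\n", inverted >> 20);
    return 0;
}
```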

TLBs
- A cache for page table entries (PTEs)
- Usually fully associative
- A TLB miss is handled by hardware or by software
  - At 10-100 cycles, a TLB miss is short enough that a context switch is not warranted
- Accessed in parallel with the cache access
- The TLB is smaller than the page table, which allows faster access
- Can be on the processor's critical path
- For a given TLB size (number of entries), a larger page size implies a larger effective capacity (i.e., each TLB entry translates more addresses)
- Modern OSes have a page size parameter
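
A minimal sketch of a fully associative TLB lookup, with the entry count and PTE layout chosen for illustration. In hardware, all tag comparisons happen in parallel rather than in a loop.

```c
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16 /* assumed size */
#define PAGE_SHIFT  12 /* 4 KB pages */

struct tlb_entry {
    int      valid;
    uint32_t vpn; /* virtual page number (the tag) */
    uint32_t ppn; /* physical page number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns 1 and fills *paddr on a hit; 0 means walk the page table. */
static int tlb_lookup(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) { /* in hardware: all in parallel */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].ppn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1)); /* keep the offset */
            return 1;
        }
    }
    return 0; /* TLB miss: handled by hardware or software */
}

int main(void) {
    tlb[0] = (struct tlb_entry){1, 0x42, 0x777}; /* one example translation */
    uint32_t pa;
    if (tlb_lookup(0x42ABC, &pa))
        printf("VA 0x42ABC -> PA 0x%X\n", (unsigned)pa);
    return 0;
}
```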

Virtually Indexed, Physically Tagged Caches
- Should the cache use virtual or physical addresses?
- Why not all physical? Every access would need translation first, serializing the hierarchy
- Why not all virtual?
  - Protection: page-level protection is checked at translation time
  - Context switches: each switch changes the meaning of virtual addresses (can be mitigated by adding a PID to the cache address tag)
  - Aliasing: the OS and applications can use synonyms to refer to the same location
  - I/O uses physical addresses
- Idea: use the page offset, which is identical in the virtual and physical addresses, to index a small cache
  - The TLB (indexed by the virtual page number) and the cache (indexed by a portion of the page offset) are accessed in parallel
  - Tags are physical addresses
- This limits the size of a direct-mapped cache to one page!
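
The size limit follows from the fact that the cache index and block offset must fit within the untranslated page-offset bits. The sketch below works through the arithmetic; it also includes the standard corollary that associativity relaxes the limit by one page per way, which the slide does not mention. Page and block sizes are illustrative.

```c
#include <stdio.h>

int main(void) {
    unsigned page_size  = 4096; /* 12 untranslated page-offset bits */
    unsigned block_size = 64;   /* bytes per cache block */

    /* The index and block-offset bits must come from the page offset, so
     * the number of sets is fixed; capacity grows only with associativity. */
    unsigned sets = page_size / block_size; /* 64 sets */

    for (unsigned ways = 1; ways <= 8; ways *= 2) {
        unsigned bytes = sets * ways * block_size; /* = page_size * ways */
        printf("%u-way VIPT: %2u KB max (%u sets)\n", ways, bytes / 1024, sets);
    }
    return 0;
}
```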

Caches and Address Translation [figure]

I/O and Caches (a First Look at Cache Coherence)
Alternatives:
- I/O data passes through the cache on its way from/to memory
  - No coherence problem, but contention for cache access
- I/O interacts directly with main memory
  - Coherence problem

I/O Consistency Illustrated [figure]

I/O Consistency: Software Approach
- On output
  - With a write-through cache: no problem
  - With a write-back cache: purge the cache via O.S. interaction
- On input (both WT and WB)
  - Use a non-cacheable buffer space
  - After the input completes, flush the cache of blocks mapping to these addresses and make the buffer cacheable
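
A hedged sketch of one common software variant: write dirty lines back before the transfer and invalidate afterward. The function names are hypothetical placeholders (stubbed here so the sketch runs) for platform-specific cache-maintenance operations; the slide's non-cacheable-buffer approach achieves the same effect by keeping the buffer out of the cache during the transfer.

```c
#include <stddef.h>
#include <stdio.h>

/* Stubs standing in for platform operations (hypothetical names). */
static void cache_clean(void *p, size_t n)      { printf("clean  %zu B @ %p\n", n, p); }
static void cache_invalidate(void *p, size_t n) { printf("inval  %zu B @ %p\n", n, p); }
static void dma_read(void *p, size_t n)         { printf("DMA in %zu B @ %p\n", n, p); }

static void io_input(void *buf, size_t len) {
    cache_clean(buf, len);      /* push any dirty lines to DRAM first (WB cache) */
    dma_read(buf, len);         /* the device writes straight to main memory */
    cache_invalidate(buf, len); /* flush blocks mapping to the buffer so the
                                   CPU rereads the fresh data */
}

int main(void) {
    char buf[4096];
    io_input(buf, sizeof buf);
    return 0;
}
```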

I/O Consistency: Hardware Approach
- A subset of the shared-bus multiprocessor cache coherence protocol
- The cache controller snoops on the bus
- On output, if there is a tag match
  - WT: nothing to do
  - WB: supply the cache entry instead of the memory copy
- On input, if there is a tag match: store the data in both the cache and main memory (or invalidate the cache entry)
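
The snoop decisions above can be summarized in code. tag_match, supply_data, and update_line are hypothetical placeholders for controller internals, stubbed here so the sketch runs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { WRITE_THROUGH, WRITE_BACK };
static const int policy = WRITE_BACK; /* the cache's write policy */

/* Stub controller internals (hypothetical names). */
static bool tag_match(uint32_t pa)   { return pa == 0x1000; }
static void supply_data(uint32_t pa) { printf("supply dirty line 0x%X\n", (unsigned)pa); }
static void update_line(uint32_t pa) { printf("update line 0x%X\n", (unsigned)pa); }

static void snoop_output(uint32_t pa) { /* memory -> device transfer on the bus */
    if (tag_match(pa) && policy == WRITE_BACK)
        supply_data(pa); /* WB: the only up-to-date copy is in the cache */
    /* WT: main memory already holds the data; nothing to do */
}

static void snoop_input(uint32_t pa) {  /* device -> memory transfer */
    if (tag_match(pa))
        update_line(pa); /* or invalidate the cache entry instead */
}

int main(void) {
    snoop_output(0x1000);
    snoop_input(0x1000);
    return 0;
}
```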

Assignment
- Project demos in less than two weeks
- Readings
- For Monday: no class meeting; meet in your project groups