Chapter 8. Digital Design and Computer Architecture, 2nd Edition. David Money Harris and Sarah L. Harris. Chapter 8 <1>



Chapter 8 :: Topics. Introduction; Memory System Performance Analysis; Caches; Virtual Memory; Memory-Mapped I/O; Summary. Chapter 8 <2>

Introduction. Computer performance depends on: processor performance and memory system performance. [Figure: memory interface. The processor drives Address, WriteData, and MemWrite (WE); the memory returns ReadData; both are clocked by CLK.] Chapter 8 <3>

Processor-Memory Gap. In prior chapters, we assumed memory could be accessed in 1 clock cycle, but that hasn't been true since the 1980s. Chapter 8 <4>

Memory System Challenge. Make the memory system appear as fast as the processor. Use a hierarchy of memories. Ideal memory: fast, cheap (inexpensive), large (capacity). But can only choose two! Chapter 8 <5>

Memory Hierarchy
Technology            Price/GB    Access Time (ns)    Bandwidth (GB/s)
Cache: SRAM           $10,000     1                   25+
Main Memory: DRAM     $10         10-50               10
SSD                   $1          100,000             0.5
Virtual Memory: HDD   $0.10       10,000,000          0.1
(Speed increases toward the top of the hierarchy; capacity increases toward the bottom.) Chapter 8 <6>

Locality. Exploit locality to make memory accesses fast. Temporal locality (locality in time): if data was used recently, it is likely to be used again soon. How to exploit: keep recently accessed data in higher levels of the memory hierarchy. Spatial locality (locality in space): if data was used recently, nearby data is likely to be used soon. How to exploit: when accessing data, bring nearby data into higher levels of the memory hierarchy too. Chapter 8 <7>

Memory Performance. Hit: data found in that level of the memory hierarchy. Miss: data not found (must go to next level). Hit Rate = # hits / # memory accesses = 1 - Miss Rate. Miss Rate = # misses / # memory accesses = 1 - Hit Rate. Average memory access time (AMAT): average time for the processor to access data. AMAT = t_cache + MR_cache(t_MM + MR_MM * t_VM). Chapter 8 <8>

Memory Performance Example 1. A program has 2,000 loads and stores. 1,250 of these data values are in the cache; the rest are supplied by other levels of the memory hierarchy. What are the hit and miss rates for the cache? Chapter 8 <9>

Memory Performance Example 1. A program has 2,000 loads and stores. 1,250 of these data values are in the cache; the rest are supplied by other levels of the memory hierarchy. What are the hit and miss rates for the cache? Hit Rate = 1250/2000 = 0.625. Miss Rate = 750/2000 = 0.375 = 1 - Hit Rate. Chapter 8 <10>

Memory Performance Example 2. Suppose the processor has 2 levels of hierarchy: cache and main memory, with t_cache = 1 cycle and t_MM = 100 cycles. What is the AMAT of the program from Example 1? Chapter 8 <11>

Memory Performance Example 2. Suppose the processor has 2 levels of hierarchy: cache and main memory, with t_cache = 1 cycle and t_MM = 100 cycles. What is the AMAT of the program from Example 1? AMAT = t_cache + MR_cache(t_MM) = [1 + 0.375(100)] cycles = 38.5 cycles. Chapter 8 <12>
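The AMAT arithmetic above is easy to check in code. A minimal sketch in C (the function name is illustrative, not from the text):

```c
#include <assert.h>

/* Average memory access time for a two-level hierarchy:
   every access pays t_cache; cache misses additionally pay t_mm. */
double amat_two_level(double t_cache, double miss_rate, double t_mm) {
    return t_cache + miss_rate * t_mm;
}
```

Plugging in Example 1's miss rate gives 1 + 0.375(100) = 38.5 cycles, matching the slide.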

Gene Amdahl, 1922-2015. Amdahl's Law: the effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of overall performance. Co-founded 3 companies, including one called Amdahl Corporation in 1970. Chapter 8 <13>

Cache Highest level in memory hierarchy Fast (typically ~ 1 cycle access time) Ideally supplies most data to processor Usually holds most recently accessed data Chapter 8 <14>

Cache Design Questions What data is held in the cache? How is data found? What data is replaced? Focus on data loads, but stores follow same principles Chapter 8 <15>

What data is held in the cache? Ideally, cache anticipates needed data and puts it in cache But impossible to predict future Use past to predict future temporal and spatial locality: Temporal locality: copy newly accessed data into cache Spatial locality: copy neighboring data into cache too Chapter 8 <16>

Cache Terminology. Capacity (C): number of data bytes in the cache. Block size (b): bytes of data brought into the cache at once. Number of blocks (B = C/b): number of blocks in the cache. Degree of associativity (N): number of blocks in a set. Number of sets (S = B/N): each memory address maps to exactly one set. Chapter 8 <17>

How is data found? Cache organized into S sets Each memory address maps to exactly one set Caches categorized by # of blocks in a set: Direct mapped: 1 block per set N-way set associative: N blocks per set Fully associative: all cache blocks in 1 set Examine each organization for a cache with: Capacity (C = 8 words) Block size (b = 1 word) So, number of blocks (B = 8) Chapter 8 <18>

Example Cache Parameters C = 8 words (capacity) b = 1 word (block size) So, B = 8 (# of blocks) Ridiculously small, but will illustrate organizations Chapter 8 <19>

Direct Mapped Cache. [Figure: a 2^30-word main memory mapped onto a 2^3-word (8-set) cache. Each address maps to set = (address/4) mod 8, so for example mem[0x00000004], mem[0x00000024], and mem[0xFF...E4] all map to set 1 (001), mem[0x00000000] maps to set 0 (000), and so on up through set 7 (111).] Chapter 8 <20>

Direct Mapped Cache Hardware. [Figure: the 32-bit memory address is split into a 27-bit tag, a 3-bit set index, and a 2-bit byte offset. The cache is an 8-entry x (1+27+32)-bit SRAM holding a valid bit, tag, and 32-bit data word per set. The set index selects an entry; the stored tag is compared with the address tag, and Hit = valid AND tags equal; Data is the stored word.] Chapter 8 <21>
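The field extraction in the figure can be sketched in C for this 8-set, 1-word-block cache (function names are illustrative):

```c
#include <stdint.h>

/* For C = 8 words, b = 1 word: bits [1:0] are the byte offset,
   bits [4:2] the set index, and bits [31:5] the tag. */
uint32_t dm_set(uint32_t addr) { return (addr >> 2) & 0x7; }
uint32_t dm_tag(uint32_t addr) { return addr >> 5; }
```

For example, address 0x4 falls in set 1 and address 0xC in set 3, as in the performance example that follows.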

Direct Mapped Cache Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:
[Figure: sets 1-3 hold mem[0x...4], mem[0x...8], and mem[0x...C] with tag 0...0; sets 0 and 4-7 are empty.] Miss Rate = ? Chapter 8 <22>

Direct Mapped Cache Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:
Miss Rate = 3/15 = 20%. Temporal locality: only the first access to each word misses (compulsory misses). Chapter 8 <23>

Direct Mapped Cache: Conflict
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:
[Figure: mem[0x...04] and mem[0x...24] both map to set 1, so each load evicts the other.] Miss Rate = ? Chapter 8 <24>

Direct Mapped Cache: Conflict
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:
Miss Rate = 10/10 = 100%. Conflict misses. Chapter 8 <25>
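A small simulation (a sketch, not from the text) reproduces both miss rates: 3/15 for the temporal-locality loop and 10/10 for this conflict loop.

```c
#include <stdint.h>

/* Simulate the 8-set direct mapped cache (1-word blocks) over an
   address stream and count misses: a hit requires a valid entry
   whose stored tag matches the address tag. */
int dm_misses(const uint32_t *addrs, int n) {
    int valid[8] = {0};
    uint32_t tags[8];
    int misses = 0;
    for (int i = 0; i < n; i++) {
        uint32_t set = (addrs[i] >> 2) & 0x7;
        uint32_t tag = addrs[i] >> 5;
        if (!valid[set] || tags[set] != tag) {  /* miss: fill the block */
            misses++;
            valid[set] = 1;
            tags[set] = tag;
        }
    }
    return misses;
}
```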

N-Way Set Associative Cache. [Figure: a 2-way set associative cache. The address is split into a 28-bit tag, 2-bit set index, and 2-bit byte offset. Each set holds two ways, each with a valid bit, tag, and 32-bit data word. Both ways' stored tags are compared with the address tag in parallel; Hit1 and Hit0 select the matching way's data through a multiplexer, and Hit = Hit1 OR Hit0.] Chapter 8 <26>

N-Way Set Associative Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:
[Figure: a 2-way cache with sets 0-3, each way holding a valid bit, tag, and data.] Miss Rate = ? Chapter 8 <27>

N-Way Set Associative Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:
[Figure: set 1 holds mem[0x...24] in way 1 and mem[0x...04] in way 0.] Miss Rate = 2/10 = 20%. Associativity reduces conflict misses. Chapter 8 <28>

Fully Associative Cache. [Figure: a single set of 8 ways, each with a valid bit, tag, and data word.] Reduces conflict misses. Expensive to build. Chapter 8 <29>

Spatial Locality? Increase block size: block size b = 4 words, C = 8 words, direct mapped (1 block per set), number of blocks B = C/b = 8/4 = 2. [Figure: the address is split into a 27-bit tag, 1-bit set index, 2-bit block offset, and 2-bit byte offset. Each set holds a valid bit, tag, and four 32-bit words; the block offset selects one word through a multiplexer.] Chapter 8 <30>

Cache with Larger Block Size. [Figure: same hardware as above: 27-bit tag, 1-bit set index, 2-bit block offset, 2-bit byte offset, and two sets of four words each.] Chapter 8 <31>

Direct Mapped Cache Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:
Miss Rate = ? Chapter 8 <32>

Direct Mapped Cache Performance
# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:
[Figure: the first miss brings the whole block mem[0x...0] through mem[0x...C] into set 0 with tag 0...0.] Miss Rate = 1/15 = 6.67%. Larger blocks reduce compulsory misses through spatial locality. Chapter 8 <33>

Cache Organization Recap. Capacity: C. Block size: b. Number of blocks in cache: B = C/b. Number of blocks in a set: N. Number of sets: S = B/N.
Organization              Number of Ways (N)    Number of Sets (S = B/N)
Direct Mapped             1                     B
N-Way Set Associative     1 < N < B             B/N
Fully Associative         B                     1
Chapter 8 <34>
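The recap relations are one line of arithmetic each; a sketch in C (function names illustrative):

```c
/* B = C/b blocks; S = B/N sets (C and b in words). */
int num_blocks(int C, int b)      { return C / b; }
int num_sets(int C, int b, int N) { return (C / b) / N; }
```

For the running example (C = 8 words, b = 1 word): direct mapped (N = 1) gives 8 sets, 2-way gives 4, and fully associative (N = B = 8) gives 1.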

Capacity Misses. Cache is too small to hold all data of interest at once. If the cache is full: the program accesses data X and evicts data Y; a capacity miss occurs when Y is accessed again. How to choose Y to minimize the chance of needing it again? Least recently used (LRU) replacement: the least recently used block in a set is evicted. Chapter 8 <35>

Types of Misses Compulsory: first time data accessed Capacity: cache too small to hold all data of interest Conflict: data of interest maps to same location in cache Miss penalty: time it takes to retrieve a block from lower level of hierarchy Chapter 8 <36>

LRU Replacement
# MIPS assembly
lw $t0, 0x04($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)
[Figure: a 2-way set associative cache with sets 0 (00) through 3 (11); each set has a U (use) bit plus a valid bit, tag, and data per way. All three addresses map to set 1.] Chapter 8 <37>

LRU Replacement
# MIPS assembly
lw $t0, 0x04($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)
[Figure: (a) after the first two loads, set 1 holds mem[0x...24] in way 1 and mem[0x...04] in way 0, with U = 0. (b) the load of mem[0x...54] evicts the least recently used block, mem[0x...04], from way 0, leaving mem[0x...24] in way 1 and mem[0x...54] in way 0, with U = 1.] Chapter 8 <38>
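The U-bit behavior in the figure can be sketched as a single 2-way set in C (a simplified model, not the text's hardware):

```c
#include <stdint.h>

/* One 2-way set with an LRU (U) bit naming the way to evict next.
   Returns 1 on a hit, 0 on a miss. */
typedef struct { int valid[2]; uint32_t tag[2]; int u; } Set2;

int set2_access(Set2 *s, uint32_t tag) {
    for (int w = 0; w < 2; w++)
        if (s->valid[w] && s->tag[w] == tag) {
            s->u = 1 - w;              /* the other way is now LRU */
            return 1;                  /* hit */
        }
    int w = s->u;                      /* miss: evict the LRU way */
    s->valid[w] = 1;
    s->tag[w] = tag;
    s->u = 1 - w;
    return 0;
}

/* Replay the three loads above (tags of 0x04, 0x24, 0x54): the third
   load evicts 0x04's block, so 0x24 still hits but 0x04 now misses. */
int lru_demo(void) {
    Set2 s = {{0, 0}, {0, 0}, 0};
    set2_access(&s, 0x0);              /* lw 0x04: miss */
    set2_access(&s, 0x2);              /* lw 0x24: miss */
    set2_access(&s, 0x5);              /* lw 0x54: miss, evicts tag 0x0 */
    return set2_access(&s, 0x2) == 1 && set2_access(&s, 0x0) == 0;
}
```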

Cache Summary What data is held in the cache? Recently used data (temporal locality) Nearby data (spatial locality) How is data found? Set is determined by address of data Word within block also determined by address In associative caches, data could be in one of several ways What data is replaced? Least-recently used way in the set Chapter 8 <39>

Miss Rate Trends. Bigger caches reduce capacity misses. Greater associativity reduces conflict misses. Adapted from Patterson & Hennessy, Computer Architecture: A Quantitative Approach, 2011. Chapter 8 <40>

Miss Rate Trends Bigger blocks reduce compulsory misses Bigger blocks increase conflict misses Chapter 8 <41>

Multilevel Caches Larger caches have lower miss rates, longer access times Expand memory hierarchy to multiple levels of caches Level 1: small and fast (e.g. 16 KB, 1 cycle) Level 2: larger and slower (e.g. 256 KB, 2-6 cycles) Most modern PCs have L1, L2, and L3 cache Chapter 8 <42>

Intel Pentium III Die Chapter 8 <43>

Virtual Memory Gives the illusion of bigger memory Main memory (DRAM) acts as cache for hard disk Chapter 8 <44>

Memory Hierarchy. [Table repeated from earlier: SRAM, DRAM, SSD, and HDD price per GB, access time, and bandwidth.] Physical Memory: DRAM (main memory). Virtual Memory: hard drive. Slow, large, cheap. Chapter 8 <45>

Hard Disk. [Figure: magnetic disks with read/write head.] Takes milliseconds to seek the correct location on the disk. Chapter 8 <46>

Virtual Memory. Virtual addresses: programs use virtual addresses. The entire virtual address space is stored on a hard drive; a subset of the virtual address data is in DRAM. The CPU translates virtual addresses into physical addresses (DRAM addresses); data not in DRAM is fetched from the hard drive. Memory protection: each program has its own virtual-to-physical mapping. Two programs can use the same virtual address for different data. Programs don't need to be aware that others are running, and one program (or virus) can't corrupt memory used by another. Chapter 8 <47>

Cache/Virtual Memory Analogues. Block <-> Page. Block Size <-> Page Size. Block Offset <-> Page Offset. Miss <-> Page Fault. Tag <-> Virtual Page Number. Physical memory acts as a cache for virtual memory. Chapter 8 <48>

Virtual Memory Definitions Page size: amount of memory transferred from hard disk to DRAM at once Address translation: determining physical address from virtual address Page table: lookup table used to translate virtual addresses to physical addresses Chapter 8 <49>

Virtual & Physical Addresses. Most accesses hit in physical memory, but programs have the large capacity of virtual memory. Chapter 8 <50>

Address Translation Chapter 8 <51>

Virtual Memory Example. System: Virtual memory size: 2 GB = 2^31 bytes. Physical memory size: 128 MB = 2^27 bytes. Page size: 4 KB = 2^12 bytes. Chapter 8 <52>

Virtual Memory Example. System: Virtual memory size: 2 GB = 2^31 bytes. Physical memory size: 128 MB = 2^27 bytes. Page size: 4 KB = 2^12 bytes. Organization: Virtual address: 31 bits. Physical address: 27 bits. Page offset: 12 bits. # Virtual pages = 2^31/2^12 = 2^19 (VPN = 19 bits). # Physical pages = 2^27/2^12 = 2^15 (PPN = 15 bits). Chapter 8 <53>
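The bit-width bookkeeping above follows mechanically from the sizes; a sketch in C (function names illustrative):

```c
/* Integer log2 for exact powers of two. */
int log2i(unsigned long x) {
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Page-number widths: address bits minus page-offset bits. */
int vpn_bits(int va_bits, unsigned long page_size) { return va_bits - log2i(page_size); }
int ppn_bits(int pa_bits, unsigned long page_size) { return pa_bits - log2i(page_size); }
```

For the example system: 31 - 12 = 19 VPN bits and 27 - 12 = 15 PPN bits.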

Virtual Memory Example 19-bit virtual page numbers 15-bit physical page numbers Chapter 8 <54>

Virtual Memory Example. What is the physical address of virtual address 0x247C? Chapter 8 <55>

Virtual Memory Example. What is the physical address of virtual address 0x247C? VPN = 0x2. VPN 0x2 maps to PPN 0x7FFF. 12-bit page offset: 0x47C. Physical address = 0x7FFF47C. Chapter 8 <56>

How to perform translation? Page table Entry for each virtual page Entry fields: Valid bit: 1 if page in physical memory Physical page number: where the page is located Chapter 8 <57>

Page Table Example. [Figure: virtual address 0x0000247C splits into a 19-bit VPN (0x00002) and a 12-bit page offset (0x47C). The VPN indexes the page table; each entry holds a valid bit and a 15-bit physical page number. Here VPN 0x2 yields PPN 0x7FFF, giving physical address 0x7FFF47C. Valid entries shown hold PPNs 0x0000, 0x7FFE, 0x0001, and 0x7FFF.] The VPN is the index into the page table. Chapter 8 <58>

Page Table Example 1. What is the physical address of virtual address 0x5F20? [Figure: page table with valid entries holding PPNs 0x0000, 0x7FFE, 0x0001, and 0x7FFF.] Chapter 8 <59>

Page Table Example 1. What is the physical address of virtual address 0x5F20? VPN = 5. Entry 5 in the page table gives physical page 0x0001. Physical address: 0x1F20. [Figure: VPN 0x5 with page offset 0xF20 indexes the page table; the entry yields PPN 0x0001, giving physical address 0x1F20.] Chapter 8 <60>
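The worked examples can be checked with a direct transcription of the page-table walk (a sketch; only the entries the examples use are filled in, and the fault sentinel is illustrative):

```c
#include <stdint.h>

#define PAGE_BITS  12
#define PAGE_FAULT 0xFFFFFFFFu     /* sentinel for an invalid translation */

typedef struct { int valid; uint32_t ppn; } PTE;

/* Entries used by the examples: VPN 0x2 -> PPN 0x7FFF,
   VPN 0x5 -> PPN 0x0001; VPN 0x7 is invalid. */
static const PTE page_table[8] = {
    [2] = {1, 0x7FFF},
    [5] = {1, 0x0001},
};

uint32_t translate(uint32_t va) {
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);
    if (vpn >= 8 || !page_table[vpn].valid)
        return PAGE_FAULT;          /* must page in from disk */
    return (page_table[vpn].ppn << PAGE_BITS) | offset;
}
```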

Page Table Example 2. What is the physical address of virtual address 0x73E0? [Figure: VPN 0x7 with page offset 0x3E0 indexes the page table.] Chapter 8 <61>

Page Table Example 2. What is the physical address of virtual address 0x73E0? VPN = 7. Entry 7 is invalid, so the virtual page must be paged into physical memory from disk. Chapter 8 <62>

Page Table Challenges. The page table is large and usually located in physical memory. Each load/store requires 2 main memory accesses: one for translation (page table read), one to access data (after translation). This cuts memory performance in half, unless we get clever. Chapter 8 <63>

Translation Lookaside Buffer (TLB) Small cache of most recent translations Reduces # of memory accesses for most loads/stores from 2 to 1 Chapter 8 <64>

TLB Page table accesses: high temporal locality Large page size, so consecutive loads/stores likely to access same page TLB Small: accessed in < 1 cycle Typically 16-512 entries Fully associative > 99 % hit rates typical Reduces # of memory accesses for most loads/stores from 2 to 1 Chapter 8 <65>

Example 2-Entry TLB. [Figure: virtual address 0x0000247C splits into VPN 0x00002 and page offset 0x47C. The TLB holds two entries, each with a valid bit, a 19-bit virtual page number, and a 15-bit physical page number: entry 1 maps VPN 0x7FFFD to PPN 0x0000; entry 0 maps VPN 0x00002 to PPN 0x7FFF. Both entries are compared with the VPN in parallel; the matching entry supplies the PPN, giving physical address 0x7FFF47C.] Chapter 8 <66>
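The figure's two-entry TLB can be sketched the same way (entry contents taken from the figure; the miss sentinel is illustrative):

```c
#include <stdint.h>

#define TLB_MISS 0xFFFFFFFFu

typedef struct { int valid; uint32_t vpn, ppn; } TLBEntry;

/* Entry 1: VPN 0x7FFFD -> PPN 0x0000; entry 0: VPN 0x00002 -> PPN 0x7FFF. */
static const TLBEntry tlb[2] = {
    {1, 0x7FFFD, 0x0000},
    {1, 0x00002, 0x7FFF},
};

/* Compare the VPN against every entry "in parallel"; a hit supplies
   the PPN without touching the page table in memory. */
uint32_t tlb_translate(uint32_t va) {
    uint32_t vpn = va >> 12;
    for (int i = 0; i < 2; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << 12) | (va & 0xFFF);
    return TLB_MISS;               /* fall back to a page table walk */
}
```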

Memory Protection Multiple processes (programs) run at once Each process has its own page table Each process can use entire virtual address space A process can only access physical pages mapped in its own page table Chapter 8 <67>

Virtual Memory Summary. Virtual memory increases capacity. A subset of virtual pages is in physical memory. A page table maps virtual pages to physical pages (address translation). A TLB speeds up address translation. Different page tables for different programs provide memory protection. Chapter 8 <68>

Memory-Mapped I/O. The processor accesses I/O devices (keyboards, monitors, printers) just like memory. Each I/O device is assigned one or more addresses. When such an address is detected, data is read/written to the I/O device instead of memory. A portion of the address space is dedicated to I/O devices. Chapter 8 <69>

Memory-Mapped I/O Hardware. Address Decoder: looks at the address to determine which device/memory communicates with the processor. I/O Registers: hold values written to the I/O devices. ReadData Multiplexer: selects between memory and I/O devices as the source of data sent to the processor. Chapter 8 <70>

The Memory Interface CLK Processor MemWrite Address WriteData WE Memory ReadData Chapter 8 <71>

Memory-Mapped I/O Hardware. [Figure: the address decoder examines the address and asserts one of the write enables WEM (memory), WE1, or WE2 (I/O devices), and drives the 2-bit select RDsel of the ReadData multiplexer to choose among memory, I/O Device 1, and I/O Device 2.] Chapter 8 <72>

Memory-Mapped I/O Code. Suppose I/O Device 1 is assigned the address 0xFFFFFFF4. Write the value 42 to I/O Device 1. Read the value from I/O Device 1 and place it in $t3. Chapter 8 <73>

Memory-Mapped I/O Code. Write the value 42 to I/O Device 1 (0xFFFFFFF4):
addi $t0, $0, 42
sw   $t0, 0xFFF4($0)
(The 16-bit offset 0xFFF4 sign-extends to 0xFFFFFFF4.) [Figure: the address decoder asserts WE1, so the write goes to I/O Device 1.] Chapter 8 <74>

Memory-Mapped I/O Code. Read the value from I/O Device 1 and place it in $t3:
lw $t3, 0xFFF4($0)
[Figure: the address decoder sets RDsel to select I/O Device 1 as the ReadData source.] Chapter 8 <75>
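In C, the same memory-mapped accesses are usually written through a volatile pointer at the device address. A sketch (on real hardware the pointer would be (volatile uint32_t *)0xFFFFFFF4; here a plain variable stands in for the device register so the code can run anywhere):

```c
#include <stdint.h>

/* Stand-in for the I/O Device 1 register at 0xFFFFFFF4. */
static volatile uint32_t device1_reg;
#define IO_DEVICE1 (&device1_reg)

/* volatile forces every access to actually reach the register
   instead of being cached in a CPU register or optimized away. */
void io_write(volatile uint32_t *reg, uint32_t value) { *reg = value; }
uint32_t io_read(volatile uint32_t *reg)              { return *reg; }
```

io_write(IO_DEVICE1, 42) mirrors the sw above; io_read(IO_DEVICE1) mirrors the lw.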

Input/Output (I/O) Systems Embedded I/O Systems Toasters, LEDs, etc. PC I/O Systems Chapter 8 <76>

Embedded I/O Systems Example microcontroller: PIC32 microcontroller 32-bit MIPS processor low-level peripherals include: serial ports timers A/D converters Chapter 8 <77>

Digital I/O
// C Code
#include <p32xxxx.h>

int main(void) {
  int switches;
  TRISD = 0xFF00;                  // RD[7:0] outputs, RD[11:8] inputs
  while (1) {
    // read & mask switches, RD[11:8]
    switches = (PORTD >> 8) & 0xF;
    PORTD = switches;              // display on LEDs
  }
}
Chapter 8 <78>

Serial I/O Example serial protocols SPI: Serial Peripheral Interface UART: Universal Asynchronous Receiver/Transmitter Also: I 2 C, USB, Ethernet, etc. Chapter 8 <79>

SPI: Serial Peripheral Interface. Master initiates communication to slave by sending pulses on SCK. Master sends SDO (Serial Data Out) to slave, msb first. Slave may send data (SDI) to master, msb first. Chapter 8 <80>

UART: Universal Asynchronous Rx/Tx. Configuration: start bit (0), 7-8 data bits, parity bit (optional), 1+ stop bits (1). Data rates: 300, 1200, 2400, 9600, 115200 baud. Line idles HIGH (1). Common configuration: 8 data bits, no parity, 1 stop bit, 9600 baud. Chapter 8 <81>
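The frame format fixes the payload rate: with 1 start bit, 8 data bits, no parity, and 1 stop bit, each byte costs 10 bit times. A sketch of that arithmetic (function names illustrative):

```c
/* Bit times per frame: start + data + optional parity + stop. */
int uart_frame_bits(int data_bits, int parity_bits, int stop_bits) {
    return 1 + data_bits + parity_bits + stop_bits;
}

/* Payload bytes per second at a given baud rate. */
int uart_bytes_per_sec(int baud, int data_bits, int parity_bits, int stop_bits) {
    return baud / uart_frame_bits(data_bits, parity_bits, stop_bits);
}
```

At 9600 baud, the common 8N1 configuration moves 9600/10 = 960 bytes per second.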

Timers
// Create specified ms/us of delay using built-in timer
#include <P32xxxx.h>

void delaymicros(int micros) {
  if (micros > 1000) {             // avoid timer overflow
    delaymicros(1000);
    delaymicros(micros-1000);
  }
  else if (micros > 6) {
    TMR1 = 0;                      // reset timer to 0
    T1CONbits.ON = 1;              // turn timer on
    PR1 = (micros-6)*20;           // 20 clocks per microsecond
                                   // Function has overhead of ~6 us
    IFS0bits.T1IF = 0;             // clear overflow flag
    while (!IFS0bits.T1IF);        // wait until overflow flag set
  }
}

void delaymillis(int millis) {
  while (millis--) delaymicros(1000);  // repeatedly delay 1 ms until done
}
Chapter 8 <82>

Analog I/O. Needed to interface with the outside world. Analog input: analog-to-digital (A/D) conversion, often included in a microcontroller; an N-bit converter maps an analog input in the range Vref- to Vref+ onto a digital value from 0 to 2^N-1. Analog output: digital-to-analog (D/A) conversion, typically needing an external chip (e.g., AD558 or LTC1257); an N-bit converter maps a digital value from 0 to 2^N-1 onto an analog output in the range Vref- to Vref+. Pulse-width modulation. Chapter 8 <83>

Pulse-Width Modulation (PWM). Average value proportional to duty cycle. Add a low-pass filter on the output to deliver the average value. Chapter 8 <84>
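The averaging relationship is just the duty cycle times the high level; a sketch (function name illustrative):

```c
/* Average value of a PWM output after low-pass filtering:
   duty_cycle is the fraction of each period spent high (0.0 to 1.0). */
double pwm_average(double v_high, double duty_cycle) {
    return v_high * duty_cycle;
}
```

For example, a 5 V output at 25% duty cycle averages 1.25 V.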

Other Microcontroller Peripherals Examples Character LCD VGA monitor Bluetooth wireless Motors Chapter 8 <85>

Personal Computer (PC) I/O Systems. USB: Universal Serial Bus. USB 1.0 released in 1996; standardized cables/software for peripherals. PCI/PCIe: Peripheral Component Interconnect / PCI Express; developed by Intel, widespread around 1994; 32-bit parallel bus; used for expansion cards (e.g., sound cards, video cards). DDR: double-data-rate memory. Chapter 8 <86>

Personal Computer (PC) I/O Systems. TCP/IP: Transmission Control Protocol and Internet Protocol; physical connection: Ethernet cable or Wi-Fi. SATA: hard drive interface. Input/Output (sensors, actuators, microcontrollers, etc.): Data Acquisition Systems (DAQs); USB links. Chapter 8 <87>