
Questions 1

Question 13 1: (Solution, p 4) Describe the inputs and outputs of a (1-way) demultiplexer, and how they relate.

Question 13 2: (Solution, p 4) In implementing HYMN's control unit, the fetch cycle and the execute cycle each consisted of two clock phases: one phase in which the clock was at 0, and one phase in which the clock was at 1. Explain why having two parts to each cycle was important in designing the control unit. In other words, why should the control unit treat the two phases of each cycle differently?

Question 14.2 1: (Solution, p 4) In terms of their physical characteristics, what distinguishes an L1 cache from an L2 cache?

Question 14.2 2: (Solution, p 4) Suppose we have a system using six-bit addresses which uses a direct-mapped cache with two lines, where each line has two bytes. And suppose the following sequence of one-byte accesses: M[10], M[2], M[1], M[3], M[10], M[2].

a. Which of the accesses in the sequence hit the cache?
b. Draw a picture of the contents of the cache after completing this sequence.

Question 14.2 3: (Solution, p 4) Compare the virtues of a direct-mapped cache versus a fully associative cache.

Question 14.3 1: (Solution, p 4) Consider the following C code to add the numbers within an array.

    int i;
    int sum = 0;
    for (i = 0; i < n; i++) {
        sum += a[i];
    }

Note that this program never accesses the same array element twice. Explain how a direct-mapped memory cache helps this code run faster.

Question 14.3 2: (Solution, p 4) Consider the following C code to multiply two matrices.

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

Note that the innermost loop here iterates over k. Suppose that our CPU has a direct-mapped cache with 128 lines, each of 64 bytes, and suppose that each array element takes 8 bytes. On average, how many times per iteration through the innermost loop will the CPU miss the cache in accessing each of the following?

a. C[i][j]
b. A[i][k]
c. B[k][j]
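For working Question 14.2 2 by hand, it may help to see how a six-bit address divides into tag, line index, and byte offset for this cache geometry (two lines of two bytes each: one offset bit, one index bit, four tag bits). The following is a minimal C sketch of that bookkeeping; the program and its names are illustrative additions, not part of the original questions.

    /* Sketch: split each six-bit address in the access sequence of
       Question 14.2 2 into tag / line / offset fields. */
    #include <stdio.h>

    int main(void) {
        int addresses[] = {10, 2, 1, 3, 10, 2};   /* the access sequence */
        for (int i = 0; i < 6; i++) {
            int a = addresses[i];
            int offset = a & 1;          /* bit 0: byte within the line */
            int line   = (a >> 1) & 1;   /* bit 1: which cache line */
            int tag    = a >> 2;         /* bits 2-5: the tag */
            printf("M[%2d]: tag=%d line=%d offset=%d\n", a, tag, line, offset);
        }
        return 0;
    }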

Questions 2

Question 14.3 3: (Solution, p 4) Consider the following C code fragment.

    for (i = 0; i < n; i++) {
        j = a[i];
        sum += count[j];
    }

Suppose that the CPU uses a direct-mapped cache in which each line holds four array entries. For each iteration of this loop, how many times, on average, will the CPU miss the cache? Explain your answer. (Assume that the array a holds random integers between 0 and 1,000,000.)

Question 9.1 1: (Solution, p 5) Define the relocation problem that arises in the context of segments.

Question 9.1 2: (Solution, p 5) Suppose we have an operating system that implements load-time relocation to place the program in available memory. Describe in detail how the operating system starts a program.

Question 9.1 3: (Solution, p 5) Explain how a CPU incorporating segment registers computes a memory address.

Question 9.1 4: (Solution, p 5) In class we saw three different ways of enabling a process to run in memory, including the load-time technique, in which the process's memory references are repaired at the time the process is loaded into memory, and the run-time technique, in which the process's memory references are repaired by the CPU at the time the memory is referenced. Name and explain one advantage of the run-time technique over the load-time technique.

Question 9.2 1: (Solution, p 6)

a. Explain how a CPU supporting virtual memory would translate a virtual memory address into the actual memory address, assuming that the requested page is in memory.
b. Explain what the CPU does when the requested page is not in memory.

Question 9.2 2: (Solution, p 6) Describe at least three elements of a typical page table entry.

Question 9.2 3: (Solution, p 6) Suppose we were using a virtual memory system with three page frames, using the FIFO algorithm to decide which pages to keep in memory. Give the page located in each frame after each of the given page references. [The page-reference sequence and answer grid were lost in extraction.]
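The address arithmetic asked about in Question 9.1 3 is small enough to show directly. Here is a minimal C sketch of the idea, under the assumption of a single segment register holding a base address; the names are invented for illustration.

    /* Sketch of segment-register address translation: the hardware
       adds the segment base to every address the program issues. */
    unsigned segment_base = 0x4000;           /* set by the OS per process */

    unsigned translate(unsigned requested_addr) {
        return segment_base + requested_addr; /* actual address sent to memory */
    }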

Questions 3

Question 9.2 4: (Solution, p 6) Suppose we were using a virtual memory system with three page frames, using the Clock algorithm to decide which pages to keep in memory. Give the page located in each frame after each of the given page references. [The page-reference sequence was lost in extraction.]

Question 9.2 5: (Solution, p 7) Explain the problem that leads to including the TLB in computing systems supporting virtual memory.

Question 9.2 6: (Solution, p 7) Explain the concept and purpose of page directories.

Question 9.2 7: (Solution, p 7) What advantages does a virtual memory system provide? Give at least two reasons.
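Since the reference sequence for Question 9.2 4 did not survive extraction, a minimal sketch of one common formulation of the Clock algorithm may help in reworking it with any sequence. This C sketch is an illustrative addition; the names and the detail of where the hand stops are assumptions, not taken from the course.

    /* Sketch of Clock page replacement with three frames: frames are
       scanned circularly; a frame with its referenced bit set gets a
       second chance (the bit is cleared), and the first frame found
       with a clear bit is replaced. */
    #define NFRAMES 3

    int frames[NFRAMES] = {-1, -1, -1};  /* page held by each frame */
    int refbit[NFRAMES] = {0, 0, 0};     /* referenced-recently bits */
    int hand = 0;                        /* clock hand position */

    void reference(int page) {
        for (int i = 0; i < NFRAMES; i++)
            if (frames[i] == page) { refbit[i] = 1; return; }  /* hit */
        while (refbit[hand]) {           /* miss: search for a victim */
            refbit[hand] = 0;            /* give this frame a second chance */
            hand = (hand + 1) % NFRAMES;
        }
        frames[hand] = page;             /* replace the victim */
        refbit[hand] = 1;
        hand = (hand + 1) % NFRAMES;
    }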

Solutions 4

Solution 13 1: (Question, p 1) A demultiplexer has three inputs: a data input and two select inputs representing the two bits of a number. It has four outputs, numbered 00 through 11 in binary. The demultiplexer routes its data input to the output whose number matches the value on the select inputs; the other outputs' values will be 0.

Solution 13 2: (Question, p 1) The distinction between the two phases is important because of timing considerations. During the 0 phase of each cycle, the computer computes the values for the cycle; this is a matter of letting values propagate through the circuit to the input pins of the registers and memory where the values should be stored. During the 1 phase, these values continue to propagate to the registers and memory that should change, while the computer sends signals to the registers and memory telling them to change.

Solution 14.2 1: (Question, p 1) The L1 cache is incorporated into the CPU chip itself, while the L2 cache is on a separate chip. (A secondary distinguishing factor is the L2 cache's larger size.)

Solution 14.2 2: (Question, p 1)

a. The fourth access (M[3]) only.

b.
    line num   tag    line data
    0          0000   m[0]  m[1]
    1          0000   m[2]  m[3]

Solution 14.2 3: (Question, p 1) With a direct-mapped cache, in which every memory address can be cached in only one location within the cache, there is the possibility that an access to a distant memory address will eject a very recently accessed value that is likely to be used again soon. A fully associative cache doesn't share this occasional poor behavior, but it requires more logic per cache line, so large, fast fully associative caches are unaffordable.

Solution 14.3 1: (Question, p 1) This code goes through the array in order, and several consecutive array elements fit within a single line of the cache. When the code misses the cache, the cache loads the entire line containing the desired array entry, including the next several adjacent memory locations. Thus, the next several iterations of the loop will hit the cache, saving a memory access each of those times.

Solution 14.3 2: (Question, p 1)

a. 0 (This is the same memory location each time through the innermost loop, so there will be no misses after the first iteration.)

b. 0.125 (As we go through the innermost loop, we are walking through successive memory locations. Each time we miss the cache, we load eight successive locations in the row; thus this access misses the cache one time in eight.)

c. 1.0 (Each array access is in a different row, so the accesses are widely separated. Loading one element in the column will not help in reaching another. Thus each access to B will be a miss.)

Solution 14.3 3: (Question, p 2) 1.25. This loop goes through the a array in succession, and so when it loads one element of a into the cache, the CPU loads the next three elements into the cache also. Thus, this code misses the cache when accessing a[i] an average of 0.25 times each iteration. The pattern of accesses to count, however, is random, and it is highly unlikely that any access will be to an element already in the cache. Thus, the CPU is likely to miss the cache in accessing count[j] each time through the loop. We add these two numbers together (0.25 + 1.00) to get the total number of misses per iteration.
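Solution 13 1 can be made concrete with a short sketch in C of the device it describes: one data input, two select inputs, four outputs. The function name and interface below are invented for illustration.

    /* Sketch of the demultiplexer in Solution 13 1: route the data
       input to the output selected by the two select bits; all other
       outputs are 0. */
    void demux(int data, int s1, int s0, int out[4]) {
        int sel = (s1 << 1) | s0;           /* select bits, read as a 2-bit number */
        for (int i = 0; i < 4; i++)
            out[i] = (i == sel) ? data : 0; /* only the selected output carries data */
    }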

Solutions 5

Solution 9.1 1: (Question, p 2) In an operating system supporting segments, the operating system chooses which part of memory the program will occupy when the program starts. When a program is loaded into memory, the program must contain references to specific memory addresses within the program's region of memory, but the executable file cannot specify these addresses directly, because the compiler generates it without knowing where the operating system may choose to place the program. (Indeed, the operating system may be executing the same program in different regions of memory at the same time.) The relocation problem refers to the problem of adapting these memory addresses to the region of memory chosen by the operating system for the loaded program.

Solution 9.1 2: (Question, p 2) First the OS allocates a segment of memory to the program, sufficient to hold all the code and data of the program. Then it reads the memory image from the executable file, loading it into memory. This memory image, however, has all memory references placed assuming that the segment begins at address 0; thus, the OS must fix this by reading the offset of each memory-address reference from the executable file and adding the segment's beginning address into the number at that offset in the loaded memory image. This repairs all the memory references to refer to actual memory addresses within the process's segment.

Solution 9.1 3: (Question, p 2) For each memory reference, the CPU adds the contents of the segment register to the requested address to get the actual address of the data. This actual address is what is sent to memory to fetch data. (The idea is that memory references are offset by the segment register's contents as the instruction is executing, instead of doing it at the time that the instruction is placed in memory.)

Solution 9.1 4: (Question, p 2) There are several answers to this question.

- The run-time technique meshes nicely with restricting a process's memory accesses to that process's segment. This memory protection must be accomplished in hardware, because the performance hit if the OS had to check each memory access would be prohibitive. Essentially, this circuitry already involves a segment register telling where the current process's memory begins. Thus, given that memory protection is a necessity in a modern computer, the run-time technique is not as complicated as the load-time technique.

- In the load-time technique, once the program is loaded, it cannot be moved to another location in memory, because the program may create references to other memory locations within the process segment, and these references are indistinguishable from other numbers. With the run-time technique, such movement is possible, because the only thing that would need to change with the program is the contents of the segment register.

- The load-time technique can significantly slow the time it takes to start a program, because the OS must repair all the references after it has loaded the image into memory. In contrast, the run-time technique can be efficiently implemented in hardware, so there is no penalty at run time for that technique.

- The load-time technique can make the executable file slightly larger, because it requires an additional section of the executable file enumerating the address references within the memory image. (This is a very minor disadvantage.)
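The fix-up loop described in Solution 9.1 2 is short enough to sketch. The structure below, with the executable supplying a list of fix-up offsets, is an assumption about the file format made for illustration; real formats differ in detail.

    /* Sketch of load-time relocation (Solution 9.1 2): the executable
       lists the offsets, within the memory image, of every address
       reference; the OS adds the segment base at each such offset. */
    void relocate(unsigned char *image,        /* loaded memory image */
                  unsigned *fixups, int n,     /* offsets of address references */
                  unsigned segment_base) {     /* where the OS placed the segment */
        for (int i = 0; i < n; i++) {
            unsigned *ref = (unsigned *)(image + fixups[i]);
            *ref += segment_base;   /* repair the reference to an actual address */
        }
    }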

Solutions 6

Solution 9.2 1: (Question, p 2)

a. It splits the virtual memory address into two parts, a page index and an offset. The CPU looks into the page table, stored in RAM, to get the page table entry for the specified page index. This page table entry tells which frame contains the page, and the CPU looks into this frame at the offset to access the requested data.

b. It raises a page fault, which transfers control to the operating system's exception handler. The OS will then presumably load the requested page into some frame and return from the exception handler. On return, the CPU will try the memory access again.

Solution 9.2 2: (Question, p 2) Here are four:

- The valid bit, saying whether the page is currently in memory.
- The location of the page in memory.
- The dirty bit, saying whether the page in memory has been changed.
- The referenced bit, saying whether the page in memory has been accessed recently.

Solution 9.2 3: (Question, p 2) [The answer grid, showing the page held in each of the three frames after each reference, was lost in extraction.]

Solution 9.2 4: (Question, p 3) [The answer grid for the Clock algorithm was likewise lost in extraction.]
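The two steps of Solution 9.2 1 can be shown as a short sketch. The page size, entry layout, and function names below are assumptions made for illustration; the text does not specify them.

    /* Sketch of virtual-to-physical translation (Solution 9.2 1),
       assuming 4096-byte pages and a single-level page table. */
    #define PAGE_SIZE 4096

    struct pte { int valid; unsigned frame; };    /* simplified entry */

    extern unsigned page_fault(unsigned vaddr);   /* hypothetical OS entry point */

    unsigned translate(struct pte *page_table, unsigned vaddr) {
        unsigned index  = vaddr / PAGE_SIZE;      /* page index */
        unsigned offset = vaddr % PAGE_SIZE;      /* offset within the page */
        if (!page_table[index].valid)
            return page_fault(vaddr);             /* part b: trap to the OS, then retry */
        return page_table[index].frame * PAGE_SIZE + offset;
    }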

Solutions 7

Solution 9.2 5: (Question, p 3) A straightforward implementation of virtual memory places the page table in memory. Unfortunately, this means that each attempt to access a memory location actually requires two memory accesses: one to look up the page table entry, and one to access the requested memory within the frame given by that entry. This effectively doubles the time for each memory access, when compared to a system that does not use virtual memory. Designers reduce this problem dramatically by caching a small number of frequently used page table entries on the CPU chip, in a portion of the chip called the translation lookaside buffer, or TLB. Since accessing data stored on the CPU chip is much faster than accessing data in memory, the TLB can dramatically reduce the time for looking up page table entries, restoring an access speed more comparable to a system without virtual memory.

Solution 9.2 6: (Question, p 3) A page directory is a table of pointers to page tables for regions of virtual memory. A CPU uses the page directory instead of a single page table because a single page table can become very large, and it must stay in memory to allow for address translation. With a page directory, the partial page tables can themselves be paged, like the rest of virtual memory. (The page directory must occupy RAM for address translation, but the page directory is much smaller than a single exhaustive page table.) Moreover, page tables for unused regions of memory need not occupy even virtual memory.

Solution 9.2 7: (Question, p 3)

- It provides a range of memory addresses larger than the actual memory the computer holds.
- It allows each process to have its own address space.
- It enables two processes to share the same memory easily.
- It enables memory to be moved between addresses rapidly, since altering the page table moves a page without any actual memory copying.
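The two-level lookup implied by Solution 9.2 6 can be sketched as follows. The 10/10/12 bit split mirrors the classic 32-bit x86 layout; it and all names here are assumptions for illustration, and validity checks are omitted for brevity.

    /* Sketch of a page-directory lookup (Solution 9.2 6): the upper
       address bits index the directory, the middle bits index the
       selected page table, and the low bits are the page offset. */
    unsigned translate2(unsigned **directory, unsigned vaddr) {
        unsigned dir_index = vaddr >> 22;           /* top 10 bits */
        unsigned tbl_index = (vaddr >> 12) & 0x3FF; /* middle 10 bits */
        unsigned offset    = vaddr & 0xFFF;         /* low 12 bits */
        unsigned *table = directory[dir_index];     /* find the page table */
        unsigned frame  = table[tbl_index];         /* find the frame */
        return frame * 4096 + offset;
    }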