CS 465 Final Review. Fall 2017 Prof. Daniel Menasce


Questions: What are the types of hazards in a datapath, and how can each of them be mitigated? State and explain some of the methods used to deal with branch prediction.

Answer:
Data hazards: mitigated by forwarding.
Structural hazards: mitigated by adding functional units (e.g., separate instruction and data caches).
Control hazards: mitigated by branch prediction and by determining the branch outcome early, in the ID stage.
Methods used to deal with branch prediction: static (assume taken or assume not taken) and dynamic (1-bit and 2-bit predictors).

Question: Must the following code sequence stall, can it avoid stalls using forwarding only, or can it execute without stalling or forwarding?
lw $t0, 0($t0)
add $t1, $t0, $t0

Answer: It must stall. $t0 becomes available only after the MEM stage of the lw, but the add needs $t0 at the start of its EX stage; even with forwarding, one stall cycle is required (a load-use hazard).
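A load-use hazard like this one can be detected mechanically: stall when a load's destination register matches a source register of the immediately following instruction. A minimal sketch in Python (the dictionary encoding and field names are illustrative, not from the slides):

```python
def needs_stall(instr, next_instr):
    """Detect a load-use hazard between two adjacent instructions.

    Each instruction is a dict with 'op', 'rd' (destination register),
    and 'srcs' (source registers). A stall is needed when a load's
    destination is consumed by the very next instruction, because the
    loaded value is only available after the MEM stage.
    """
    return instr["op"] == "lw" and instr["rd"] in next_instr["srcs"]

# The sequence from the slide: lw $t0, 0($t0) followed by add $t1, $t0, $t0
lw_instr = {"op": "lw", "rd": "$t0", "srcs": ["$t0"]}
add_instr = {"op": "add", "rd": "$t1", "srcs": ["$t0", "$t0"]}
print(needs_stall(lw_instr, add_instr))  # True: one bubble, then forward
```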

Question: What is the purpose of the pipeline registers? What are the names given to these registers? Explain how the information stored in these registers changes along the pipeline. How are these registers used to detect data hazards?

Answers:
Purpose: to store the information each instruction needs to execute each stage of the pipeline.
Names: IF/ID, ID/EX, EX/MEM, and MEM/WB.
How the information changes: the control signals are first stored in the ID/EX register, divided into EX, Mem, and WB fields. At each subsequent stage the remaining fields are passed along: first Mem and WB, then only WB.
Hazard detection: by comparing the Rd of an older instruction, held in a later pipeline register, with the Rs/Rt of a younger instruction.

Question: Which of these elements influence the execution time of a program, and why?
The level 1 cache hit rate
The TLB hit rate
The number of instructions handled per stage
The page fault rate
The ISA
The clock cycle duration

Answer: ALL of them.
The level 1 cache hit rate: a higher hit rate reduces accesses to RAM.
The TLB hit rate: a higher hit rate reduces accesses to the page table, which is in RAM.
The number of instructions handled per stage: increases throughput.
The page fault rate: page faults require access to the paging disk.
The ISA: influences the number of instructions a program executes, the compiler design, and the ISA implementation.
The clock cycle duration: CPU time is proportional to it.

Question: What are all the levels of a memory system hierarchy? How do size, access time, and cost/bit vary along the hierarchy? What property of programs is important for the performance of the memory system hierarchy?

Answer: The levels are registers, caches, main memory, and secondary storage. Going down the hierarchy, size and access time increase while cost/bit decreases. Spatial and temporal locality are the program properties the hierarchy relies on.

Question: A computer system takes 1,000 days to fail on average, and the time to repair is one day on average. What is its availability?

Answer: Availability = MTTF / (MTTF + MTTR) = 1,000 / (1,000 + 1) ≈ 0.999
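The availability formula is a one-liner; a sketch in Python (MTTF and MTTR in days, function name is illustrative):

```python
def availability(mttf, mttr):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# From the slide: fails every 1,000 days on average, one day to repair.
print(round(availability(1000, 1), 3))  # 0.999
```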

Question: Explain the Principle of Locality and discuss where it is applied and what it achieves.

Answer:
Spatial locality: items close to one just accessed are more likely to be accessed.
Temporal locality: items accessed recently are more likely to be accessed again soon.
Locality is used in caching and in virtual memory; it gives the illusion of a memory that is both fast and large.

Question: What are the types of cache organization? Explain for each: block search, block placement, and block replacement. Explain the two write-hit policies: write-through and write-back (include the write buffer). Explain the approaches used for write-miss policies.

Answer:
Types: direct mapped (DM), set associative (SA), fully associative (FA).
Block search: DM searches one cache entry only; SA searches all entries in a set; FA searches all cache entries.
Block placement: DM places a block at one specific cache entry; SA at any empty location in its set; FA at any empty location in the cache.
Block replacement: DM has only one option; SA uses LRU or random replacement within the set; FA uses LRU or random replacement over the whole cache.
Write-hit policies: write-through updates both the cache and memory on every write (a write buffer hides the memory latency); write-back updates only the cache and writes a modified block back to memory when it is evicted.
Write-miss policies: allocate on miss (fetch the block and overwrite a portion of it) and write-around (do not fetch the block; write directly to memory).

Question: Consider that a physical address is 32 bits and that a direct mapped cache has 1024 entries. Each block in the cache has four 32-bit words. How many bits are used for the tag, cache index, and byte offset in the address?

Answer:
Cache index: 10 bits are needed to address 2^10 entries.
Byte offset: each block has 4 * 4 = 16 bytes, so 4 bits are needed to address 2^4 bytes.
Tag: 32 - (10 + 4) = 18 bits.
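The field widths follow mechanically from the cache geometry; a sketch of the calculation (function and parameter names are illustrative):

```python
import math

def dm_address_fields(addr_bits, num_entries, words_per_block, bytes_per_word=4):
    """Split a physical address into (tag, index, byte-offset) bit widths
    for a direct-mapped cache."""
    index_bits = int(math.log2(num_entries))                      # which entry
    offset_bits = int(math.log2(words_per_block * bytes_per_word))  # which byte
    tag_bits = addr_bits - index_bits - offset_bits               # the rest
    return tag_bits, index_bits, offset_bits

# 32-bit address, 1024 entries, blocks of four 32-bit words.
print(dm_address_fields(32, 1024, 4))  # (18, 10, 4)
```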

Question: Explain the three sources of misses known as the 3 C's: compulsory, capacity, and conflict.

Answer:
Compulsory: happens at the first reference to a block (cold start).
Capacity: a previously cached block was evicted because the cache cannot hold all the blocks the program needs.
Conflict: blocks can only be placed in one particular location (an entry or a set), so they evict each other even when the rest of the cache has room.

Question: A direct mapped cache has 128 entries, and each entry holds one block of 64 bits. What is the cache entry for a block that is stored in memory starting at byte 1152?

Answer:
Size of a block = 64 bits / (8 bits/byte) = 8 bytes.
The number of the block starting at byte 1152 is 1152 / 8 = 144.
Cache index = 144 mod 128 = 16.

Question: A 4-way set associative cache has 64 sets, and each set holds four blocks of 64 bits. What is the set for a block that is stored in memory starting at byte 1152?

Answer:
The number of the block starting at byte 1152 is 1152 / 8 = 144.
The set where block 144 should be placed is 144 mod 64 = 16.
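Both this question and the previous direct-mapped one are the same modulo computation, just with a different divisor (128 entries versus 64 sets). A sketch (names are illustrative):

```python
def cache_location(byte_addr, block_bytes, num_slots):
    """Return (block number, entry/set it maps to).

    num_slots is the number of entries for a direct-mapped cache,
    or the number of sets for a set-associative cache.
    """
    block_number = byte_addr // block_bytes
    return block_number, block_number % num_slots

# Direct-mapped: 128 entries, 8-byte blocks, block starting at byte 1152.
print(cache_location(1152, 8, 128))  # (144, 16)
# 4-way set associative: 64 sets, same 8-byte blocks.
print(cache_location(1152, 8, 64))   # (144, 16)
```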

Question: Which of the following are true?
A. All cores share a single L1 cache
B. All cores share a single L2 cache
C. Each core has its own page table register
D. All cores share a single L3 cache
E. The TLB is stored in main memory
F. The page table needs to be accessed for every memory access

Answer: C and D. (In the typical multicore organization, each core has private L1 and L2 caches and the L3 cache is shared; the TLB is a small hardware cache inside the processor, and on a TLB hit the page table is not accessed at all.)

Question: Consider a 4-way set associative cache with blocks of 4 words of 4 bytes each. The cache size is 256 KB. How many bits in the address are used for the tag, the cache index, and the block/word/byte offset?

Answer:
Each block has 16 bytes. Each set has 4 blocks, so each set has 64 bytes.
The number of sets in the cache is the size of the cache divided by the size of a set: 2^8 * 2^10 / 2^6 = 2^12. Thus, 12 bits are needed to index a set: index = 12 bits.
A block has 16 bytes, so 4 bits are needed to address a byte within a block.
Tag = 32 - (12 + 4) = 16 bits.
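For a set-associative cache the index covers sets rather than entries, so the number of sets is the cache size divided by (associativity * block size). A sketch of the same arithmetic (names are illustrative):

```python
import math

def sa_address_fields(addr_bits, cache_bytes, assoc, block_bytes):
    """(tag, set-index, byte-offset) bit widths for a set-associative cache."""
    num_sets = cache_bytes // (assoc * block_bytes)   # blocks per set = assoc
    index_bits = int(math.log2(num_sets))
    offset_bits = int(math.log2(block_bytes))
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 4-way, 256 KB cache, 16-byte blocks (4 words of 4 bytes), 32-bit address.
print(sa_address_fields(32, 256 * 1024, 4, 16))  # (16, 12, 4)
```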

Question: A machine supports virtual memory, and pages are 4 KB long. A physical address is 32 bits long. How many page frames are there in main memory? Suppose that a virtual address is 34 bits long. What is the size of the virtual address space in pages? How many entries are there in the page table? Estimate the size of the page table in MB.

Answer:
Page frames in main memory: 2^32 / 2^12 = 2^20 = 1024 * 1024 page frames.
Virtual address space: 2^34 / 2^12 = 2^22 virtual pages.
Page table entries: 2^22.
Page table size: each entry holds a 20-bit frame number plus valid, reference, and dirty bits, so 2^22 * (20 + 1 + 1 + 1) bits = 2^22 * (23 / 8) bytes / 2^20 bytes/MB = 11.5 MB.
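The page-table sizing above can be checked with a short calculation; the assumption that each entry carries exactly valid, reference, and dirty bits besides the frame number comes from the slide:

```python
def page_table_stats(phys_bits, virt_bits, page_bytes, extra_bits=3):
    """Return (page frames, page-table entries, page-table size in MB).

    extra_bits counts the per-entry status bits beyond the frame number
    (here: valid + reference + dirty = 3, as assumed on the slide).
    """
    offset_bits = page_bytes.bit_length() - 1     # 4 KB page -> 12 bits
    frames = 2 ** (phys_bits - offset_bits)       # frame number is 20 bits
    entries = 2 ** (virt_bits - offset_bits)      # one entry per virtual page
    entry_bits = (phys_bits - offset_bits) + extra_bits
    size_mb = entries * entry_bits / 8 / 2**20
    return frames, entries, size_mb

print(page_table_stats(32, 34, 4096))  # (1048576, 4194304, 11.5)
```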

Question: Consider the following table.

Memory element   Access time   Hit rate   Miss penalty
L1 cache         1 cycle       85%        5 cycles
L2 cache         5 cycles      99%        100 cycles
Memory           100 cycles    100%       -

Compute the Average Memory Access Time (AMAT) in cycles.

Answer: AMAT = 1 + 0.15 * (5 + 0.01 * 100) = 1 + 0.15 * 6 = 1.9 cycles. (Every access pays the 1-cycle L1 hit time; the 15% of accesses that miss in L1 also pay the 5-cycle L2 access, and the 1% of those that miss in L2 also pay the 100-cycle memory access.)
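The two-level AMAT recurrence can be written directly as a function (names are illustrative):

```python
def amat(l1_hit_time, l1_miss_rate, l2_access, l2_miss_rate, mem_access):
    """Average memory access time for a two-level cache hierarchy.

    Every access pays the L1 hit time; L1 misses add the L2 access time,
    and L2 misses additionally add the memory access time.
    """
    return l1_hit_time + l1_miss_rate * (l2_access + l2_miss_rate * mem_access)

# From the table: L1 85% hits / 1 cycle, L2 99% hits / 5 cycles, memory 100 cycles.
print(round(amat(1, 0.15, 5, 0.01, 100), 3))  # 1.9
```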

Question: How are write hits handled in virtual memory (write-through or write-back), and why? What is the typical page replacement policy used in virtual memory, and how is it implemented?

Answer: Write-back, because it would take too long to write to disk on every store (approximately one million cycles). The typical page replacement policy is LRU, approximated by a reference bit that is reset regularly for all pages and set at every reference to a page.

Question: Consider a VM system that uses an approximate LRU replacement scheme for main memory, whose page table is:

Virtual page   Valid bit   Reference bit   Dirty bit   Page frame number   Disk address
0              1           0               0           2                   ----
1              0           1               1           ----                PF:3654
2              1           1               1           1                   ----
3              1           1               0           0                   ----
4              1           1               1           3                   ----

Does a reference to virtual page 1 cause a page fault? If main memory only has space for 4 pages, which page would you replace to bring virtual page 1 into memory?

Answer: Yes, a reference to virtual page 1 causes a page fault, because its valid bit is 0. To bring it into memory, replace virtual page 3, because its reference bit is 1 but it has not been modified (dirty bit is 0, so it need not be written back to disk), or virtual page 0, because its reference bit is 0 (it has not been referenced recently).

Question: A program has to execute A arithmetic operations that take t seconds each. Ten percent of them cannot be executed in parallel. Give an expression for the program's execution time when executed on p processors. What is the minimum possible execution time of this program?

Answer:
Execution time = 0.1 * A * t + 0.9 * A * t / p
Minimum execution time = 0.1 * A * t (the limit as p -> infinity)
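This is Amdahl's law with a 10% serial fraction; a sketch that shows the time approaching the serial part as p grows (names and sample values are illustrative):

```python
def exec_time(A, t, p, serial_fraction=0.1):
    """Execution time of A operations of t seconds each on p processors,
    when a fraction of the operations cannot run in parallel."""
    return serial_fraction * A * t + (1 - serial_fraction) * A * t / p

A, t = 100, 1.0
print(round(exec_time(A, t, 1), 3))   # 100.0 on one processor: A * t
print(round(exec_time(A, t, 9), 3))   # 20.0: 10 serial + 90/9 parallel
print(round(0.1 * A * t, 3))          # 10.0: the limit as p -> infinity
```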

Question: What are vector machines? What kinds of instructions do they have in addition to conventional instructions? Give examples. What kinds of applications are they useful for?

Answer: Vector machines can operate on vectors, not only on scalars. In addition to conventional instructions they have, for example, vector load/store, vector add, and scalar-by-vector multiply instructions. They are useful for scientific and multimedia applications.