CS/ECE 552: Introduction to Computer Architecture


CS/ECE 552: Introduction to Computer Architecture
Prof. David A. Wood
Final Exam
May 9, 2010, 10:05am-12:05pm, 2241 Chamberlin
Approximate Weight: 25%
CLOSED BOOK, TWO SHEETS OF NOTES

NAME:

DO NOT OPEN THE EXAM UNTIL TOLD TO DO SO!

Read over the entire exam before beginning. Verify that your exam includes all 9 pages. It is a long exam, so use your time carefully. Budget your time according to the weight of the questions and your ability to answer them. Limit your answers to the space provided, if possible. If not, write on the BACK OF THE SAME SHEET. Use the back of the sheet for scratch work. WRITE YOUR NAME ON EACH SHEET.

Problem     Possible Points   Points
Problem 1   20
Problem 2   15
Problem 3   20
Problem 4   25
Problem 5   20
Total       100

Problem 1: (20 points)

Ideally, the 5-stage pipeline discussed in class (and the book) will complete one instruction every cycle. Stall Cycles Per Instruction (SCPI) is a metric that measures the average number of stalls (i.e., pipeline bubbles) introduced per instruction. Thus the processor's CPI = 1 + SCPI. SCPI can be expressed as the sum of the SCPIs of different (independent) factors. For example,

CPI = 1 + SCPI_data + SCPI_branch + SCPI_I-cache + SCPI_D-cache

breaks the CPI down into stalls due to data dependences, branches, instruction cache accesses, and data cache accesses.

Consider a processor and memory system with the following properties:
- Branches are predicted not-taken and resolved in Execute.
- Cache hits stall the pipeline one cycle; cache misses stall the pipeline 15 cycles.
- The only data hazards that stall the pipeline are caused by load-use dependences (1 cycle).

The workload has the following properties:
- Loads are 30% of instructions; stores are 15% of instructions.
- Branches are 12.5% of instructions; 60% of branches are taken.
- 33% of load instructions are immediately followed by a dependent instruction.
- 3% of instruction fetches miss; 5% of load and store instructions miss.

Part A: (12 points) Complete the table below. Show both the equation and the final value.

                Equation                                                   Value
SCPI_data       30% * 33% * 1 cycle                                        0.099
SCPI_branch     12.5% * 60% * 2 cycles                                     0.15
SCPI_I-cache    100% * ((1 - 3%) * 1 + 3% * 15)                            1.42
SCPI_D-cache    (30% + 15%) * ((1 - 5%) * 1 + 5% * 15)                     0.765
CPI             1 + SCPI_data + SCPI_branch + SCPI_I-cache + SCPI_D-cache  3.43
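As a sanity check (not part of the exam), the Part A breakdown can be reproduced with a short script; the variable names are mine, not the exam's:

```python
# Sketch: recompute Problem 1 Part A's SCPI components and total CPI.
f_load, f_store, f_branch = 0.30, 0.15, 0.125   # instruction mix
taken, load_use = 0.60, 0.33                     # branch/load-use behavior
imiss, dmiss = 0.03, 0.05                        # miss rates
hit_stall, miss_stall = 1, 15                    # stall cycles per access

scpi_data = f_load * load_use * 1                        # load-use: 1 cycle
scpi_branch = f_branch * taken * 2                       # taken branch: 2 cycles
scpi_icache = (1 - imiss) * hit_stall + imiss * miss_stall
scpi_dcache = (f_load + f_store) * ((1 - dmiss) * hit_stall + dmiss * miss_stall)
cpi = 1 + scpi_data + scpi_branch + scpi_icache + scpi_dcache

print(round(cpi, 2))  # 3.43
```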

Part B: (8 points) Stalling on cache hits has a significant impact on performance. How much does the CPI change if we can eliminate stalls on cache hits? Complete the table and calculate the speedup:

                Equation                                                   Value
SCPI_I-cache    100% * 3% * 15                                             0.45
SCPI_D-cache    (30% + 15%) * 5% * 15                                      0.3375
CPI             1 + SCPI_data + SCPI_branch + SCPI_I-cache + SCPI_D-cache  2.04

Speedup (assuming cycle time remains constant): CPI_old / CPI_new = 3.43 / 2.04 = 1.68
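The same check works for Part B (again, a sketch, not part of the exam). Note the exam's 1.68 comes from dividing the already-rounded CPIs (3.43 / 2.04); keeping full precision gives a speedup of about 1.686:

```python
# Sketch: CPI with cache-hit stalls eliminated, and the resulting speedup.
scpi_data, scpi_branch = 0.30 * 0.33 * 1, 0.125 * 0.60 * 2
cpi_old = 1 + scpi_data + scpi_branch + (0.97 * 1 + 0.03 * 15) \
            + 0.45 * (0.95 * 1 + 0.05 * 15)          # hits stall 1 cycle
cpi_new = 1 + scpi_data + scpi_branch + 0.03 * 15 \
            + 0.45 * (0.05 * 15)                     # hits no longer stall
speedup = cpi_old / cpi_new                          # cycle time unchanged

print(round(cpi_new, 2), round(speedup, 3))  # 2.04 1.686
```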

Problem 2: (15 points)

Part A: (3 points) Show the 1-bit Booth recoding for the 8-bit multiplier -99.

-99 = 1001 1101 (two's complement) -> -1 0 1 0 0 -1 1 -1

Part B: (3 points) What is the 2-bit modified Booth recoding of the multiplier in Part A?

-2 2 -1 1

Part C: (4 points) A 16-bit multiplier has been recoded using the 2-bit modified Booth recoding algorithm. The recoded multiplier is:

-1 1 2 0 -1 -2 1 -1

What is the original 16-bit two's complement multiplier?

Recoded digits:   -1   1   2   0  -1  -2   1  -1
Bit pairs:        11  01  01  11  10  10  00  11

The original multiplier is 1101 0111 1010 0011.

Part D: (5 points) How does recoding the multiplier using the 2-bit modified Booth algorithm help improve the speed of the multiplication?

Recoding the multiplier with 1-bit Booth simplifies the treatment of negative numbers, but doesn't really help improve the speed of the multiplication. However, using the 2-bit algorithm reduces the number of partial products by half, which reduces the number of adders we need in our multiplier (e.g., the number of inputs to the Wallace tree).
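Parts B and C can be cross-checked mechanically. The sketch below (mine, not the exam's) recodes with the standard radix-4 rule, digit = -2*b(2i+1) + b(2i) + b(2i-1) with b(-1) = 0, and decodes by weighting digits by powers of four:

```python
# Sketch: 2-bit (radix-4) modified Booth recoding and its inverse.

def booth2_recode(x, nbits):
    """Return radix-4 Booth digits, most significant first."""
    u = x & ((1 << nbits) - 1)                    # two's-complement bit pattern
    bits = [(u >> i) & 1 for i in range(nbits)]   # LSB first
    digits = []
    for i in range(0, nbits, 2):                  # examine overlapping triples
        prev = bits[i - 1] if i > 0 else 0
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
    return digits[::-1]

def booth2_decode(digits):
    """Recover the integer from MSB-first radix-4 Booth digits."""
    return sum(d * 4**i for i, d in enumerate(reversed(digits)))

print(booth2_recode(-99, 8))                      # [-2, 2, -1, 1], as in Part B
v = booth2_decode([-1, 1, 2, 0, -1, -2, 1, -1])   # Part C's recoded multiplier
print(format(v & 0xFFFF, '016b'))                 # 1101011110100011
```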

Problem 3: (20 points)

Consider a memory system with the following parameters. The virtual address has 56 bits. The physical address has 48 bits. A unified (i.e., instructions and data) cache is writeback, has 64 kilobyte capacity with 32-byte blocks, and is 8-way set associative with pseudo-least-recently-used (pLRU) replacement (also called hierarchical NMRU). The translation lookaside buffer (TLB) has 96 entries, is fully associative, and is accessed before the cache. Pages are 4 kilobytes. All addresses are byte addresses.

Part A: (2 points) Show the breakup of a virtual address into virtual page number and byte offset within a page. Indicate the number of bits in each field.

Virtual Page Number <55:12> . Page Offset <11:0>

Part B: (2 points) Show the breakup of a physical address into a page frame number and a byte offset within the page frame. Indicate the number of bits in each field.

Page Frame Number <47:12> . Page Offset <11:0>

Part C: (3 points) How many bits of storage does it take to implement this TLB? Assume the minimum number of bits possible to achieve a correct implementation of address translation.

TLB entry = VPN + PFN + valid bit = 44 + 36 + 1 = 81 bits/entry
96 entries * 81 bits/entry = 7,776 bits

Part D: (3 points) What other bits of state may a TLB maintain? What are these used for?

Reference: used by the OS to approximate LRU or another page replacement policy.
Dirty: used by the OS to reduce writeback traffic to disk.
Protection: used by the OS to limit access to pages.

Part E: (2 points) Show the breakup of the physical address into tag, index, and byte within block used to access the cache. Indicate the number of bits in each field.

Tag <47:13> . Index <12:5> . Block Offset <4:0>

Part F: (2 points) How many sets are there in the cache?

256 sets

Part G: (2 points) How many bits per set are needed to implement the pLRU replacement policy?

n - 1 = 7 bits (for n = 8 ways)

Part H: (4 points) How many total bits does it take to implement the cache (i.e., tags, state, data, LRU, etc.)?

35-bit tag + 1 valid + 1 dirty + 8 * 32 = 256 data bits = 293 bits/block
8 * 293 + 7 = 2,351 bits/set
256 sets * 2,351 bits/set = 601,856 bits
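All of Problem 3's field widths and storage totals follow mechanically from the given parameters; the sketch below (mine, not the exam's) derives them:

```python
# Sketch: derive Problem 3's address fields and storage totals from parameters.
VA, PA, PAGE = 56, 48, 4096            # address widths, page size (bytes)
CAP, BLOCK, WAYS = 64 * 1024, 32, 8    # cache capacity, block size, associativity
TLB_ENTRIES = 96

page_off = PAGE.bit_length() - 1       # 12 offset bits
vpn, pfn = VA - page_off, PA - page_off            # 44, 36
tlb_bits = TLB_ENTRIES * (vpn + pfn + 1)           # +1 valid bit (minimum)

sets = CAP // (BLOCK * WAYS)           # 256
index = sets.bit_length() - 1          # 8
off = BLOCK.bit_length() - 1           # 5
tag = PA - index - off                 # 35

block_bits = tag + 1 + 1 + 8 * BLOCK   # tag + valid + dirty + data
set_bits = WAYS * block_bits + (WAYS - 1)          # + 7 pLRU bits
print(tlb_bits, tag, sets, set_bits * sets)        # 7776 35 256 601856
```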

Problem 4: (25 points)

Part A: (2 points) Odd parity is frequently used to ensure that all code words stored in memory have at least one 1. Consider a code word p b3 b2 b1 b0 consisting of a parity bit p and four data bits b3 b2 b1 b0. What is the equation to generate the parity bit p?

p ^ b3 ^ b2 ^ b1 ^ b0 = 1, so p = 1 ^ b3 ^ b2 ^ b1 ^ b0

Part B: (2 points) What is the Hamming distance between 0011 0001 and 1101 0111?

5

Part C: (2 points) What is the minimum Hamming distance between any pair of valid code words encoded using odd parity?

2

Part D: (2 points) What is the minimum Hamming distance needed between any pair of valid code words to correct a single bit error?

3

Part E: (2 points) What is the minimum Hamming distance needed between any pair of valid code words to detect a single bit error?

2

Part F: (2 points) What is the minimum Hamming distance needed between any pair of valid code words to correct a single bit error AND detect a double bit error?

4
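Parts A and B can be checked with two small helpers (a sketch of mine, not from the exam):

```python
# Sketch: odd parity generation (Part A) and Hamming distance (Part B).

def odd_parity_bit(bits):
    """Parity bit p such that p together with the data bits has an odd number of 1s."""
    p = 1
    for b in bits:
        p ^= b
    return p

def hamming_distance(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count('1')

print(odd_parity_bit([1, 0, 1, 0]))              # 1 (codeword 11010 has three 1s)
print(hamming_distance(0b00110001, 0b11010111))  # 5, as in Part B
```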

Part G: (5 points) The parity check matrix for an error correcting code is given below. In this matrix the Ci denote check bits and the bi denote information bits. The codewords are stored in memory in the bit order C3 C2 C1 C0 b3 b2 b1 b0.

        b3  b2  b1  b0
  C0     1   1   0   1
  C1     1   0   1   1
  C2     0   1   1   0
  C3     1   1   1   1

Consider the data word b3 b2 b1 b0 = 1010. Assuming odd parity is used to compute all check bits, what is the codeword that is stored in memory for this data word, given the above parity check matrix?

1010 1010

Part H: (4 points) Suppose that the word read from memory is 1100 0111. Calculate the syndrome using the parity check matrix above.

Syndrome = (C3 ^ C3', C2 ^ C2', C1 ^ C1', C0 ^ C0'), where Ci is the stored check bit and Ci' is the check bit recomputed from the data bits read.
Syndrome = (1 ^ 0, 1 ^ 1, 0 ^ 1, 0 ^ 1) = 1011

Part I: (4 points) What is the procedure for detecting a double error?

A double error has occurred if the syndrome is non-zero AND has an even number of bits set (i.e., even parity). With two errors, the syndrome will be the XOR of two names (matrix columns), two check bits, or a name and a check bit. Since all names have an odd number of bits, the syndrome must have even parity.

Part J: (3 points, extra credit) Was there a double bit error in Part H? If not, what was the value originally written into memory? Like most real systems, ignore the possibility of more than two errors.

No, there was only a single bit error: the syndrome has an odd number of ones. However, there is an error in the matrix, since two bits (b3 and b0) have the same name, which matches the syndrome. We have no way of knowing which of these was the bit that flipped. I will give credit for either answer and two extra points for those that identify the problem.
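The encoding in Part G and the syndrome in Part H can be reproduced directly from the matrix. This sketch is mine, not the exam's; it assumes, as in the solution, that each check bit is computed with odd parity over the data bits selected by its row:

```python
# Sketch: odd-parity check bits and syndrome for Problem 4 Parts G-H.
H = {'C0': [1, 1, 0, 1],   # each row selects data bits b3 b2 b1 b0
     'C1': [1, 0, 1, 1],
     'C2': [0, 1, 1, 0],
     'C3': [1, 1, 1, 1]}

def check_bits(data):                       # data = [b3, b2, b1, b0]
    out = {}
    for name, row in H.items():
        p = 1                               # odd parity over selected bits
        for h, b in zip(row, data):
            p ^= h & b
        out[name] = p
    return out

c = check_bits([1, 0, 1, 0])                # data word 1010
codeword = [c['C3'], c['C2'], c['C1'], c['C0'], 1, 0, 1, 0]
print(codeword)                             # [1, 0, 1, 0, 1, 0, 1, 0] = 1010 1010

read = [1, 1, 0, 0, 0, 1, 1, 1]             # word read in Part H: C3 C2 C1 C0 b3 b2 b1 b0
c2 = check_bits(read[4:])                   # recompute checks from the data bits
syndrome = [read[0] ^ c2['C3'], read[1] ^ c2['C2'],
            read[2] ^ c2['C1'], read[3] ^ c2['C0']]
print(syndrome)                             # [1, 0, 1, 1] = syndrome 1011
```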

Problem 5: (20 points)

Consider a bus-based multiprocessor system with writeback caches. Four processors, P1, P2, P3, and P4, perform the following sequence of loads and stores to/from lines A and B. Assume A and B do not conflict in the data caches. Assume the protocol on the next page, which uses the three states Invalid (I), Shared (S), and Exclusive (E). The table below shows the state of the memory system as time flows down. Cache blocks are represented with the notation address:(state, data). For example, A:(I,0) means that cache block A is in state I with data value 0. A data value of x means the value is unknown or undefined. Complete the table below, updating the cache and memory states in response to the sequence of loads and stores. Indicate actions taken by the cache and memory controllers: hits, requests to get a block shared or exclusive, and responses to requests. Entries not shown in a row are unchanged from the row above.

Initial:
  P1: A:(I,x) B:(I,x)   P2: A:(I,x) B:(I,x)   P3: A:(I,x) B:(I,x)   P4: A:(I,x) B:(I,x)   MEM: A:0 B:0

1. P1 load A: miss, get A shared; MEM responds with A.
   P1: A:(S,0) B:(I,x)                                                     MEM: A:0 B:0
2. P2 load B: miss, get B shared; MEM responds with B.
   P2: A:(I,x) B:(S,0)                                                     MEM: A:0 B:0
3. P2 load A: miss, get A in S; MEM responds with A.
   P2: A:(S,0) B:(S,0)                                                     MEM: A:0 B:0
4. P2 store B = 1: hit, send invalidates.
   P2: A:(S,0) B:(E,1)                                                     MEM: A:0 B:0
5. P3 load A: miss, get A in S; MEM responds with A.
   P3: A:(S,0) B:(I,x)                                                     MEM: A:0 B:0
6. P1 load B: miss, get B in S; P2 responds with B and downgrades; MEM updates B.
   P1: A:(S,0) B:(S,1)   P2: A:(S,0) B:(S,1)                               MEM: A:0 B:1
7. P4 store A = 3: miss, get A in E; P1, P2, P3 invalidate A.
   P1: A:(I,0) B:(S,1)   P2: A:(I,0) B:(S,1)   P3: A:(I,0) B:(I,x)   P4: A:(E,3) B:(I,x)   MEM: A:0 B:1
8. P1 load A: miss, get A in S; P4 responds with A and downgrades; MEM updates A.
   P1: A:(S,3) B:(S,1)   P4: A:(S,3) B:(I,x)                               MEM: A:3 B:1
9. P4 load B: miss, get B in S; MEM responds with B.
   P4: A:(S,3) B:(S,1)                                                     MEM: A:3 B:1
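The solution table can be replayed with a minimal model of the protocol. This is a sketch under the assumptions visible in the solution (an E owner supplies data on a shared request and writes back to memory as it downgrades; a store invalidates all other copies and leaves stale data behind in state I); the exam's actual protocol page is not reproduced here:

```python
# Sketch: minimal 3-state (I/S/E) snooping-coherence model for Problem 5.
caches = {p: {} for p in ('P1', 'P2', 'P3', 'P4')}   # addr -> [state, data]
mem = {'A': 0, 'B': 0}

def load(p, a):
    c = caches[p]
    if a in c and c[a][0] != 'I':
        return                                       # hit, no bus traffic
    for q in caches:                                 # snoop: E owner responds
        entry = caches[q].get(a)
        if q != p and entry and entry[0] == 'E':
            mem[a] = entry[1]                        # writeback on downgrade
            entry[0] = 'S'
    c[a] = ['S', mem[a]]                             # fill from (now clean) memory

def store(p, a, v):
    for q in caches:                                 # invalidate other copies
        entry = caches[q].get(a)
        if q != p and entry:
            entry[0] = 'I'                           # stale data left behind
    caches[p][a] = ['E', v]

ops = [('P1', 'ld', 'A'), ('P2', 'ld', 'B'), ('P2', 'ld', 'A'),
       ('P2', 'st', 'B', 1), ('P3', 'ld', 'A'), ('P1', 'ld', 'B'),
       ('P4', 'st', 'A', 3), ('P1', 'ld', 'A'), ('P4', 'ld', 'B')]
for op in ops:
    load(op[0], op[2]) if op[1] == 'ld' else store(op[0], op[2], op[3])

print(caches['P1'], mem)   # {'A': ['S', 3], 'B': ['S', 1]} {'A': 3, 'B': 1}
```

Replaying the nine operations reproduces the final row of the table: P1 and P4 share A (value 3) and B (value 1), P2 and P3 hold stale invalid copies of A, and memory has been updated by the two writebacks.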