CIS 371 Spring 2015 Computer Organization and Design 7 May 2015 Final Exam

Similar documents
CIS 371 Spring 2015 Computer Organization and Design 7 May 2015 Final Exam Answer Key

CIS 110 Introduction to Computer Programming 8 October 2013 Midterm

CIS 110 Introduction to Computer Programming Summer 2016 Midterm. Recitation # (e.g., 201):

CIS 110 Introduction to Computer Programming. 17 December 2012 Final Exam

CIS 110 Fall 2014 Introduction to Computer Programming 8 Oct 2014 Midterm Exam Name:

CIS 110 Fall 2016 Introduction to Computer Programming 13 Oct 2016 Midterm Exam

CIS 110 Introduction to Computer Programming. 13 February 2013 Make-Up Midterm Midterm

CIS 110 Fall 2015 Introduction to Computer Programming 7 Oct 2015 Makeup Midterm Exam

CIS 110 Introduction to Computer Programming. 12 February 2013 Midterm

CIS 110 Introduction to Computer Programming Spring 2016 Midterm

CIS 110 Introduction to Computer Programming Summer 2014 Midterm. Name:

CIS Introduction to Computer Programming. 5 October 2012 Midterm

CIS 110 Introduction to Computer Programming Summer 2018 Final. Recitation # (e.g., 201):

CIS 110 Introduction to Computer Programming Summer 2018 Midterm. Recitation ROOM :

CIS Introduction to Computer Programming Spring Exam 1

CIS 110 Introduction to Computer Programming. February 29, 2012 Midterm

CIS 110 Introduction to Computer Programming Summer 2018 Final. Recitation # (e.g., 201):

CIS 110 Introduction To Computer Programming. February 29, 2012 Midterm

CIS 110 Introduction to Computer Programming. 7 May 2012 Final Exam

CIS 110 Introduction to Computer Programming Summer 2014 Final. Name:

CIS 110 Introduction to Computer Programming Summer 2017 Final. Recitation # (e.g., 201):

CIS 371 Spring 2017 Computer Organization and Design 25 April 2017 Final Exam Answer Key

CIS 110 Introduction to Computer Programming Fall 2017 Midterm. Recitation ROOM :

CIS Introduction to Computer Programming Spring Exam 2

Midterm Exam 1 Wednesday, March 12, 2008

CIS 371 Spring 2016 Computer Organization and Design 17 March 2016 Midterm Exam Answer Key

CIS 110 Introduction to Computer Programming Summer 2016 Final. Recitation # (e.g., 201):

SOLUTION. Midterm #1 February 26th, 2018 Professor Krste Asanovic Name:

Write only as much as necessary. Be brief!

Write only as much as necessary. Be brief!

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu

CIS 110 Introduction to Computer Programming Spring 2016 Final Exam

CS 152 Computer Architecture and Engineering

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

Computer Architecture Spring 2016

ISA Instruction Operation

NAME: Problem Points Score. 7 (bonus) 15. Total

Design of Digital Circuits ( L) ETH Zürich, Spring 2017

CIS 110 Introduction to Computer Programming Fall 2017 Final Midterm. Recitation Section# :

CIS 110 Introduction To Computer Programming. November 21st, 2011 Exam 2

CS 152, Spring 2011 Section 8

The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011

Complex Pipelines and Branch Prediction

ECE 411 Exam 1. This exam has 5 problems. Make sure you have a complete exam before you begin.

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors

Computer System Architecture Midterm Examination Spring 2002

CS152 Computer Architecture and Engineering. Complex Pipelines

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

HW1 Solutions. Type Old Mix New Mix Cost CPI

/ : Computer Architecture and Design Spring Final Exam May 1, Name: ID #:

Tutorial 11. Final Exam Review

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović

EECS 470 Midterm Exam Winter 2009

CSE 141 Spring 2016 Homework 5 PID: Name: 1. Consider the following matrix transpose code int i, j,k; double *A, *B, *C; A = (double

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

EECS 470 Midterm Exam

Final Exam Fall 2007

CSCE 212: FINAL EXAM Spring 2009

Superscalar Organization

Chapter 06: Instruction Pipelining and Parallel Processing

Copyright 2012, Elsevier Inc. All rights reserved.

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

CIS 110 Introduction to Computer Programming Summer 2016 Final. Recitation # (e.g., 201):

EECS 470 Final Exam Fall 2013

Good luck and have fun!

CS252 Graduate Computer Architecture

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

Department of Electrical and Computer Engineering The University of Texas at Austin

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:

Photo David Wright STEVEN R. BAGLEY PIPELINES AND ILP

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS/ECE 552: Introduction to Computer Architecture

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Cache Organizations for Multi-cores

Computer Architecture Spring 2016

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

University of California, Berkeley College of Engineering

CMSC411 Fall 2013 Midterm 2 Solutions

ECEN/CSCI 4593 Computer Organization and Design Exam-2

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

EECS 470 Midterm Exam Answer Key Fall 2004

ECE 3055: Final Exam

EECS 470 Midterm Exam Winter 2015

Computer System Architecture Final Examination Spring 2002

University of California, Berkeley College of Engineering

s complement 1-bit Booth s 2-bit Booth s

A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle.

Spring 2014 Midterm Exam Review

EECS 470 Final Exam Fall 2005

Department of Computer Science Duke University Ph.D. Qualifying Exam. Computer Architecture 180 minutes

Lecture: Out-of-order Processors

Very short answer questions. "True" and "False" are considered short answers.

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

CMSC411 Fall 2013 Midterm 1

ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi

CSE 240A Midterm Exam

ECE 411 Exam 1 Practice Problems

Handout 2 ILP: Part B

Static & Dynamic Instruction Scheduling

Transcription:

CIS 371 Spring 2015 Computer Organization and Design 7 May 2015 Final Exam Name: Recitation # (e.g., 201): Pennkey (e.g., eeaton): My signature below certifies that I have complied with the University of Pennsylvania s Code of Academic Integrity in completing this examination. Signature Date Instructions: Do not open this exam until told by the proctor. You will have 120 minutes to finish it. Make sure your phone is turned OFF (not to vibrate!) before the exam starts. Food, gum, and drink are strictly forbidden. You may not use your phone or open your bag for any reason, including to retrieve or put away pens or pencils, until you have left the exam room. This exam is closed-book, closed-notes, and closed-computational devices. If you get stuck on a problem, it may be to your benefit to move on to another question and come back later. Do not separate the pages. If a page becomes loose, reattach it with the provided staplers. Staple all scratch paper to your exam. Do not take any sheets of paper with you. If you require extra paper, please use the backs of the exam pages or the extra pages provided at the end of the exam. Clearly indicate on the question page where the graders can find the remainder of your work (e.g., back of page or on extra sheet ). Use a pencil, or blue or black pen to complete the exam. If you have any questions, raise your hand and a proctor will come to answer them. When you turn in your exam, you may be required to show ID. If you forgot to bring your ID, talk to an exam proctor immediately. Good luck! Scores: [For instructor use only] Question 0 1 pts Question 1 7 pts Question 2 12 pts Question 3 25 pts Question 4 13 pts Question 5 24 pts Question 6 26 pts Total: 108 pts

CIS 371 Spring 2015 Final Score: Page: 1 0.) The Easy One (1 point total) Check to make certain that your exam has all 6 pages (excluding the cover sheet). Write your name and PennKey (username) on the front of the exam. Leave the recitation field blank. Sign the certification that you comply with the Penn Academic Integrity Code. 1.) Virtual Memory (7 points total) 1.1) (2 points) Which of the following conditions (there may be more than one) are considered a page fault? (a) A requested page was loaded incorrectly from main memory; (b) A requested page was not listed in the processor s virtual memory hardware; (c) A page did something wrong and is being blamed for it; (d) The wrong page was loaded from main memory; (e) None of the above. 1.2) (1 point) How many bytes of virtual memory are available on a 16-bit, byte-addressed machine? 1.3) (2 points) How many virtual pages can be accessed on a 16-bit, byte-adressed machine if each page is 2 KB? 1.4) (2 points) What well-established phrase does each of the following three-letter acronyms stand for? Circle any that are related to virtual memory hardware. (a) TLA (b) TLB (c) TLC (d) TLD

CIS 371 Spring 2015 Final Score: Page: 2 2.) Branch Davidians (12 points total) As an ardent follower of your friendly TA David Mally, you naturally hark to his teachings on how to predict the future. Among other things, you are of course well-versed in the art of branch prediction. 2.1) (4 points) What does each of the following acronyms stand for? Explain, in at most ten words the purpose of each one. (a) RAS: (b) BTB: 2.2) (4 points) Which of the following reasons explains why the branch predictor uses only the low-order bits of the branch instruction s address? Circle all correct answers. (a) Indexing based on the low-order bits is much faster than a better hash function; (b) The predictor s size is linear in the number of address bits used; (c) The predictor s size is quadratic in the number of address bits use; (d) The predictor s size is exponential in the number of address bits used; (e) Most programs have all their branch instructions clustered together anyway; (f) Branch instructions that are far apart in memory rarely share the same low-order bits; (g) Branch instructions that are far apart in memory rarely execute in quick succession; (h) None of the above. 2.3) (4 points) GPU designers disregard David s branch teachings, and omit prediction hardware. Extend an olive branch to them by explaining, in at most twenty words why they often don t use branch prediction.

CIS 371 Spring 2015 Final Score: Page: 3 3.) Keystone XL (25 points total) In order to handle the copious amounts of data coming out of Canada, you propose an extra long version of the superscalar design from Lab 4. In the new design, each stage is split in two and clock frequency is doubled. The stages are conveniently named F1, F2, D1, D2, X1, X2, M1, M2, W1, and W2. In general, values have to be available at the beginning of X1 and M1 depending on the instruction, and results are produced at the end of X2 and M2. However ALU operations except multiply, divide, and modulo produce their results at the end of X1. 3.1) (15 points) In the list below of all possible forwarding paths, cross out any that are not necessary for a fully bypassed pipeline. Assume that each path forwards rs, rt, and NZP bits as appropriate. A.X2 -> A.X1 A.X2 -> B.X1 B.X2 -> A.X1 B.X2 -> B.X1 A.M1 -> A.X2 A.M1 -> B.X2 A.M1 -> A.X1 A.M1 -> B.X1 B.M1 -> A.X2 B.M1 -> B.X2 B.M1 -> A.X1 B.M1 -> B.X1 A.M2 -> A.M1 A.M2 -> B.M1 A.M2 -> A.X2 A.M2 -> B.X2 A.M2 -> A.X1 A.M2 -> B.X1 B.M2 -> A.M1 B.M2 -> B.M1 B.M2 -> A.X2 B.M2 -> B.X2 B.M2 -> A.X1 B.M2 -> B.X1 A.W1 -> A.M2 A.W1 -> B.M2 A.W1 -> A.M1 A.W1 -> B.M1 A.W1 -> A.X2 A.W1 -> B.X2 A.W1 -> A.X1 A.W1 -> B.X1 B.W1 -> A.M2 B.W1 -> B.M2 B.W1 -> A.M1 B.W1 -> B.M1 B.W1 -> A.X2 B.W1 -> B.X2 B.W1 -> A.X1 B.W1 -> B.X1 A.W2 -> A.W1 A.W2 -> B.W1 A.W2 -> A.M2 A.M2 -> B.M2 A.W2 -> A.M1 A.W2 -> B.M1 A.W2 -> A.X2 A.W2 -> B.X2 A.W2 -> A.X1 A.W2 -> B.X1 B.W2 -> A.W1 A.W2 -> B.W1 B.W2 -> A.M2 A.M2 -> B.M2 B.W2 -> A.M1 B.W2 -> B.M1 B.W2 -> A.X2 B.W2 -> B.X2 B.W2 -> A.X1 B.W2 -> B.X1 3.2) (2 points) You also plan to press the issue, by supporting N pipes with the ability to dispatch an instruction to every pipe on every cycle. How many read ports and how many write ports will your register file need, as a function of N? 3.3) (8 points) You expect all the groups that opposed the Pentium 4 pipeline to oppose your 10-stage, N-issue design as well. To defend your design, prepare responses explaining how you will overcome each of the potential objections below. Each response must be at most twenty words, but you do not need to write complete sentences. (a) Latency, counted in instructions, increases by a factor of N (b) Register file complexity increases by a factor of N 2 (c) Bypassing requires N 2 forwarding paths (d) Cycle time increases by a factor of N

CIS 371 Spring 2015 Final Score: Page: 4 4.) This Question Out of Order (13 points total) Consider a dynamiccaly scheduled pipeline with the stages Fetch, Dispatch, Issue, RegRead, execute, Memory, Writeback, and Commit, where the F and D stages form the in-order front end, and the I, R, X, M, and W stages form the out-of-order back-end. Assume your micro-architecture has 10 physical registers, labeled P0 through P9. Rewrite the code below, renaming all architectural registers to physical registers. Assume all physical registers are free at the start of this code, and are on the free queue in increasing numerical order. Indicate all RAW, WAW, and WAR hazards on architectural registers (R0 R8) as comments. The first few lines have been completed for you..code.code.addr 0x0000.ADDR 0x0000 I0 CONST R1, #1 CONST P0, #1 I1 CONST R2, #15 CONST P1, #15 I2 MUL R7, R1, R2 MUL P2, P0, P1; RAW R2 I3 ADD R2, R1, R0 I4 SLL R2, R2, #11 I5 I6 I7 I8 NOT R2, R1 XOR R2, R0, R0 MUL R0, R1, R2 ADD R1, R2, R3 I9 LDR R3, R1, #0 I10 SRA R3, R3, #2 I11 ADD R7, R2, #0 I12 MOD R7, R7, R3 I13 JSRR R3 I14 JSR _end _end RET

CIS 371 Spring 2015 Final Score: Page: 5 5.) This Question Also Out Of Order (24 points total) Fill in the out-of-order pipeline cycle diagram for the code in the previous question, beginning with I0 CONST R1, #1 and ending with I12 MOD R7, R7, R3. Assume the processor is three-way superscalar, triple-issue, fully bypassed, and has a single load-store unit. In other words, it has the same constraints on issuing instructions as your superscalar design from Lab 4, except that instructions can issue out-of-order. For bypassing purposes, your Lab 4 s decode stage is replaced by the RegRead stage. The first line has been completed for you. Insn 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 I0 F D I R X W M C

CIS 371 Spring 2015 Final Score: Page: 6 6.) Cache Incoherence (26 points total) Assuming a 16-bit architecture, with word-addressed memory (i.e. LC4), complete the tables below of the L1 cache contents for each of the described cache architectures, and at each point in the stream of loads and store operations in the left column. Also give the number of bits required for the tag in each cache structure. You may write all values as either hexadecimal, decimal, or binary. Each cache variant contains 8 entries, with a block size of 16 bits (one word), and employs the LRU replacement algorithm. The only difference is in their associativity. In the column headers, V stands for Valid, and D stands for Dirty. All cache frames start out invalid. Operations: Read 0x0000 from addr 0x1230 Read 0x9999 from addr 0x7899 Read 0xDDDD from addr 0xABCD Read 0x6666 from addr 0x4566 Read 0xBBBB from addr 0x789B Read 0x4444 from addr 0x4564 Read 0x2222 from addr 0x1232 Read 0xFFFF from addr 0xABCF ; Fill in 1st pair of tables 2-Way Set Associative # Tag Bits: Set Tag Data V D LRU Fully Associative #Tag Bits: Tag Data V D LRU Write 0x0123 to addr xdef0 Read 0xDDDD from addr 0xABCD Read 0x4444 from addr 0x4564 Write 0x4567 to addr x4566 ; Fill in 2nd pair of tables Read 0x0123 from addr 0xDEF0 Read 0xEEEE from addr 0xABCE Read 0x789A from addr 0xAAAA Read 0x4567 from addr x4566 ; Fill in 3rd pair of tables Set Tag Data V D LRU Tag Data V D LRU Set Tag Data V D LRU Tag Data V D LRU