CSCE 212: FINAL EXAM Spring 2009

Similar documents
4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Computer Architecture CS372 Exam 3

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.

CMSC411 Fall 2013 Midterm 1

ECE 3055: Final Exam

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

CS 251, Winter 2018, Assignment % of course mark

CS232 Final Exam May 5, 2001

Fall 2009 CSE Qualifying Exam Core Subjects. September 19, 2009

Final Exam Fall 2008

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

CS 251, Winter 2019, Assignment % of course mark

Cache Organizations for Multi-cores

CSE 378 Final Exam 3/14/11 Sample Solution

4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4.

Final Exam Fall 2007

CSEE W3827 Fundamentals of Computer Systems Homework Assignment 5 Solutions

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

CS232 Final Exam May 5, 2001

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

Computer Architecture V Fall Practice Exam Questions

ECE 30 Introduction to Computer Engineering

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE Test II November 14, 2000

Computer Science and Engineering 331. Midterm Examination #1. Fall Name: Solutions S.S.#:

CS/CoE 1541 Mid Term Exam (Fall 2018).

5008: Computer Architecture HW#2

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

Final Exam Spring 2017

ISA Instruction Operation

Performance Evaluation CS 0447

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Computer Organization and Architecture (CSCI-365) Sample Final Exam

Write only as much as necessary. Be brief!

CS 341l Fall 2008 Test #2

CSE 141 Spring 2016 Homework 5 PID: Name: 1. Consider the following matrix transpose code int i, j,k; double *A, *B, *C; A = (double

CSE502 Lecture 15 - Tue 3Nov09 Review: MidTerm Thu 5Nov09 - Outline of Major Topics

ECE Sample Final Examination

CS/COE 0447 Example Problems for Exam 2 Fall 2010

Chapter 4. The Processor

Comprehensive Exams COMPUTER ARCHITECTURE. Spring April 3, 2006

CS/COE1541: Introduction to Computer Architecture

University of California, Berkeley College of Engineering

Introduction. Datapath Basics

CS 230 Practice Final Exam & Actual Take-home Question. Part I: Assembly and Machine Languages (22 pts)

ECE550 PRACTICE Midterm

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

ECE 341 Final Exam Solution

EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CSCE 513 Computer Architecture, Fall 2018, Assignment #2, due 10/08/2018, 11:55PM

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

ECE260: Fundamentals of Computer Engineering

ECE331: Hardware Organization and Design

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

Problem Points 1 /20 2 /20 3 /20 4 /20 5 /20 Total /100

Instructor: Randy H. Katz hap://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #7. Warehouse Scale Computer

ECE 15B Computer Organization Spring 2011

ECE331: Hardware Organization and Design

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Computer Organization and Structure

Q1: Finite State Machine (8 points)

1. Truthiness /8. 2. Branch prediction /5. 3. Choices, choices /6. 5. Pipeline diagrams / Multi-cycle datapath performance /11

HW1 Solutions. Type Old Mix New Mix Cost CPI

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Winter 2002 FINAL EXAMINATION

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

Lecture 7: Examples, MARS, Arithmetic

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution

Do not start the test until instructed to do so!

Prerequisite Quiz January 23, 2007 CS252 Computer Architecture and Engineering

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours

MCS-284 Final Exam Serial #:

Pipelined processors and Hazards

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

What is Pipelining? RISC remainder (our assumptions)

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions

ECE 2035 Programming HW/SW Systems Fall problems, 7 pages Exam Two 23 October 2013

EECS 322 Computer Architecture Superpipline and the Cache

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

1 5. Addressing Modes COMP2611 Fall 2015 Instruction: Language of the Computer

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

COMPUTER ORGANIZATION AND DESIGN

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

ECE 2035 Programming HW/SW Systems Fall problems, 6 pages Exam One 19 September 2012

MIPS Assembly Programming

Introduction to the MIPS. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Transcription:

CSCE 212: FINAL EXAM Spring 2009 Name (please print): Total points: /120 Instructions This is a CLOSED BOOK and CLOSED NOTES exam. However, you may use calculators, scratch paper, and the green MIPS reference card from your textbook. Ask the instructor if you have any questions. Good luck, and have a good summer! 1. (10 points) Intel is considering two different enhancements to their newest i8 line of processors. They only have enough time to implement one of these enhancements before the scheduled release date of the processor. The instruction set of their original processor can be divided into three different types of instructions having the following CPIs: Instruction type CPI A 1.4 B 2.4 C 2 The benchmarks that are used to evaluate processor performance consist of the following instruction mixture: Instruction type Percentage A 30% B 60% C 10% Intel must choose one of the following options: 1. decrease the CPI of instruction type A by a factor of 15 (divide by 15), or 2. decrease the CPI of instruction type B by a factor of 2 (divide by 2). These enhancements will not affect the number of instructions or the clock rate. Which enhancement is preferable? You must justify your answer numerically (with calculations). option 1 cpi = 1.4/15 *.30 + 2.4 *.60 + 2 *.10 = 1.668 option 2 cpi = 1.4 *.30 + 2.4/2 *.60 + 2 *.10 = 1.340 (better)

2. Consider the following program, which converts a ASCII-encoded string representation of a decimal number into binary..data str:.asciiz 891.text main: li $t0,0 # initialize index register to 0 li $s0,0 # initialize running sum to 0 loop: lb $s1, str($t0) # load a character from the string beqz $s1, exit # if we read the null character, then see ya! sll $t2, $s0, 3 sll $s0, $s0, 1 add $s0, $s0, $t2 # this instruction, along with the next two # instructions are used for: # multiplying accumulator ($s0) by ten addi $s1, $s1, -48 # convert from ASCII to binary add $s0, $s0, $s1 # accumulate! addi $t0, $t0, 1 # increment index j loop # here we go again! exit: jr $31 # end proggie a. (10 points) In the program, fill in the blank comment line to describe what computation the program is performing through the use of the corresponding instruction sequence (two sll instructions followed by an add instruction). b. (10 points) Suppose we need to run this program on an architecture that has a branch delay slot. Would we need to make any changes to the program to ensure that it behaves properly? Why or why not? If so, what change(s) must be made? no the shift after the branch won t hurt anything if the branch is taken c. (10 points) In the program code, show where all the data dependences exist, and indicate if any pipeline stalls would need to be inserted to deal with any of these dependencies (assuming the 5-stage MIPS pipeline). d. (10 points) Write a short sequence of instructions to be inserted before the first instruction (li $t0,0), as well as a short sequence of instructions that could be inserted immediately before the return instruction (jr $31) which would save and later restore the caller s $s registers using the program s stack. You only need to save the $s registers that are changed by the program. Header code for saving the caller s registers to the stack: addi $sp,$sp,-8 sw $s0,0($sp) sw $s1,4($sp) Footer code to restoring the caller s registers from the stack: lw $s0,0($sp) lw $s1,4($sp) addi $sp,$sp,8

3. This problem compares the performance of the single-cycle MIPS architecture, the multi-cycle MIPS architecture, and the pipelined MIPS architecture. Assume it requires 10 ns to perform any of the following operations: main memory access, register file access, and arithmetic operation. Assume the delay for multiplexors, registers (i.e. setup/hold time), and lookup tables is negligible. a. (10 points) What is the minimum clock period for each of the three implementations? single=>50 ns, multi=>10 ns, pipelined=>10 ns b. (10 points) What is the CPI for each of the implementations, assuming a program execution with the following instruction mix: Execution Instruction Frequency r-type arithmetic, logical, comparison 60% branch 15% load 15% store 10% Assume the pipelined CPU performs forwarding, that the compiler schedules the code to avoid load hazards, branches have a fixed 3-cycle latency (i.e. requires 2 trailing no-ops), and you may disregard the time required to fill the pipeline. single=>1, multi=.6*4+.15*3+.15*5+.10*4=4, pipelined=.85*1+.15*3=1.3 c. (10 points) What is the speedup of the pipelined processor relative to the single-cycle processor? What is the speedup of the pipelined processor relative to the multi-cycle processor? pipe vs. single = (50 * 1) / (10 * 1.3) = 3.85 pipe vs. multi = (10 * 4) / (10 * 1.3) = 3.08

4. (10 points) Suppose we want to improve the MISS RATE of a cache. List three different design enhancements that we could reasonably consider to accomplish this. Also, describe any possible side-effects, in terms of possible reductions in other aspects of cache performance, that could result from implementing each of these enhancements (and why). more associativity => higher hit time (comparing more tags), higher penalty (replacement decisions) more lines => higher hit time (bigger mux) wider lines (bigger blocks) => higher penalty (more data to bring in) 5. (10 points) What is the minimum number of bits that are required to represent the decimal value 11.0625 in binary without introducing any rounding error? Provide the minimum number of bits only, you do not need to consider any particular representation format (i.e. you do not need to use the IEEE floatingpoint format, just the raw value in base-2)..0625 =>.1250 =>.25 =>.5 => 1.0001 in base 2 11 1011 in base base 11.0625 1011.0001 so 8 bits! 6. (10 points) Assume there s a 2-way set associative cache with LRU replacement where memory addresses are interpreted in the following way: tag index word offset byte offset Fill in the following table to simulate the behavior of this cache, based on the sequence of memory references shown in column 1 of the table. BYTE ADDRESS INDEX TAG HIT OR MISS? 541 1 8 line 1, got 8 662 1 10 line 1, got 8, 10 533 1 8 hit 607 1 9 line 1, got 9,10 542 1 8 line 1, got 9, 8 656 1 10 line 1, got 10, 8

7. (10 points) It s 1997 and you re a graduate student at Stanford named Larry Page. You re trying to build a new Internet search engine and your strategy is to optimize its performance by ensuring that during a search, neither the CPU nor its disk array are idle during a search. The search database is logically divided into 100 MB contiguous (sequential) blocks. After the first block is read, the engine reads subsequent blocks while using the CPU to search the previously read block. It takes 100 ms for the CPU to search each block. You decide to use disks that rotate at 170 revolutions/sec (about 10,000 RPM), have an average seek time of 8 ms, have a transfer rate of 50 MB/sec, and have a controller overhead of 2 ms. How many disks do you need in your disk array? You do not need to include check (redundant) disks. 100 ms = ((1/170 rev/s) * (1/2) rev) + 8 ms + ((100 MB) / (50 MB/s)) * n 100 ms = 3 ms + 8 ms + 2*n 89 ms = 2*n 45 = n