Quiz for Chapter 1 Computer Abstractions and Technology

Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully.

Solutions follow each question.

1. [15 points] Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as follows:

Instruction Class   M1 Cycles/Instruction   M2 Cycles/Instruction   Frequency
A                   1                       2                       60%
B                   2                       3                       30%
C                   4                       4                       10%

(a) Calculate the average CPI for each machine, M1 and M2.

For Machine M1: Clocks per Instruction = (60/100)*1 + (30/100)*2 + (10/100)*4 = 1.6
For Machine M2: Clocks per Instruction = (60/100)*2 + (30/100)*3 + (10/100)*4 = 2.5

(b) Calculate the average MIPS ratings for each machine, M1 and M2.

For Machine M1: Average MIPS rating = Clock Rate/(CPI * 10^6) = (80 * 10^6)/(1.6 * 10^6) = 50.0
For Machine M2: Average MIPS rating = Clock Rate/(CPI * 10^6) = (100 * 10^6)/(2.5 * 10^6) = 40.0

(c) Which machine has a smaller MIPS rating? Which individual instruction class CPI do you need to change, and by how much, to have this machine have the same or better performance as the machine with the higher MIPS rating (you can only change the CPI for one of the instruction classes on the slower machine)?

Machine M2 has the smaller MIPS rating. To reach M1's 50 MIPS at 100 MHz, M2 needs an average CPI of at most 100/50 = 2.0; for class A this means 0.6*CPI_A + 0.9 + 0.4 <= 2.0, i.e. CPI_A <= 7/6 (about 1.17). If we reduce the CPI of instruction class A on Machine M2 from 2 to 1, M2 gets a better MIPS rating than M1:

Clocks per Instruction = (60/100)*1 + (30/100)*3 + (10/100)*4 = 1.9
Average MIPS rating = Clock Rate/(CPI * 10^6) = (100 * 10^6)/(1.9 * 10^6) = 52.6

2. [10 points] (Amdahl's law question) Suppose you have a machine which executes a program consisting of 50% floating point multiply, 20% floating point divide, and the remaining 30% from other instructions.

(a) Management wants the machine to run 4 times faster. You can make the divide run at most 3 times faster and the multiply run at most 8 times faster. Can you meet management's goal by making only one improvement, and which one?

Amdahl's Law states:

Execution time after improvement = (Execution time affected by improvement)/(Amount of improvement) + Execution time unaffected

Assuming initially that the floating point multiply, floating point divide, and other instructions had the same CPI (so the original execution time is 100 units):

Execution time after improvement with Divide = 20/3 + (50 + 30) = 86.67
Execution time after improvement with Multiply = 50/8 + (20 + 30) = 56.25
Execution time after improvement with Divide and Multiply = 20/3 + 50/8 + 30 = 42.92

4 times faster means an execution time of 100/4 = 25. Management's goal can NOT be met by improving Multiply, Divide, or both.

(b) Dogbert has now taken over the company, removing all the previous managers. If you make both the multiply and divide improvements, what is the speed of the improved machine relative to the original machine?

If we make both improvements, Execution time after improvement = 50/8 + 20/3 + 30 = 42.92
The speedup relative to the original machine = 100/42.92 = 2.33

3. [5 points] Suppose that we can improve the floating point instruction performance of a machine by a factor of 15 (the same floating point instructions run 15 times faster on this new machine). What percent of the instructions must be floating point to achieve a speedup of at least 4?

We will use Amdahl's Law again for this question. Let x be the percentage of floating point instructions.
Since the speedup is 4, if the original program executed in 100 cycles, the new program runs in 100/4 = 25 cycles (assuming, as in Question 2, that all instructions originally take the same time):

100/4 = x/15 + (100 - x)

Solving for x: 75 = (14/15)x, so x = 80.36. At least 80.36% of the instructions need to be floating point.

4. [6 points] Just as we defined the MIPS rating, we can also define the MFLOPS rating, which stands for Millions of Floating Point Operations per Second. If Machine A has a higher MIPS rating than Machine B, does Machine A necessarily have a higher MFLOPS rating than Machine B?

A higher MIPS rating for machine A compared to machine B need not imply a higher MFLOPS rating for machine A. One reason is that floating point instructions may form a fairly low proportion of all the instructions in a given program. If the floating point operations of machine B are far more efficient than those of machine A, while the other (integer, memory, etc.) instructions are more efficient on A, then machine B gets a higher MFLOPS rating than A while A has a higher MIPS rating.

5. [6 points] Consider the SPEC benchmark. Name two factors that influence the resulting performance on any particular architecture.

Two factors that influence the resulting performance of a SPEC benchmark on any particular architecture are:
(a) the compiler flags used to compile the benchmark;
(b) the input data given to the benchmark while measuring performance.

6. [5 points] How did the development of the transistor affect computers? What did the transistor replace?

The answer to the first part of this question is subjective: the development of the transistor has had a tremendous impact on bringing the computer into our homes in the form of PCs, and more recently into our hands (Personal Digital Assistants, etc.). The ability to package and integrate transistors on a chip at a rate that has been increasing exponentially, according to Moore's Law, has resulted in tremendous performance gains for programs without having to change any of them. The transistor, invented in 1947, replaced the vacuum tube in computers beginning in the 1950s.

7. [25 points] A two-part question:

(Part A) Assume that a design team is considering enhancing a machine by adding MMX (multimedia extension instruction) hardware to a processor. When a computation is run in MMX mode on the MMX hardware, it is 10 times faster than the normal mode of execution. Call the percentage of time that could be spent using the MMX mode the percentage of media enhancement.
(a) What percentage of media enhancement is needed to achieve an overall speedup of 2?

We will use Amdahl's Law for this question:

Execution time with media enhancement = (Execution time improved by media enhancement)/(Amount of improvement) + Execution time unaffected

Let x be the percentage of media enhancement needed to achieve an overall speedup of 2. Then:

100/2 = x/10 + (100 - x)

Solving for x: 50 = (9/10)x, so x = 55.56.

(b) What percentage of the run-time is spent in MMX mode if a speedup of 2 is achieved? (Hint: You will need to calculate the new overall time.)

The new overall time is 100/2 = 50 = 5.56 + 44.44, where 5.56 (= 55.56/10) is the time spent in media enhancement using MMX mode and 44.44 is the unenhanced remainder. Since 5.56/50 = 11.1%, about 11% of the run-time is spent in MMX mode.
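The Amdahl's-law arithmetic in Questions 2, 3, and 7 can be checked mechanically. The sketch below is one way to do it, assuming the quiz's convention that the original program takes 100 time units; `new_time` and `required_percent` are hypothetical helper names (not from any library), and `required_percent` simply rearranges total/S = x/factor + (total - x) for x.

```python
# Amdahl's-law helpers (illustrative sketch; helper names are hypothetical).
# Convention from the quiz: the original program takes 100 time units.

def new_time(parts):
    """Execution time after improvement.

    `parts` is a list of (original_time, improvement_factor) pairs;
    use a factor of 1 for the portion the improvement does not affect.
    """
    return sum(t / factor for t, factor in parts)


def required_percent(factor, target_speedup, total=100.0):
    """Percent of the original time that must be sped up by `factor`
    to reach `target_speedup`.

    From total/S = x/factor + (total - x):
        x = total * (1 - 1/S) / (1 - 1/factor)
    """
    return total * (1 - 1 / target_speedup) / (1 - 1 / factor)


# Question 2: 50% FP multiply (up to 8x), 20% FP divide (up to 3x), 30% other.
divide_only = new_time([(20, 3), (80, 1)])
multiply_only = new_time([(50, 8), (50, 1)])
both = new_time([(50, 8), (20, 3), (30, 1)])
print(f"Q2: {divide_only:.2f} {multiply_only:.2f} {both:.2f} speedup {100 / both:.2f}")
# prints: Q2: 86.67 56.25 42.92 speedup 2.33

# Question 3: floating point 15x faster, overall speedup of 4.
print(f"Q3: {required_percent(15, 4):.2f}%")
# prints: Q3: 80.36%

# Question 7(a)/(b): MMX mode 10x faster, overall speedup of 2.
x = required_percent(10, 2)
mmx_time = x / 10  # time actually spent in MMX mode after the speedup
print(f"Q7(a): {x:.2f}%  Q7(b): {100 * mmx_time / (100 / 2):.1f}% of run-time in MMX mode")
# prints: Q7(a): 55.56%  Q7(b): 11.1% of run-time in MMX mode
```

Keeping the time in "percent of original" units mirrors the hand calculations above, so each intermediate value can be compared directly with the worked solutions.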

(c) What percentage of the media enhancement is needed to achieve one-half the maximum speedup attainable from using the MMX mode?

The maximum speedup attainable from the MMX mode is 10 (when all of the time is enhanced), so one-half of it is 10/2 = 5. Writing x as a fraction:

5 = 1/((1 - x) + x/10)

so (1 - x) + x/10 = 0.2, giving x = 0.8/0.9 = 88.9%.

(Part B) If processor A has a higher clock rate than processor B, and processor A also has a higher MIPS rating than processor B, explain whether processor A will always execute faster than processor B. Suppose that there are two implementations of the same instruction set architecture. Machine A has a clock cycle time of 20 ns and an effective CPI of 1.5 for some program, and machine B has a clock cycle time of 15 ns and an effective CPI of 1.0 for the same program. Which machine is faster for this program, and by how much?

Processor A will not always be faster: execution time also depends on the instruction count, which neither the clock rate nor the MIPS rating accounts for. If the program executed on A requires enough more instructions than on B, A can be slower despite its higher clock rate and MIPS rating.

The CPU time is given by:

CPU Time = Instruction count * CPI * Clock cycle time

The MIPS rating is defined by:

MIPS = Clock rate/(CPI * 10^6)

Assuming the instruction counts are the same:

(CPU Time)A = I * 1.5 * 20 ns = I * 30 ns
(CPU Time)B = I * 1.0 * 15 ns = I * 15 ns

Machine B is 2 times faster than Machine A for this program.

8. [6 points] Suppose a program segment consists of a purely sequential part which takes 25 cycles to execute, and an iterated loop which takes 100 cycles per iteration. Assume the loop iterations are independent, and cannot be further parallelized. If the loop is to be executed 100 times, what is the maximum speedup possible using an infinite number of processors (compared to a single processor)?

The sequential part takes 25 cycles. Each iteration of the loop (which takes 100 cycles) can be executed independently, and there are 100 iterations in total. Applying Amdahl's Law:

Execution time after improvement = (Execution time affected by improvement)/(Amount of improvement) + Execution time unaffected
= (100*100)/100 + 25

= 100 + 25 = 125

Execution time on a single processor = 100 * 100 + 25 = 10025
Speedup = 10025/125 = 80.2

9. [5 points] Computer A has an overall CPI of 1.3 and can be run at a clock rate of 600 MHz. Computer B has a CPI of 2.5 and can be run at a clock rate of 750 MHz. We have a particular program we wish to run. When compiled for computer A, this program has exactly 100,000 instructions. How many instructions would the program need to have when compiled for Computer B, in order for the two computers to have exactly the same execution time for this program?

(CPU Time)A = (Instruction count)A * (CPI)A * (Clock cycle time)A = (100,000 * 1.3)/(600 * 10^6) s, about 216.7 µs
(CPU Time)B = (Instruction count)B * (CPI)B * (Clock cycle time)B = (I)B * 2.5/(750 * 10^6) s

Since (CPU Time)A = (CPU Time)B, solving for (I)B gives (I)B = (100,000 * 1.3 * 750)/(600 * 2.5) = 65,000.

10. [5 points] Imagine that you are able to perform benchmarking races to compare two computers you are thinking about buying. Come up with a list of 5 benchmark programs or usage scenarios you would use to create your own personalized benchmark suite. For each program you select, justify it. For the benchmark suite as a whole, discuss a method for calculating a weighted average of the different program run-times.

The answer to this question is subjective. One possible list of benchmarks, with the reason for choosing each, its run-time, and its weight, could include the following:

- Browser with multimedia plug-ins (run-time 300 s, weight 5): opening a lot of media content in many tabs at the same time can be used to test memory performance and parallelism potential.
- Financial application, e.g. a stock value predictor (run-time 200 s, weight 4): can be used to measure floating point and vector instruction set performance.
- Games that involve a lot of AI (run-time 100 s, weight 3): can be used to test branch prediction performance.
- Word processing software (run-time 200 s, weight 5): the most common use case; can be used to test general integer operations and rendering.
- Desktop search software (run-time 300 s, weight 3): can be used to test I/O performance, since search software builds up a huge index by constantly reading the file system.

The weighted average of the run-times, with weights summing to 5 + 4 + 3 + 5 + 3 = 20, is:

(300*5 + 200*4 + 100*3 + 200*5 + 300*3)/20 = 4500/20 = 225 s

11. [8 points] The design team for a simple, single-issue processor is choosing between a pipelined or non-pipelined implementation. Here are some design parameters for the two possibilities:

Parameter                      Pipelined Version   Non-Pipelined Version
Clock rate                     500 MHz             350 MHz
CPI for ALU instructions       1                   1
CPI for control instructions   2                   1
CPI for memory instructions    2.7                 1

(a) For a program with 20% ALU instructions, 10% control instructions and 70% memory instructions, which design will be faster? Give a quantitative CPI average for each case.

Average CPI for pipelined version = 0.2*1 + 0.1*2 + 0.7*2.7 = 2.29
Average CPI for non-pipelined version = 0.2*1 + 0.1*1 + 0.7*1 = 1.0
Average execution time per instruction, pipelined version = 2.29/(500 MHz) = 4.58 ns
Average execution time per instruction, non-pipelined version = 1.0/(350 MHz) = 2.86 ns
The non-pipelined version is faster.

(b) For a program with 80% ALU instructions, 10% control instructions and 10% memory instructions, which design will be faster? Give a quantitative CPI average for each case.

Average CPI for pipelined version = 0.8*1 + 0.1*2 + 0.1*2.7 = 1.27
Average CPI for non-pipelined version = 0.8*1 + 0.1*1 + 0.1*1 = 1.0
Average execution time per instruction, pipelined version = 1.27/(500 MHz) = 2.54 ns
Average execution time per instruction, non-pipelined version = 1.0/(350 MHz) = 2.86 ns
The pipelined version is faster.

12. [5 points] A designer wants to improve the overall performance of a given machine with respect to a target benchmark suite and is considering an enhancement X that applies to 50% of the original dynamically-executed instructions, and speeds each of them up by a factor of 3. The designer's manager has some concerns about the complexity and the cost-effectiveness of X and suggests that the designer should consider an alternative enhancement Y. Enhancement Y, if applied only to some (as yet unknown) fraction of the original dynamically-executed instructions, would make them only 75% faster. Determine what percentage of all dynamically-executed instructions should be optimized using enhancement Y in order to achieve the same overall speedup as obtained using enhancement X.

We will use Amdahl's Law for this problem.
Execution time after improvement = (Execution time affected by improvement)/(Amount of improvement) + Execution time unaffected

Execution time using X = 50/3 + (100 - 50) = 66.67
The speedup is 100/66.67 = 1.5

Let x be the percentage of dynamically-executed instructions to which Y is applied. Since "75% faster" means a speedup factor of 1.75 on those instructions:

Execution time using Y = x/1.75 + (100 - x)
Speedup = 100/(Execution time using Y) = 1.5

Solving for x: x/1.75 + (100 - x) = 66.67, so (3/7)x = 33.33 and x = 77.78. Enhancement Y must be applied to about 77.78% of the dynamically-executed instructions.
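The CPI, MIPS, and CPU-time calculations in Questions 1, 9, and 11 all follow from CPU Time = Instruction count * CPI * Clock cycle time and MIPS = Clock rate/(CPI * 10^6). The sketch below reproduces those numbers from the stated instruction mixes; `average_cpi`, `mips`, and `cpu_time` are illustrative helper names chosen here (not from any library), and clock rates are passed in hertz.

```python
# CPU-time, CPI, and MIPS helpers (illustrative sketch; names are hypothetical).
# CPU time = instruction count * CPI / clock rate; MIPS = clock rate / (CPI * 10^6).

def average_cpi(mix):
    """Weighted CPI from (frequency, cycles_per_instruction) pairs."""
    return sum(freq * cpi for freq, cpi in mix)


def mips(clock_hz, cpi):
    """MIPS rating for a given clock rate (Hz) and average CPI."""
    return clock_hz / (cpi * 1e6)


def cpu_time(instructions, cpi, clock_hz):
    """Execution time in seconds."""
    return instructions * cpi / clock_hz


# Question 1: classes A/B/C occur 60%/30%/10% of the time.
freqs = [0.6, 0.3, 0.1]
cpi_m1 = average_cpi(zip(freqs, [1, 2, 4]))
cpi_m2 = average_cpi(zip(freqs, [2, 3, 4]))
print(f"Q1: CPI {cpi_m1:.1f} / {cpi_m2:.1f}, MIPS {mips(80e6, cpi_m1):.1f} / {mips(100e6, cpi_m2):.1f}")
# prints: Q1: CPI 1.6 / 2.5, MIPS 50.0 / 40.0

# Question 9: instruction count on B that matches A's execution time.
time_a = cpu_time(100_000, 1.3, 600e6)
count_b = time_a * 750e6 / 2.5  # invert cpu_time for computer B
print(f"Q9: {time_a * 1e6:.1f} us on A; B needs {count_b:,.0f} instructions")
# prints: Q9: 216.7 us on A; B needs 65,000 instructions

# Question 11: average time per instruction = average CPI / clock rate.
for label, mix in [("(a)", [0.2, 0.1, 0.7]), ("(b)", [0.8, 0.1, 0.1])]:
    pipelined = average_cpi(zip(mix, [1, 2, 2.7]))
    plain = average_cpi(zip(mix, [1, 1, 1]))
    print(f"Q11{label}: pipelined {pipelined:.2f} CPI, {pipelined / 500e6 * 1e9:.2f} ns; "
          f"non-pipelined {plain:.2f} CPI, {plain / 350e6 * 1e9:.2f} ns")
# prints: Q11(a): pipelined 2.29 CPI, 4.58 ns; non-pipelined 1.00 CPI, 2.86 ns
#         Q11(b): pipelined 1.27 CPI, 2.54 ns; non-pipelined 1.00 CPI, 2.86 ns
```

The same helpers cover other mixes: for example, `mips(100e6, average_cpi(zip(freqs, [1, 3, 4])))` gives the 52.6 MIPS of Question 1(c).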