04S1 COMP3211/9211 Computer Architecture Tutorial 1 (Weeks 02 & 03) Solutions

Similar documents
Team 1. Common Questions to all Teams. Team 2. Team 3. CO200-Computer Organization and Architecture - Assignment One

Designing for Performance. Patrick Happ Raul Feitosa

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

University of California at Berkeley. D. Patterson & R. Yung. Midterm I

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

Solutions for Chapter 4 Exercises

Quiz for Chapter 1 Computer Abstractions and Technology

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Chapter 1. Computer Abstractions and Technology. Lesson 3: Understanding Performance

For More Practice FMP

5008: Computer Architecture HW#2

This Unit. CIS 501 Computer Architecture. As You Get Settled. Readings. Metrics Latency and throughput. Reporting performance

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

MEASURING COMPUTER TIME. A computer faster than another? Necessity of evaluation computer performance

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Computer Architecture

ELE 455/555 Computer System Engineering. Section 1 Review and Foundations Class 5 Computer System Performance

CS61C - Machine Structures. Week 6 - Performance. Oct 3, 2003 John Wawrzynek.

Performance, Power, Die Yield. CS301 Prof Szajda

CMSC 611: Advanced Computer Architecture

ECE331: Hardware Organization and Design

CS3350B Computer Architecture CPU Performance and Profiling

Parts A and B both refer to the C-code and 6-instruction processor equivalent assembly shown below:

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

CS455/CpE 442 Intro. To Computer Architecure. Review for Term Exam

University of California at Berkeley. D. Patterson & R. Yung

Lecture - 4. Measurement. Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

Vector and Parallel Processors. Amdahl's Law

Course web site: teaching/courses/car. Piazza discussion forum:

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

Course overview Computer system structure and operation

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

Department of Electrical Engineering Introduction to Computer Engineering 1 Assignment 6: Computer Architecture

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

Performance Evaluation CS 0447

Computer Performance Evaluation: Cycles Per Instruction (CPI)

Fundamentals of Quantitative Design and Analysis

ALU(B) delay in cycles Arithmetic 32% 1 2 Data Transfer 36% 2 2 Floating Point 10% 3 4 Control Transfer 22% 2 2

Advanced Computer Architecture (CS620)

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution

CS430 Computer Architecture

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

2. (3 pts) There are two different ways to measure performance. What are they? 3. (3 pts) Why doesn t MIPS have a subtract immediate instruction?

CpE 442 Introduction to Computer Architecture. The Role of Performance

Computer Architecture V Fall Practice Exam Questions

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Part II: Solutions Guide

T T T T T T N T T T T T T T T N T T T T T T T T T N T T T T T T T T T T T N.

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 8 Instruction-Level Parallelism Part 1

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

History of the Intel 80x86

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EECS 322 Computer Architecture Superpipline and the Cache

TDT4255 Computer Design. Lecture 1. Magnus Jahre

Computer Architecture. Lecture 6.1: Fundamentals of

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

1.3 Data processing; data storage; data movement; and control.

The Role of Performance

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

CMSC 611: Advanced Computer Architecture

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

UCB CS61C : Machine Structures

Instruction Pipelining

The Processor: Instruction-Level Parallelism

Instruction Pipelining

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Introduction to Pipelined Datapath

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Lecture 2: Performance

Practice Assignment 1

University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz 1 (30 minutes) January 21, 2015

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Instructor Information

CS/COE1541: Introduction to Computer Architecture

COMP2121: Microprocessors and Interfacing. Instruction Set Architecture (ISA)

Engineering 9859 CoE Fundamentals Computer Architecture

Analysis Report. Number of Multiprocessors 3 Multiprocessor Clock Rate Concurrent Kernel Max IPC 6 Threads per Warp 32 Global Memory Bandwidth

Response Time and Throughput

Performance Measurement (as seen by the customer)

Control Dependence, Branch Prediction

Computer Architecture, EDT030

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

Computer Performance. Relative Performance. Ways to measure Performance. Computer Architecture ELEC /1/17. Dr. Hayden Kwok-Hay So

Transcription:

04S1 COMP3211/9211 Computer Architecture Tutorial 1 (Weeks 02 & 03) Solutions Lih Wen Koh (lwkoh@cse) September 14, 2004 Key: SRQ = Stallings, Review Question; SP = Stallings Problem; P = Patterson & Hennessy Exercise Warmup Exercises Q3. [P2.1] We wish to compare the performance of two different machines: M1 and M2. The following measurements have been make on these machines: program time on M1 time on M2 1 10 seconds 5 seconds 2 3 seconds 4 seconds Which machine is faster for each program and by how much? Recall, A is n times faster than B for program P if CPU Performance A CPU Performance B = Execution T ime B Execution T ime A = n For program 1, M2 is 10 5 For program 2, M1 is 4 3 = 2 times faster than M1. = 1.33 times faster than M2. Q4. [P2.2] Consider the two machines and programs in P2.1. The following additional measurements were made: program instructions executed on M1 instructions executed on M2 1 200 x 10 6 160 x 10 6 Find the instruction execution rate (instructions per second) for each machine when running program 1. Instruction execution rate for M1 = 200 106 instructions 10 secs 1 =20MIPS

Instruction execution rate for M2 = 160 106 instructions =32MIPS 5 secs Q5. [P2.3] If the clock rates of machines M1 and M2 in P2.1 are 200MHz and 300MHz, respectively, find the clock cycles per instruction (CPI) for program 1 on both machines using the data in P2.1 and P2.2. Recall CPU time = instructions program cycles instructions seconds cycles = instruction count CPI clock period So CPI = CPU time instruction count clock period = CPU time clock rate instruction count For program 1, 10 200 106 CPIforM1= 200 10 6 =10 CPIforM2= 5 300 106 160 10 6 =9.375 Q6. [P2.4] Assuming the CPI for program 2 on each machine in Q3 is the same as the CPI for program 1 found in Q5, find the instruction count for program 2 running on each machine using the execution times from Q3. From Q5, for program 1: CPI for M1 = 10; CPI for M2 = 9.375. Given that program 2 has the same CPI for program 1 for both machines M1 and M2. Also, recall CPU time = instruction count CPI clock period So instruction count = CPU time CPI clock period = CPU time clock rate CPI Thus for program 2, 3 200 106 instruction count on M1 = =60 10 6 10 4 300 106 instruction count on M2 = = 128 10 6 9.375 2

Q7. [P2.5] Suppose that M1 in P2.1 costs $10,000 and M2 costs $15,000. If you needed to run the program 1 a large number of times (i.e., if you were concerned with throughput instead of response time), which machine would you buy in large quantities? Why? Buy M2. M2 is faster than M1 by 2 times but cost of M2 is less than twice that of M1. Discussion Questions Q10. [P1.48] What is the approximate relationship between cost and die area? The approximate relationship can be described as Cost = f((die area) x ) for some x. You don t have to determine f, but you can determine x by first writing Dies per wafer = f((die area) y ) Yield= f((die area) z ) and then examining how these two equations impact the first one (you need to figure out what y and z are). What implications does this have for designers? We can write Cost per die = Dies per wafer = Cost per wafer Dies per wafer yield Wafer area Die area Yield = 1 (1 + Defect per area Die area/2) 2 So y = 1; z = 2; x =3. Q11. [P2.10] Consider two different implementations, M1 and M2, of the same instruction set. There are four classes of instructions (A, B, C, and D) in the instruction set. M1 has a clock rate of 500 MHz. The average number of cycles for each instruction class on M1 is as follows: 3

class CPI for this class A 1 B 2 C 3 D 4 M2 has a clock rate of 750 MHz. The average number of cycles for each instruction class on M2 is as follows: class CPI for this class A 2 B 2 C 4 D 4 Assume that peak performance is defined as the fastest rate that a machine can execute an instruction sequence chosen to maximize that rate. What are the peak performances of M1 and M2 expressed as instructions per second? Peak Performance = M1: Choose instructions from class A (lowest CPI). Peak performance = 500 106 1 = 500 MIPS M2: Choose instructions from either class A or B. Peak performance = 750 106 2 = 375 MIPS Cycles per second Cycles to execute fastest instructions Q12. [P2.13] Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 400 MHz, and M2 has clock rate of 200 MHz. The average number of cycles for each isntruction class on M1 and M2 is given in the following table: Class CPI on M1 CPI on M2 C1 Usage C2 Usage Third-party usage A 4 2 30 % 30 % 50 % B 6 4 50 % 20 % 30 % C 8 3 20 % 50 % 20 % The table also contains a summary of how three different compilers use the instructions set. C1 is a compiler produced by the makers of M1, C2 is a compiler produced by the makers of M2, and the other compiler is a 4

third- party product. Assume that each compiler uses the same number of instructions for a given program but that the instruction mix is as described in the table. (a) Using C1 on both M1 and M2, how much faster can the makers of M1 claim that M1 is compared with M2? Using C1 compiler, CPI for M1 = 0.3 4+0.5 6+0.2 8=5.8 CPI for M2 = 0.3 2+0.5 4+0.2 3=3.2 Recall CPU time = instruction count CPI clock period Let I be the instruction count for a given program. Then Performance of M2 Performance of M1 = I 3.2 (200 106 ) 1 I 5.8 (400 10 6 =1.103 ) 1 M1 is faster than M2 by 1.103 times. (b) Using C2 on both M2 and M1, how much faster can the makers of M2 claim that M2 is compared with M1? Using C2 compiler, CPI for M1 = 0.3 4+0.2 6+0.5 8=6.4 CPI for M2 = 0.3 2+0.2 4+0.5 3=2.9 Let I be the instruction count for a given program. Then Performance of M1 Performance of M2 = I 6.4 (400 106 ) 1 I 2.9 (200 10 6 =1.103 ) 1 M2 is faster than M1 by 1.103 times. (c) If you purchase M1, which compiler would you use? (d) If you purchase M2, which compiler would you use? Using third party compiler, CPI for M1 = 0.5 4+0.3 6+0.2 8=5.4 CPI for M2 = 0.5 2+0.3 4+0.2 3=2.8 Execution time for a program is directly proportional to the CPI for the program. Thus we would like to choose the compiler that gives the lowest 5

CPI for a particular machine. For both M1 and M2, the third party compiler gives the lowest CPI. (e) Which machine would you purchase if we assume that all other criteria are identical, including costs? Using third-party compiler, Let I be the instruction count for a given program. Then Performance of M2 Performance of M1 = I 2.8 (200 106 ) 1 I 5.4 (400 10 6 =1.037 ) 1 M1 is faster than M2 by 1.037 times. So we purchase machine M1. Q13. [P2.15] We are interested in two implementations of a machine, one with and one without special floating-point hardware. Consider a program, P, with the following mix of operations: floating-point multiply 10 % floating-point add 15 % floating-point divide 5 % integer instructions 70 % Machine MFP (Machine with Floating Point) has floating-point hardware and can therefore implement the floating-point operations directly. It requires the following number of clock cycles for each instruction class: floating-point multiply 6 floating-point add 4 floating-point divide 20 integer instructions 2 Machine MNFP (Machine with No Floating Point) has no floating-point hardware and so must emulate the floating-point operations using integer instructions. The integer instructions all take 2 clock cycles each. The number of integer instructions needed to implement each of the floatingpointoperationsisasfollows: floating-point multiply 30 floating-point add 20 floating-point divide 50 Both machines have a clock rate of 1000 MHz. ratings for both machines. Recall Find the native MIPS MIPS rating = Instruction count Clock rate = Execution time 106 CPI 10 6 6

CPI for MFP = 0.1*6 + 0.15*4 + 0.05*20 + 0.7*2 = 3.6 CPI for MNFP = 2 (All floating-point instructions have been replaced by integer instructions.) Thus, MIPS rating for MFP = 1000 3.6 = 278 MIPS rating for MFP = 1000 2 = 500 Q14. [P3.14] When designing memory systems, it becomes useful to know the frequency of memory reads versus writes as well as the frequency of accesses for instructions versus data. Using the average instruction mix information for MIPS for the progrqam gcc, find the following: Instruction gcc (freq) Arithmetic 48 % Data transfer 33 % Conditional branch 17 % Jump 2 % (a) The percentage of all memory accesses that are for data (vs. instructions). Given that 33% of instructions are data accesses, so P ercentage of data memory accesses = # Data accesses # Instructions +#Data accesses 100% = 0.33I I+0.33I 100% = 24.81% (b) The percentage of all memory accesses that are reads (vs. writes). Assume that two-thirds of data transfers are loads. Given that 33% of instructions are data accesses; and two thirds of data accesses are loads, so Percentage of read memory accesses = # Instructions + # Data loads # Instructions + # Data accesses 100% = I + 0.33I 2 3 I + 0.33I 100% = 91.73% Q15. [P3.16] Suppose we have made the following measurements of average CPI for instructions: 7

Instruction Average CPI gcc (freq) spice (freq) Arithmetic 1.0 clock cycles 48 % 50 % Data transfer 1.4 clock cycles 33 % 41 % Conditional branch 1.7 clock cycles 17 % 8 % Jump 1.2 clock cycles 2 % 1 % Compute the effective CPI for MIPS. Average the instruction frequencies for gcc and spice to obtain the instruction mix. Effective CPI for gcc = 1.0*0.48 + 1.4*0.33 + 1.7*0.17 + 1.2*0.02 = 1.255 Effective CPI for spice = 1.0*0.5 + 1.4*0.41 + 1.7*0.08 + 1.2*0.01 = 1.222 8