Designing for Performance. Patrick Happ Raul Feitosa

1 Designing for Performance Patrick Happ Raul Feitosa

2 Objective
  In this section we examine the most common approach to assessing processor and computer system performance (W. Stallings).

3 Which one would you choose?
  Name               Intel Core i7 4770K   AMD FX 9590
  Number of cores    4                     8
  Number of threads  8                     8
  Frequency          3.5 GHz               4.7 GHz
  Turbo frequency    3.9 GHz               5 GHz
  Data width         64-bit                64-bit
  TDP                84 W                  220 W
  Release            June 2013             July 2013

4 Outline
  Performance Assessment
  Amdahl's Law

5 Designing new systems
  Cost
  Size
  Reliability
  Security
  Power consumption
  Performance

6 CPU operations
  Fetch and decode instructions
  Load and store data
  Logic and arithmetic operations
  Clock pulse

7 Performance factors
  Clock speed or clock rate ($f$): expressed in multiples of Hz.
  Clock cycle or clock tick: one increment, or pulse, of the clock.
  Clock time ($\tau$): the time between consecutive pulses.
  $\tau = \frac{1}{f}$

8 Performance factors: clock speed
  Usually multiple clock cycles are required per instruction.
  The amount of work implied by one instruction varies considerably.
  Pipelining allows several instructions to execute simultaneously.
  So clock speed is not the whole story!

9 Performance factors
  CPI: average number of cycles per instruction.
  $I_i$: number of machine instructions of type $i$ executed by a program.
  $CPI_i$: number of cycles per instruction of type $i$.
  $I_c$: number of machine instructions executed by a program.
  $I_c = \sum_{i=1}^{n} I_i$
  $CPI = \frac{\sum_{i=1}^{n} (CPI_i \times I_i)}{I_c}$

10 Performance factors
  $T$: processor time needed to execute a program.
  $T = I_c \times CPI \times \tau$
  A refinement yields
  $T = I_c \times [p + (m \times k)] \times \tau$
  where
  $p$ is the number of processor cycles needed to decode and execute the instruction,
  $m$ is the number of memory references needed,
  $k$ is the ratio between memory cycle time and processor cycle time.
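The refined formula can be explored with a minimal Python sketch. The function name and the workload numbers (1 million instructions, a 500 MHz clock, p = 2, m = 1) are illustrative assumptions, not values from the slides; the point is only to show how a slower memory (larger k) stretches T.

```python
def execution_time(instruction_count, p, m, k, clock_rate_hz):
    """Return T = I_c * (p + m * k) * tau, with tau = 1 / f."""
    tau = 1.0 / clock_rate_hz   # clock time in seconds
    cpi = p + m * k             # effective cycles per instruction
    return instruction_count * cpi * tau


# Hypothetical workload (assumed values, not from the slides).
fast_memory = execution_time(1_000_000, p=2, m=1, k=2, clock_rate_hz=500e6)
slow_memory = execution_time(1_000_000, p=2, m=1, k=6, clock_rate_hz=500e6)

print(f"k = 2: T = {fast_memory * 1e3:.1f} ms")
print(f"k = 6: T = {slow_memory * 1e3:.1f} ms")
```

With these assumed values the effective CPI doubles from 4 to 8, so the execution time doubles as well, even though the clock rate is unchanged.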

11 Performance factors: instruction execution rate
  Expressed in millions of instructions per second (MIPS) or millions of floating-point operations per second (MFLOPS).
  Heavily dependent on instruction set, compiler design, processor implementation, and cache and memory hierarchy.
  $MIPS = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6}$

12 Performance factors
  System attributes affecting the performance factors
  Performance factors: $I_c$, $p$, $m$, $k$, $\tau$
  System attributes:
    Instruction set architecture
    Compiler technology
    VLSI technology
    Processor implementation
    Cache and memory hierarchy

13 Performance factors
  System attributes affecting the performance factors
  Performance factors: $I_c$, $p$, $m$, $k$, $\tau$
  System attributes:
    Instruction set architecture!
    Compiler technology
    VLSI technology
    Processor implementation
    Cache and memory hierarchy!

14 Exercise 1
  A program involves the execution of 2 million instructions on a 400 MHz processor. The CPI and proportion of four instruction types are given below. Compute the average CPI.
  Instruction type             CPI   Instruction mix
  Arithmetic and logic         1     60%
  Load/store with cache hit    2     18%
  Branch                       4     12%
  Load/store with cache miss   8     10%
  The average CPI is
  $CPI = 0.6 + (2 \times 0.18) + (4 \times 0.12) + (8 \times 0.1) = 2.24$
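The same arithmetic can be checked with a short Python sketch that also applies the MIPS and execution-time formulas from the earlier slides to this exercise's data. The function names are illustrative choices, not part of the original material.

```python
def average_cpi(mix):
    """Weighted average CPI; mix is a list of (cpi, fraction) pairs."""
    return sum(cpi * fraction for cpi, fraction in mix)


def mips_rate(clock_rate_hz, cpi):
    """MIPS = f / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def execution_time(instruction_count, cpi, clock_rate_hz):
    """T = I_c * CPI * tau, with tau = 1 / f."""
    return instruction_count * cpi / clock_rate_hz


# Instruction mix from Exercise 1: (CPI, fraction of executed instructions).
mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]

cpi = average_cpi(mix)
mips = mips_rate(400e6, cpi)
seconds = execution_time(2_000_000, cpi, 400e6)

print(f"CPI  = {cpi:.2f}")
print(f"MIPS = {mips:.1f}")
print(f"T    = {seconds * 1e3:.1f} ms")
```

Under these formulas the program runs at roughly 178.6 MIPS and takes about 11.2 ms.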

15 Exercise 2
  Consider two hardware implementations M1 and M2 of the same instruction set. There are three instruction classes: F, I and N. The M1 clock rate is 600 MHz. The clock cycle of M2 is 2 ns. The average CPI values for these three instruction classes are:
  Class   CPI of M1   CPI of M2   Comments
  F                               floating-point
  I                               integer
  N                               non-arithmetic
  a) Compute the peak performance of M1 and M2 in MIPS.
  b) If 50% of the instructions executed in a given program belong to class N and the rest are equally distributed between F and I, which is the faster machine, and by what factor?

16 Exercise 2 (continued)
  c) A designer of M1 plans to change the design to improve performance. Assuming the instruction mix in (b), which of the options below would be most beneficial?
    1. Use an FPU twice as fast (CPI = 2.5 for class F).
    2. Add a second ALU to reduce the CPI for integer operations to [...].
    3. Use faster logic that allows a clock rate of 750 MHz while keeping the same CPI values.
  d) The CPI values given above include cache misses, which occur 5 times per 100 executed instructions. Each cache miss incurs a 10-cycle penalty. The fourth redesign option consists of using a larger instruction cache so as to reduce the miss ratio from 5% to 3%. Compare this alternative with the previous options.
  e) Characterize the application programs that execute faster on M1 than on M2, i.e., discuss the instruction composition of such applications. Hint: let x, y and 1 - x - y be the fractions of instructions belonging to classes F, I and N, respectively.

17 Exercise 3
  Consider two codes produced by two compilers for the same source program. The instructions of the machine that will execute these codes can be divided into class A (CPI = 1) and class B (CPI = 2). The number of executed instructions of each class is given below.
  Class   Compiler 1   Compiler 2   Comments
  A       600M         400M         CPI = 1
  B       400M         400M         CPI = 2
  a) Compute the execution time for both codes assuming a clock rate of 1 GHz.
  b) Which compiler produces the more efficient code, and by what factor?
  c) Which code executes at the higher MIPS rate?
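One way to work through this exercise is the Python sketch below, which applies the CPI and MIPS definitions to the table above. The helper name `summarize` and the dictionary layout are illustrative assumptions.

```python
CPI = {"A": 1, "B": 2}   # cycles per instruction for each class
CLOCK_RATE_HZ = 1e9      # 1 GHz, as stated in part (a)


def summarize(counts):
    """Return (execution time in seconds, MIPS rate) for one compiled code."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    seconds = cycles / CLOCK_RATE_HZ
    mips = instructions / (seconds * 1e6)
    return seconds, mips


compiler_1 = {"A": 600e6, "B": 400e6}
compiler_2 = {"A": 400e6, "B": 400e6}

time_1, mips_1 = summarize(compiler_1)
time_2, mips_2 = summarize(compiler_2)

print(f"compiler 1: T = {time_1:.2f} s, {mips_1:.1f} MIPS")
print(f"compiler 2: T = {time_2:.2f} s, {mips_2:.1f} MIPS")
print(f"time ratio (code 1 / code 2) = {time_1 / time_2:.3f}")
```

In this sketch the code from compiler 2 finishes sooner, yet the code from compiler 1 reports the higher MIPS rate, which illustrates why MIPS alone can mislead.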

18 Benchmarks: motivation
  A high-level language statement:
    A = B + C   /* assume all quantities in main memory */
  Compiled code on CISC:
    add mem(b), mem(c), mem(a)
  Compiled code on RISC:
    load mem(b), reg(1);
    load mem(c), reg(2);
    add reg(1), reg(2), reg(3);
    store reg(3), mem(a);

19 Benchmarks: definition
  Programs designed to test performance
  Written in a high-level language, so they are portable
  Represent a style of task (systems, numerical, commercial)
  Easily measured and widely distributed
  E.g. Standard Performance Evaluation Corporation (SPEC)
    CPU2006 for computation-bound workloads
      17 floating-point programs in C, C++, Fortran
      12 integer programs in C, C++
      3 million lines of code
    Graphics, High Performance, Web, Servers

20 Averaging results
  Running $m$ different benchmarks gives a more reliable comparison. The overall instruction execution rate may be expressed by the arithmetic or the harmonic mean,
  $R_A = \frac{1}{m} \sum_{i=1}^{m} R_i$
  $R_H = \frac{m}{\sum_{i=1}^{m} \frac{1}{R_i}}$
  where $R_i$ is the instruction execution rate of the $i$-th benchmark.
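A small Python sketch can show how the two means differ. The rates below are hypothetical, and the assumption that every benchmark executes the same number of instructions is made only to illustrate why the harmonic mean matches the rate measured over the whole run.

```python
def arithmetic_mean(rates):
    """R_A = (1/m) * sum of R_i."""
    return sum(rates) / len(rates)


def harmonic_mean(rates):
    """R_H = m / sum of (1 / R_i)."""
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical MIPS rates for three benchmarks (not from the slides).
rates_mips = [100.0, 200.0, 400.0]

# Assumption: each benchmark executes the same number of instructions.
instructions_each = 1_000_000
total_time = sum(instructions_each / (r * 1e6) for r in rates_mips)
overall_mips = len(rates_mips) * instructions_each / (total_time * 1e6)

print(f"arithmetic mean = {arithmetic_mean(rates_mips):.2f} MIPS")
print(f"harmonic mean   = {harmonic_mean(rates_mips):.2f} MIPS")
print(f"overall rate    = {overall_mips:.2f} MIPS")
```

Under the equal-instruction-count assumption, the overall rate equals the harmonic mean, while the arithmetic mean overstates it.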

21 SPEC speed metric
  SPEC benchmarks are not concerned with instruction execution rates.
  A base runtime is defined for each benchmark using a reference machine.
  The speed metric is the ratio of the reference time to the system run time:
  $r_i = \frac{Tref_i}{Tsut_i}$
  $Tref_i$: execution time of benchmark $i$ on the reference machine.
  $Tsut_i$: execution time of benchmark $i$ on the test system.

22 Averaging SPEC metrics
  Overall performance is calculated by averaging the ratios for all 12 integer benchmarks.
  Use the geometric mean, which is appropriate for normalized numbers such as ratios:
  $r_G = \left( \prod_{i=1}^{n} r_i \right)^{1/n}$
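The speed ratio and its geometric mean can be sketched in Python as follows. All execution times are hypothetical, and the function names are illustrative; the last part of the sketch checks that switching the reference machine leaves the relative standing of two systems unchanged.

```python
import math


def speed_ratios(ref_times, sut_times):
    """r_i = Tref_i / Tsut_i for each benchmark i."""
    return [ref / sut for ref, sut in zip(ref_times, sut_times)]


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical execution times in seconds (not taken from the slides).
reference = [100.0, 200.0, 400.0]
system_a = [50.0, 100.0, 100.0]
system_b = [25.0, 200.0, 200.0]

score_a = geometric_mean(speed_ratios(reference, system_a))
score_b = geometric_mean(speed_ratios(reference, system_b))
print(f"A: {score_a:.3f}  B: {score_b:.3f}")

# Normalizing to a different reference rescales both scores by the same
# factor, so their ratio (and therefore the ranking) is unchanged.
alt_a = geometric_mean(speed_ratios(system_b, system_a))
alt_b = geometric_mean(speed_ratios(system_b, system_b))
print(f"A/B with the original reference: {score_a / score_b:.3f}")
print(f"A/B with B as the reference:     {alt_a / alt_b:.3f}")
```

This reference independence is the usual argument for preferring the geometric mean when averaging normalized ratios.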

23 SPEC rate metric
  Measures the throughput or rate of a machine carrying out a number of tasks.
  Multiple copies of the benchmarks are run simultaneously; typically, the number of copies equals the number of processors.
  The ratio is calculated as follows:
  $rate_i = \frac{N \times Tref_i}{Tsut_i}$
  $Tref_i$: reference execution time for benchmark $i$.
  $N$: number of copies running simultaneously.
  $Tsut_i$: elapsed time from the start of execution of all $N$ copies until completion of all copies of the program.
  Again, a geometric mean is calculated.

24 Exercise 4
  The table below shows the execution times, in seconds, for 3 different processors.
  Benchmark   Processor X   Processor Y   Processor Z
  a) Compute the arithmetic mean value for each system using X as the reference machine and then using Y as the reference machine.
  b) Compute the geometric mean value for each system using X as the reference machine and then using Y as the reference machine. Which is the most realistic result?

25 Which one would you choose?
  Name               Intel Core i7 4770K   AMD FX 9590
  Number of cores    4                     8
  Number of threads  8                     8
  Frequency          3.5 GHz               4.7 GHz
  Turbo frequency    3.9 GHz               5 GHz
  Data width         64-bit                64-bit
  TDP                84 W                  220 W
  Release            June 2013             July 2013

26 Ref: CPUBoss Link

27 Outline
  Performance Assessment
  Amdahl's Law

28 Amdahl's Law (Gene Amdahl)
  Estimates the potential speedup of a program using multiple processors.
  A fraction $p$ of the code is parallelizable with no scheduling overhead.
  A fraction $(1 - p)$ of the code is inherently serial.
  $T$ is the total execution time of the program on a single processor.
  $N$ is the number of processors that fully exploit the parallel portions of the code.
  $Speedup = \frac{\text{time to execute program on a single processor}}{\text{time to execute program on } N \text{ parallel processors}} = \frac{T(1 - p) + Tp}{T(1 - p) + \frac{Tp}{N}} = \frac{1}{(1 - p) + \frac{p}{N}}$

29 Amdahl's Law: conclusions
  Code needs to be parallelizable and parallelized!
  When $p$ is small, parallel processors have little effect.
  As $N \to \infty$, the speedup is bounded by $1/(1 - p)$.
  Because the speedup is bounded, adding more processors gives diminishing returns.
  $Speedup = \frac{T(1 - p) + Tp}{T(1 - p) + \frac{Tp}{N}} = \frac{1}{(1 - p) + \frac{p}{N}}$
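These conclusions can be illustrated with a minimal Python sketch of the speedup formula. The parallel fraction p = 0.9 and the processor counts are assumed values chosen only for illustration.

```python
def amdahl_speedup(p, n):
    """Speedup = 1 / ((1 - p) + p / N) for parallel fraction p on N processors."""
    return 1.0 / ((1.0 - p) + p / n)


p = 0.9  # assumed parallelizable fraction (illustrative)
speedups = {n: amdahl_speedup(p, n) for n in (1, 2, 4, 16, 1024)}
upper_bound = 1.0 / (1.0 - p)

for n, s in speedups.items():
    print(f"N = {n:5d}: speedup = {s:.2f}")
print(f"bound as N grows: {upper_bound:.2f}")
```

Even with 1024 processors the speedup stays below the bound 1/(1 - p), which is 10 for the assumed p = 0.9.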

30 Amdahl's Law: Exercise 5
  A program spends 60% of its execution time on floating-point operations, 90% of which are executed in parallelizable loops. When the code is parallelized, coordination and synchronization between the parts make the portion not involving floating-point operations 10% longer.
  a) Find the improvement in execution time achieved by doubling the speed of the floating-point unit.
  b) Find the improvement in execution time achieved by using two processors with the same speed and structure as the original one.
  c) What would the improvement be if both changes were implemented?

31 Amdahl's Law: generalization for any design improvement
  $Speedup = \frac{\text{Execution time before enhancement}}{\text{Execution time after enhancement}}$
  Suppose that the enhancement affects a fraction $p$ of the total runtime before enhancement, and that the speedup brought by this enhancement is $SU_p$. Then
  $Speedup = \frac{1}{(1 - p) + \frac{p}{SU_p}}$

32 Amdahl's Law: generalized example
  Suppose that a task spends 40% of its time on floating-point operations and that a new FPU has speedup $K$. Then the overall speedup is
  $Speedup = \frac{1}{0.6 + \frac{0.4}{K}}$
  So, as $K \to \infty$, the maximum speedup is $1/0.6 \approx 1.67$.
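A quick numeric check of this example, as a Python sketch: the function name and the sample values of K are illustrative assumptions, while the 40% fraction comes from the slide.

```python
def overall_speedup(fraction_enhanced, enhancement_speedup):
    """Generalized Amdahl's Law: 1 / ((1 - p) + p / SU_p)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / enhancement_speedup)


FP_FRACTION = 0.4  # share of time spent on floating-point work (from the slide)

results = {k: overall_speedup(FP_FRACTION, k) for k in (2, 10, 1000)}
limit = 1.0 / (1.0 - FP_FRACTION)

for k, s in results.items():
    print(f"K = {k:4d}: overall speedup = {s:.3f}")
print(f"limit for very large K: {limit:.3f}")
```

However fast the new FPU is, the 60% of the time it does not touch caps the overall gain at about 1.67.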

33 Homework: Exercise 6
  A processor is used for an application where 30%, 25% and 10% of the processing time is spent on floating-point addition, multiplication and division, respectively. For a new processor version, three alternatives are being considered, all involving nearly the same design and implementation cost. Which one should be selected?
  a) Redesign the adder, making it twice as fast as the old one.
  b) Redesign the multiplier, making it three times as fast as the old one.
  c) Redesign the divider, making it ten times as fast as the old one.

34 Homework: Exercise 7
  T is the average processing time of a computer operating at frequency f. Instructions are grouped into 3 types, as shown below.
  Instruction type            CPI
  Floating-point arithmetic   10
  Integer arithmetic          5
  Non-arithmetic              2
  Typically a program executes the same proportion of instructions from all three types. Compute the MIPS rate and the new execution time if the FPU becomes twice as fast.

35 Homework: Exercise 8
  Let f1 and f2 be the operating frequencies of processors P1 and P2, respectively. Assume that two compilers generate different executable codes for the same source program, which may be executed by P1 as well as by P2. The codes have the characteristics given below:
  Instruction type            CPI   Proportion compiler 1   Proportion compiler 2
  Floating-point arithmetic         %                       30 %
  Integer arithmetic          5     30 %                    10 %
  Non-arithmetic              2     50 %                    60 %
  Compute the ratio f1/f2 for which the processing time of P1 executing code 1 equals the processing time of P2 executing code 2.

36 Homework: Exercise 9
  The code of an application can be separated into a sequential part (S) and a parallelizable part (P). When the application runs on a single processor, the number of executed instructions of type P is twice the number of type S. When the application runs on multiple processors, the number of instructions of type S increases by 10%. Consider the following two configurations:
  A) A single-processor machine operating at frequency 2f.
  B) A four-processor machine operating at frequency f.
  a) Determine the limit ratio r between the CPI of instructions of type P and type S ($r = CPI_P / CPI_S$) for which configuration A is faster than configuration B.
  b) Compute the upper limit for the speedup that can be achieved using multiple processors without changing the operating frequency.

37 Homework: Exercise 10
  The following table shows the execution times, in seconds, for five different benchmark programs on three machines.
  Benchmark   Processor R   Processor M   Processor Z
  E
  F
  H
  I
  K
  a) Compute the speed metric for each processor for each benchmark, normalized to machine R, using the equation given in slide 21. Then compute the arithmetic mean value.
  b) Repeat a) using M as the reference machine. Which machine is the slowest based on each of the preceding two calculations?
  c) Repeat the calculations of parts (a) and (b) using the geometric mean, defined in slide 22. Which machine is the slowest based on the two calculations?

38 Textbook references
  The topics are covered in:
  Stallings: sections 2.2, 2.3 and 2.5
  Tanenbaum: section 8.4
  Parhami: chapter 4

39 END 15-17, 24, 28, 31-25


More information

Engineering 9859 CoE Fundamentals Computer Architecture

Engineering 9859 CoE Fundamentals Computer Architecture Engineering 9859 CoE Fundamentals Computer Architecture Introduction Dennis Peters 1 Fall 2007 1 Based on notes from Dr. R. Venkatesan Course Details Classes Monday, Wednesday, Friday 9 10 EN-4033 Course

More information

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction. CMSC 4 Practice Exam w/answers General instructions. Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.. CPU performance Suppose we have the following

More information

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics ,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also

More information

Page 1. Program Performance Metrics. Program Performance Metrics. Amdahl s Law. 1 seq seq 1

Page 1. Program Performance Metrics. Program Performance Metrics. Amdahl s Law. 1 seq seq 1 Program Performance Metrics The parallel run time (Tpar) is the time from the moment when computation starts to the moment when the last processor finished his execution The speedup (S) is defined as the

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution CSCE 614 (Fall 2017) Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution Eun Jung Kim Write your answers on the sheets provided. Submit with the COVER SHEET. If you

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Cache Memory and Performance

Cache Memory and Performance Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)

More information

Performance Analysis

Performance Analysis Performance Analysis EE380, Fall 2015 Hank Dietz http://aggregate.org/hankd/ Why Measure Performance? Performance is important Identify HW/SW performance problems Compare & choose wisely Which system configuration

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

Computer Performance. Relative Performance. Ways to measure Performance. Computer Architecture ELEC /1/17. Dr. Hayden Kwok-Hay So

Computer Performance. Relative Performance. Ways to measure Performance. Computer Architecture ELEC /1/17. Dr. Hayden Kwok-Hay So Computer Architecture ELEC344 Computer Performance How do you measure performance of a computer? 2 nd Semester, 208-9 Dr. Hayden Kwok-Hay So How do you make a computer fast? Department of Electrical and

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC

More information

High Performance Computing

High Performance Computing High Performance Computing CS701 and IS860 Basavaraj Talawar basavaraj@nitk.edu.in Course Syllabus Definition, RISC ISA, RISC Pipeline, Performance Quantification Instruction Level Parallelism Pipeline

More information

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs. Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and

More information

Measuring Performance. Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks

Measuring Performance. Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks Measuring Performance Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks Why Measure Performance? Performance tells you how you are doing and whether things can be improved appreciably When

More information

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 Name: Solutions (please print) 1-3. 11 points 4. 7 points 5. 7 points 6. 20 points 7. 30 points 8. 25 points Total (105 pts):

More information

Computer Organization and Architecture William Stallings 8th Edition. Chapter 2 Computer Evolution and Performance

Computer Organization and Architecture William Stallings 8th Edition. Chapter 2 Computer Evolution and Performance Computer Organization and Architecture William Stallings 8th Edition Chapter 2 Computer Evolution and Performance BRIEF HISTORY OF COMPUTERS The First Generation: Vacuum Tubes ENIAC - background Electronic

More information

CPSC614: Computer Architecture

CPSC614: Computer Architecture CPSC614: Computer Architecture E.J. Kim Texas A&M University Computer Science & Engineering Department Assignment 1, Due Thursday Feb/9 Spring 2017 1. A certain benchmark contains 195,700 floating-point

More information

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras Performance, Cost and Amdahl s s Law Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Performance-

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

Basics of Performance Engineering

Basics of Performance Engineering ERLANGEN REGIONAL COMPUTING CENTER Basics of Performance Engineering J. Treibig HiPerCH 3, 23./24.03.2015 Why hardware should not be exposed Such an approach is not portable Hardware issues frequently

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Computer Architecture and Organization (CS-507)

Computer Architecture and Organization (CS-507) Computer Architecture and Organization (CS-507) Muhammad Zeeshan Haider Ali Lecturer ISP. Multan ali.zeeshan04@gmail.com https://zeeshanaliatisp.wordpress.com/ Lecture 2 Computer Evolution and Performance

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

Lecture 2: Performance

Lecture 2: Performance Lecture 2: Performance Today s topics: Technology wrap-up Performance trends and equations Reminders: YouTube videos, canvas, and class webpage: http://www.cs.utah.edu/~rajeev/cs3810/ 1 Important Trends

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Performance Analysis Instructor: Haidar M. Harmanani Spring 2018 Outline Performance scalability Analytical performance measures Amdahl

More information

CPU Performance Pipelined CPU

CPU Performance Pipelined CPU CPU Performance Pipelined CPU Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Chapters 1.4 and 4.5 In a major matter, no details are small French Proverb 2 Big Picture:

More information

Chapter 1. The Computer Revolution

Chapter 1. The Computer Revolution Chapter 1 Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications feasible Computers

More information

CpE 442 Introduction to Computer Architecture. The Role of Performance

CpE 442 Introduction to Computer Architecture. The Role of Performance CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar CpE442 Lec2.1 Overview of Today s Lecture: The Role of Performance Review from Last Lecture Definition and

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture Chapter 2 Note: The slides being presented represent a mix. Some are created by Mark Franklin, Washington University in St. Louis, Dept. of CSE. Many are taken from the Patterson & Hennessy book, Computer

More information

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Sept. 5 th : Homework 1 release (due on Sept.

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC Philosophy CISC Limitations 1 CISC Creates the Anti CISC Revolution Digital Equipment Company (DEC) introduces VAX (1977) Commercially successful 32-bit CISC minicomputer From CISC to RISC In 1970s and 1980s CISC minicomputers became

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Final Exam - Review Israel Koren ECE568 Final_Exam.1 1. A computer system contains an IOP which may

More information

CS 110 Computer Architecture

CS 110 Computer Architecture CS 110 Computer Architecture Performance and Floating Point Arithmetic Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University

More information

LECTURE 1. Introduction

LECTURE 1. Introduction LECTURE 1 Introduction CLASSES OF COMPUTERS A computer is a device that can be instructed to carry out arbitrary sequences of arithmetic or logical operations automatically. Computers share a core set

More information