Performance. February 12, Howard Huang 1

Similar documents
The Role of Performance

Performance, Power, Die Yield. CS301 Prof Szajda

Response Time and Throughput

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

Course web site: teaching/courses/car. Piazza discussion forum:

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

PC I/O. May 7, Howard Huang 1

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

More advanced CPUs. August 4, Howard Huang 1

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

CS232: Computer Architecture II

Caches Concepts Review

Introduction to I/O. April 30, Howard Huang 1

CS 110 Computer Architecture

Processor Performance and Parallelism Y. K. Malaiya

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

MEASURING COMPUTER TIME. A computer faster than another? Necessity of evaluation computer performance

Outline Marquette University

Announcements. 1 week extension on project. 1 week extension on Lab 3 for 141L. Measuring performance Return quiz #1

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

CMSC 611: Advanced Computer Architecture

Is Intel s Hyper-Threading Technology Worth the Extra Money to the Average User?

CS 152 Computer Architecture and Engineering

CST 337, Fall 2013 Homework #7

CpE 442 Introduction to Computer Architecture. The Role of Performance

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

Page 1. Program Performance Metrics. Program Performance Metrics. Amdahl s Law. 1 seq seq 1

CPE300: Digital System Architecture and Design

Performance Measurement (as seen by the customer)

Lectures More I/O

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

Performance Analysis

UCB CS61C : Machine Structures

Review: latency vs. throughput

Evaluating Computers: Bigger, better, faster, more?

CS 101, Mock Computer Architecture

Designing for Performance. Patrick Happ Raul Feitosa

The Computer Revolution. Classes of Computers. Chapter 1

Topics/Assignments. Class 10: Big Picture. What s Coming Next? Perspectives. So Far Mostly Programmer Perspective. Where are We? Where are We Going?

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Lecture 3: Evaluating Computer Architectures. How to design something:

Computer Architecture. What is it?

LECTURE 1. Introduction

CS654 Advanced Computer Architecture. Lec 2 - Introduction

7/28/ Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc.

1.3 Data processing; data storage; data movement; and control.

Lecture - 4. Measurement. Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1

Lecture 23. Finish-up buses Storage

Reporting Performance Results

Computer Architecture

Quiz for Chapter 1 Computer Abstractions and Technology

Performance of computer systems

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

Chapter 14 Performance and Processor Design

Practice Assignment 1

Lecture 1: Introduction

Caching Basics. Memory Hierarchies

Fundamentals of Quantitative Design and Analysis

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

Understanding Computers. Hardware Terminology

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Performance Metrics. 1 cycle. 1 cycle. Computer B performs more instructions per second, thus it is the fastest for this program.

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

T T T T T T N T T T T T T T T N T T T T T T T T T N T T T T T T T T T T T N.

Advanced processor designs

Advanced Computer Architecture (CS620)

Transistors and Wires

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

www-inst.eecs.berkeley.edu/~cs61c/

UC Berkeley CS61C : Machine Structures

ECE331: Hardware Organization and Design

Multiple Instruction Issue. Superscalars

Performance measurement. SMD149 - Operating Systems - Performance and processor design. Introduction. Important trends affecting performance issues

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size

CS3350B Computer Architecture CPU Performance and Profiling

Lecture 4: Instruction Set Architectures. Review: latency vs. throughput

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

Computer Systems Architecture I. CSE 560M Lecture 17 Guest Lecturer: Shakir James

June 2004 Now let s find out exactly what we ve bought, how to shop a new system and how to speed up an existing PC!

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I

UCB CS61C : Machine Structures

Department of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems

Cache Memory and Performance

Computer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

Copyright 2012, Elsevier Inc. All rights reserved.

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Week 6 out-of-class notes, discussions and sample problems

i960 Microprocessor Performance Brief October 1998 Order Number:

Transcription:

Performance Today we ll try to answer several questions about performance. Why is performance important? How can you define performance more precisely? How do hardware and software design affect performance? How do you measure performance in the real world? We ll use the ideas from today to evaluate the different CPU designs that are coming up in the next several weeks. February 12, 2003 2001-2003 Howard Huang 1

Performance for users Users want to evaluate all of the seemingly conflicting claims in the industry, and get the most for their money. Q: Why do end users need a new performance metric? A: End users who rely only on megahertz as an indicator for performance do not have a complete picture of PC processor performance and may pay the price of missed expectations. February 12, 2003 Performance 2

Performance for developers Programmers are interested in making faster software. Software can take advantage of new hardware features, such as extra instructions or graphics processors for games. At the same time, programmers should avoid writing slower code that relies on complex instructions or frequent memory accesses. Hardware designers want to build faster systems. You can add features that are likely to be used in new software and to improve performance, like SSE2 instructions in the Pentium 4. You could also find ways to speed up existing systems and software, perhaps by increasing the cache size or bus speed. Fast software and fast hardware go hand in hand. February 12, 2003 Performance 3

Execution time The most intuitive measure of performance is just execution time, or how long you have to wait for a program to finish running. Our focus for the next several weeks will be on CPU processing time, but there are other factors in determining execution time too. Memory and cache accesses Input and output from disks, video cards, networks, etc. Other processes in a multitasking system We ll look at some of these later in the course. February 12, 2003 Performance 4

The components of execution time Execution time can be divided into two parts. User time is spent running the application program itself. System time is when the application calls operating system code. The distinction between user and system time is not always clear, especially under different operating systems. The Unix time command shows both. salary.125 > time distill 05-examples.ps Distilling 05-examples.ps (449,119 bytes) 10.8 seconds (0:11) 449,119 bytes PS => 94,999 bytes PDF (21%) 10.61u 0.98s 0:15.15 76.5% User time Total time (including other processes) System time CPU usage = (User + System) / Total February 12, 2003 Performance 5

Throughput Another important measurement is throughput, or how many tasks can be performed in some amount of time. Throughput is especially important for servers. How many web pages can be served per minute? How many database transactions can be processed per second? Modern multitasking operating systems always trade some execution time in exchange for throughput each individual program runs slower overall, but many programs can run together. Execution time and throughput are related. Improving execution time also improves throughput. It s possible to improve throughput but not execution time. February 12, 2003 Performance 6

Execution time vs. throughput Let s say you re shopping at that new Wal-Mart by the airport. There is one register open, and the cashier takes two minutes per customer. The execution time is 2 minutes. The throughput would be 30 customers per hour. After some training, the cashier can process customers in 1.5 minutes. The execution time is now improved to 1.5 minutes. The throughput also improves, to 40 customers per hour. Or they could open three checkout lines, all with untrained personnel: It still takes 2 minutes for each customer to check out. But with three lines, there are 90 customers leaving per hour! February 12, 2003 Performance 7

Measuring performance We ll measure performance according to execution time. A lower execution time is better, so we can define the performance of a computer system X on a program P as follows. Performance X,P = 1 Execution time X,P We can say that system X is n times faster than Y on program P if: Performance X,P = Execution time Y,P = n Performance Y,P Execution time X,P This is equivalent to saying that Y is n times slower than X. February 12, 2003 Performance 8

Clock cycle time There are three equally important components of execution time. One cycle is the minimum time it takes the CPU to do any work. The clock cycle time or clock period is just the length of a cycle. The clock rate, or frequency, is the reciprocal of the cycle time. Of course, the lower the cycle time and the higher the clock rate, the faster a given architecture can run. Some examples illustrate some typical frequencies. A 500MHz processor has a cycle time of 2ns. A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns. February 12, 2003 Performance 9

CPI Another important component is the average number of clock cycles per instruction, or CPI, for a particular machine and program. The CPI depends on the actual instructions appearing in the program a floating-point intensive application might have a higher CPI than an integer-based program. It also depends on the CPU implementation. For example, a Pentium can execute the same instructions as an older 80486, but faster. In CS231 we assumed each instruction took one cycle, so we had CPI = 1. The CPI is often higher in reality because of memory or I/O accesses, or more complex instructions. The CPI can also be lower than 1, if you consider multiprocessors or superscalar architectures that execute many instructions at once. February 12, 2003 Performance 10

Instructions executed Finally, we need to consider the number of instructions in a program. We are not interested in the static instruction count, or how many lines of code are in a program. Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs. For example, there are three lines of code below, but the number of instructions executed would be 2001. li $a0, 1000 Ostrich: sub $a0, $a0, 1 bne $a0, $0, Ostrich Programs that execute more instructions may take more time. February 12, 2003 Performance 11

Computing the execution time Now we can express the CPU time more precisely. CPU time X,P = Instructions executed P CPI X,P Clock cycle time X Make sure you have the units straight! Seconds Program = Instructions Program Clock cycles Instructions Seconds Clock cycle We can use this formula to determine how various changes to a program or machine will affect performance. February 12, 2003 Performance 12

Programs, hardware and compilers CPU time X,P = Instructions executed P CPI X,P Clock cycle time X Execution time and performance depend on both the particular system and the particular program being executed. How does the machine X affect performance? Its implementation of the instruction set helps to determine CPI. The processor s frequency will determine the clock cycle time. How does the program P affect performance? It determines the number of instructions executed. The types of those instructions influence the CPI. A good compiler is critical for optimizing program code! February 12, 2003 Performance 13

Comparing ISA-compatible processors Let s compare the performances two 8086-based processors. An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor. A 1GHz Pentium III with a CPI of 1.5 for the same program. Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions. But they implement the ISA differently, which leads to different CPIs. CPU time AMD,P = Instructions P CPI AMD,P Cycle time AMD = Instructions P 1.2 1.25ns = 1.5 Instructions P ns CPU time P3,P = Instructions P CPI P3,P Cycle time P3 = Instructions P 1.5 1ns = 1.5 Instructions P ns So the execution time and performance are the same! February 12, 2003 Performance 14

Comparing clock rates How about comparing a 2.5 GHz and a 3.0 GHz Pentium 4? Selling the same processor at different clock rates is common. Both processors will run the same code and have the same CPI. CPU time 2.5,P = Instructions P CPI 2.5,P Cycle time 2.5 = Instructions P CPI 2.5,P 0.40ns CPU time 3.0,P = Instructions P CPI 2.5,P Cycle time 3.0 = Instructions P CPI 2.5,P 0.33ns Performance 3.0,P CPU time 2.5,P 0.40 = = = 1.2 Performance 2.5,P CPU time 3.0,P 0.33 As you might expect, the 3.0 GHz chip is 20% faster than the 2.5 GHz. Remember this is only CPU time, and it ignores all other factors! February 12, 2003 Performance 15

Comparing compilers Let s compare two different compilers for the same system. The better compiler generates programs that need 5% fewer instructions and have a 10% lower CPI than the base compiler. CPU time X,Base = Instructions P CPI X,Base Cycle time X CPU time X,Opt = (0.95 Instructions P ) (0.90 CPI X,Base ) Cycle time X = 0.86 CPU time Base,P Performance X,Opt CPU time X,Base 1.00 = = Performance X,Base CPU time X,Opt 0.86 = 1.17 So there are many ways to speed up a system. February 12, 2003 Performance 16

Amdahl s Law Amdahl s Law states that optimizations are limited in their effectiveness. Execution time after improvement = Time affected by improvement Amount of improvement + Time unaffected by improvement For example, doubling the speed of floating-point operations sounds like a great idea. But if only 10% of the program execution time T involves floating-point code, then the overall performance improves by just 5%. Execution time after improvement = 0.10 T 2 + 0.90 T = 0.95 T A corollary of this law is that we should always try to make the common case fast enhance the parts of the program that are used most often, so time affected by improvement is as large as possible. February 12, 2003 Performance 17

Benchmarking What programs should we use to measure real-world performance? Ideally we d test each computer with our favorite programs. But there are too many computers and programs out there! Instead people often rely on a few benchmark programs in an attempt to characterize the performance of systems. A good benchmark should reflect the performance of other applications too in particular, it should have a realistic mix of instructions. Some common benchmarks include: Adobe Photoshop for image processing BAPCo SYSmark for office applications Unreal Tournament 2003 for 3D games February 12, 2003 Performance 18

Synthetic benchmarks A synthetic benchmark is a program whose only purpose is to measure performance. They are usually small and easy to port to different CPUs and operating systems, so they are convenient for comparing systems. However, there are many disadvantages too. They re small enough that it s easy for compiler writers and CPU designers to cheat and make improvements that apply only to that particular benchmark. Synthetic benchmarks may not contain a realistic instruction mix and may not reflect performance of typical applications. Many synthetic benchmarks are in use today. SiSoft Sandra measures general system performance. Futuremark 3DMark03 tests Windows 3D game performance (it does include code from actual games). February 12, 2003 Performance 19

Some actual benchmarks SiSoft Sandra tests several system components. http://www.sisoftware.co.uk It produces pretty bar graphs. February 12, 2003 Performance 20

Performance of many programs The best way to see how a system performs for a variety of programs is to just show the execution times of all of the programs. Here are execution times for several different Photoshop 5.5 tasks, from http://www.tech-report.com February 12, 2003 Performance 21

Summarizing performance Summarizing performance with a single number can be misleading just like summarizing four years of school with a single GPA! If you must have a single number, you could sum the execution times. This example graph displays the total execution time of the individual tests from the previous page. A similar option is to find the average of all the execution times. For example, the 800MHz Pentium III (in yellow) needed 227.3 seconds to run 21 programs, so its average execution time is 227.3 21 = 10.82 seconds. A weighted sum or average is also possible, and lets you emphasize some benchmarks more than others. February 12, 2003 Performance 22

Summary Performance is one of the most important criteria in judging systems. There are two main measurements of performance. Execution time is what we ll focus on. Throughput is important for servers and operating systems. Our main performance equation explains how performance depends on several factors related to both hardware and software. CPU time X,P = Instructions executed P CPI X,P Clock cycle time X It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs. Amdahl s Law tell us how much improvement we can expect from specific enhancements. The best benchmarks are real programs, which are more likely to reflect common instruction mixes. February 12, 2003 Performance 23