The Role of Performance

Orange Coast College, Business Division, Computer Science Department
CS 116: Computer Architecture
The Role of Performance

What is performance?
- A set of metrics that allow us to compare two different hardware platforms
- Facts about hardware
- Measured, reported, summarized

Why should we care?
- Besides the obvious question of what to buy (the informed consumer)
- We are concerned from an architectural standpoint
  - Again, the why of the machine
    - Why is one instruction faster?
    - Why does some hardware feature affect the speed?
  - But also the how
    - How can we make the computer faster?
  - And finally, what are the trade-offs?

Performance Definition
- What does this mean? Computer A has better performance than B
- Well, it depends
- Response time (latency)
  - Amount of time to finish your job
  - Called execution time
- Throughput (bandwidth)
  - Finishing more jobs, faster
  - Amount of time to finish many jobs

Keep it simple...
- For now, we focus on execution time
- Thus we say:
$$\text{Performance} = \frac{1}{\text{Execution time}}$$
- or, by substitution:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$
- When we say X is n times faster than Y:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$

Relative Performance
- Two machines, X and Y
  - Machine X runs a program in 10 seconds
  - Machine Y runs the program in 15 seconds
- Which has better performance?
- Relative performance:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{15}{10} = 1.5$$
- X is 1.5 times faster than Y
- Or is Y 1.5 times slower than X?
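The comparison of machines X and Y can be checked with a few lines of Python. This is only a minimal sketch; the function name `relative_performance` is an illustrative choice and does not come from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return Performance_X / Performance_Y, i.e. how many times faster X is than Y."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    # Performance is 1 / execution time, so the ratio inverts the times.
    return time_y / time_x


n = relative_performance(time_x=10.0, time_y=15.0)
print(f"X is {n} times faster than Y")  # prints "X is 1.5 times faster than Y"
```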

Actually measure performance
- Everything is based on time
  - Time to finish a job
  - Time to finish all of the jobs
- Easiest is total time (elapsed time)
- But this is not totally accurate; elapsed time combines
  - CPU time
  - I/O time

CPU Time
- Two categories
  - User time: time spent on MY job
  - System time: time spent on operating system overhead
- On Unix: the time command
- On Windows: can't do it easily
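The split between elapsed, user, and system time can also be observed from inside a program. The sketch below uses Python's standard `time.perf_counter()` and `os.times()`; the helper `measure` and the workload `busy_loop` are hypothetical names introduced only for illustration, and the reported values will vary from machine to machine.

```python
import os
import time


def measure(job):
    """Run job() and return (elapsed, user, system) times in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    job()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,              # elapsed (wall-clock) time
        cpu_end.user - cpu_start.user,      # CPU time spent in user code
        cpu_end.system - cpu_start.system,  # CPU time spent in the OS on our behalf
    )


def busy_loop():
    """Hypothetical CPU-bound workload."""
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


elapsed, user, system = measure(busy_loop)
print(f"elapsed={elapsed:.3f}s user={user:.3f}s system={system:.3f}s")
```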

System Time
- Includes all operating system overhead
  - Switching tasks
  - Managing interrupts
- Should this be included in performance?
  - Yes? But a different OS means different overhead
  - No? But the OS is part of the execution time

Other Metrics
- Performance is not just time
- Think about measures based on computer work
- All computers have a clock
  - Based on an oscillator that vibrates at a constant rate
  - Provides distinct time intervals, called clock cycles, ticks, or clock periods

Clocks
- Length of the clock period
  - Called the clock cycle (e.g. 2 ns)
  - Measured in seconds/cycle
- Or clock rate (e.g. 500 MHz)
  - Inverse of the clock cycle
  - Measured in cycles/second
- See: 500 MHz = 500,000,000 cycles/second and 2 ns = 2/1,000,000,000 second, so
$$500 \times 10^{6}\ \frac{\text{cycles}}{\text{second}} \times \frac{2}{1{,}000 \times 10^{6}}\ \frac{\text{seconds}}{\text{cycle}} = 1$$
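Because clock rate and clock cycle time are inverses, converting between them is a one-line calculation. A minimal sketch follows; the function names are illustrative assumptions, not part of the slides.

```python
def rate_from_period(period_s: float) -> float:
    """Clock rate in cycles/second (Hz) for a clock cycle time in seconds."""
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """Clock cycle time in seconds for a clock rate in cycles/second (Hz)."""
    return 1.0 / rate_hz


# A 2 ns clock cycle corresponds to a 500 MHz clock, and vice versa.
print(f"{rate_from_period(2e-9) / 1e6:.0f} MHz")
print(f"{period_from_rate(500e6) * 1e9:.0f} ns")
```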

Ponder
- Consider: a 1 ns clock cycle means 1 GHz (1,000,000,000 cycles/second)
  - Nanosecond = 1 x 10^-9 second
  - Light travels about 1 foot in that time
- 3 GHz is 3,000,000,000 cycles in a second (3 x 10^9 cycles/second)
  - Cycle time of 333 picoseconds, or 0.333 ns
  - The amount of time it takes light to travel about 4 inches
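The light-travel figures can be sanity-checked numerically. The sketch below assumes light in a vacuum (299,792,458 m/s) and the exact conversion 1 inch = 0.0254 m; the function name is an illustrative choice.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum
METERS_PER_INCH = 0.0254


def light_distance_inches(clock_rate_hz: float) -> float:
    """Distance light travels during one clock cycle, in inches."""
    cycle_time_s = 1.0 / clock_rate_hz
    return SPEED_OF_LIGHT_M_PER_S * cycle_time_s / METERS_PER_INCH


for rate_hz in (1e9, 3e9):
    inches = light_distance_inches(rate_hz)
    print(f"{rate_hz / 1e9:.0f} GHz: light travels about {inches:.1f} inches per cycle")
```

At 1 GHz this comes out just under 12 inches (about a foot), and at 3 GHz just under 4 inches.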

Relating Metrics
- Only concerned with CPU time
- Users are concerned with time; designers with clock cycles
- Relate the two:
$$\text{CPU execution time} = \text{clock cycles for a program} \times \text{clock cycle time}$$
- We could also say:
$$\text{CPU execution time} = \frac{\text{clock cycles for a program}}{\text{clock rate}}$$

Look at an example
- A program runs in 10 seconds
- Want to improve the clock rate so that it takes 6 seconds
- Currently running at 400 MHz
$$10\ \text{seconds} = \frac{\text{clock cycles for a program}}{400 \times 10^{6}\ \text{cycles/second}}$$
$$\text{clock cycles for a program} = 10\ \text{seconds} \times 400 \times 10^{6}\ \frac{\text{cycles}}{\text{second}} = 4000 \times 10^{6}\ \text{cycles}$$

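The Solution
- Changing the clock rate would... make the processor use 1.2 times as many clock cycles
- So:
$$\text{CPU time} = \frac{1.2 \times \text{old cycle count}}{\text{new clock rate}}$$
$$6\ \text{seconds} = \frac{1.2 \times 4000 \times 10^{6}\ \text{cycles}}{\text{new clock rate}}$$
$$\text{new clock rate} = \frac{1.2 \times 4000 \times 10^{6}\ \text{cycles}}{6\ \text{seconds}} = 800 \times 10^{6}\ \frac{\text{cycles}}{\text{second}} = 800\ \text{MHz}$$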
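The same arithmetic can be reproduced in code. This is a minimal sketch using the numbers from the example (10 seconds at 400 MHz, a 6-second target, 1.2 times as many cycles); the function names are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles for a program = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(old_cycles: float, cycle_factor: float, target_time_s: float) -> float:
    """Clock rate needed to run (cycle_factor x old_cycles) cycles in target_time_s."""
    return cycle_factor * old_cycles / target_time_s


old_cycles = clock_cycles(cpu_time_s=10.0, clock_rate_hz=400e6)  # 4000 x 10^6 cycles
new_rate = required_clock_rate(old_cycles, cycle_factor=1.2, target_time_s=6.0)
print(f"{new_rate / 1e6:.0f} MHz")  # prints "800 MHz"
```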

Program Clock Cycles
- Where do they come from?
  - Every instruction takes time to execute
  - Time taken is measured in clock cycles
- Refine our equations:
$$\text{CPU clock cycles} = \text{instructions executed} \times \text{average clock cycles per instruction}$$

CPI
- CPI is clock cycles per instruction
- Average time each instruction takes to execute, in clock cycles
  - Add takes less time than multiply or load
  - Averaged over an entire program
- Can compare two different implementations of the same architecture

More CPI
- CPI depends on different factors
  - Memory system
  - Processor structure
  - ISA implementation
- Example: a program that performs many load/store operations will have a high CPI
  - It takes a certain number of cycles to access memory

MIPS (no, not the one you think)
- Millions of Instructions Per Second
- Inversely proportional to CPI at a given clock rate
- Not constant
  - Faster machines have a higher MIPS rate
  - Also depends on the instruction mix
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$
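The MIPS definition is easy to express in code, and substituting execution time = instruction count x CPI / clock rate shows why MIPS falls as CPI rises. The sketch below uses hypothetical numbers (2 x 10^9 instructions, CPI of 2, 400 MHz) chosen only for illustration.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form after substituting execution time: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical program: 2e9 instructions, CPI of 2, on a 400 MHz machine.
instruction_count, cpi, clock_rate_hz = 2e9, 2.0, 400e6
execution_time_s = instruction_count * cpi / clock_rate_hz  # 10 seconds

print(f"{mips(instruction_count, execution_time_s):.0f} MIPS")
print(f"{mips_from_cpi(clock_rate_hz, cpi):.0f} MIPS")
```

Both forms give the same rating for the same program, which is also why a MIPS figure only makes sense when the instruction mix is stated.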

FLOPS
- Floating-point operations per second
- Rate depends on machine implementation
- Options:
  - No FPU
  - Microprogrammed FPU
  - Hardwired FPU
    - Supports more functions (sine, cosine, ...)
- MFLOPS (mega-FLOPS): 1 MFLOPS = 10^6 FLOPS
- Operations: add / subtract / multiply / divide / ...
- Precision: single / double

Finally, the equations
- This all works out to the following metrics:
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
- Or:
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
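Both forms of the CPU time equation can be written as small functions and checked against each other. This is a sketch only; the instruction count and CPI below are hypothetical values, chosen so that the program needs 4000 x 10^6 cycles on a 400 MHz clock.

```python
def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_cycle_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 2e9 instructions at CPI 2 (4000e6 cycles) on a 400 MHz clock.
t_rate = cpu_time_from_rate(2e9, 2.0, 400e6)
t_cycle = cpu_time_from_cycle_time(2e9, 2.0, 1.0 / 400e6)
print(f"{t_rate:.1f} s, {t_cycle:.1f} s")  # prints "10.0 s, 10.0 s"
```

Writing the three factors separately makes the trade-offs visible: a change that lowers one factor only helps if it does not raise another by more.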

That is great
- Can play with distinct parts to find balance
- But how do we get the numbers?
- Look at the parts
  - CPU time: run the program
  - Clock cycle time: provided by the manufacturer
  - Instruction count: ?
  - CPI: ?

Instruction count
- Couple of ways
  - Use a simulator, such as SPIM
  - A software profiler: counts instructions as they execute
  - Hardware counters: included on newer processors
- Instruction count is implementation agnostic

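But CPI...
- CPI, however, depends on a lot more
  - Like the memory system
- Can be done with
  - Detailed simulation (more detailed than SPIM)
  - Calculating the number of instructions and their individual cycle counts:
$$\text{CPU clock cycles} = \sum_{i} \left(\text{CPI}_i \times C_i\right)$$
    where $C_i$ is the number of instructions of class $i$ executed and $\text{CPI}_i$ is the cycles per instruction for that class
- CPI is implementation specific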

Instruction Frequencies

Rank   Instruction              Average (% of instructions executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
       Total                    96%

Let's be compiler writers
- Given the following facts, supplied by the HW designer:

Instruction class   CPI for this instruction class
A                   1
B                   2
C                   3

- 2 code sequences for a particular machine:

Code sequence   Instruction counts
                A    B    C
1               2    1    2
2               4    1    1

Finding the solution
- Required: Which code sequence executes the most instructions?
- Solution:
  - Sequence 1 executes 2 + 1 + 2 = 5 instructions
    - CPU clock cycles for 1: (2 x 1) + (1 x 2) + (2 x 3) = 10 cycles
  - Sequence 2 executes 4 + 1 + 1 = 6 instructions
    - CPU clock cycles for 2: (4 x 1) + (1 x 2) + (1 x 3) = 9 cycles

The Example Completed
- CPI = CPU clock cycles / Instruction count
  - CPI for sequence 1: 10/5 = 2
  - CPI for sequence 2: 9/6 = 1.5
- Code sequence 2 is faster
  - Has more instructions
  - Requires fewer cycles
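The compiler-writer example is a direct application of summing CPI_i x C_i over the instruction classes. A minimal sketch with the class CPIs and instruction counts from the example follows; the dictionary and function names are illustrative choices.

```python
# Cycles per instruction for each instruction class (from the HW designer).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def instruction_count(counts: dict) -> int:
    """Total instructions executed by a code sequence."""
    return sum(counts.values())


def total_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i x C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """CPI = CPU clock cycles / instruction count."""
    return total_cycles(counts) / instruction_count(counts)


for name, counts in SEQUENCES.items():
    print(name, instruction_count(counts), total_cycles(counts), average_cpi(counts))
# prints:
# 1 5 10 2.0
# 2 6 9 1.5
```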

Software for Comparison
- How to select software for comparison?
  - Needs to accurately emulate the daily workload
  - Needs to be run on different machines
- We want real-world applications
- Actual real-world target workload
  - Pros: represents real-world problems
  - Cons:
    - Difficult to run and measure
    - Very specific
    - Not portable to other situations
    - Difficult to speed up execution

Which Benchmarks?
- We use benchmarks
  - Programs selected to measure performance
  - More specific and portable
- Should be the class of apps users use most
  - Engineers: run math-intensive applications
  - Developers: compilers and document processors
- Large suites
  - Prevent trivial optimizations that negate the benchmark
  - More likely to represent real usage

Types of Benchmarks
- Full application
  - Like compilers, games, streaming media
  - Pros: portable, widely used
  - Cons: less representative

Types of Benchmarks
- Kernel
  - Short loops with specific instructions
  - Pros: easy to run, good in the design phase
  - Cons: tailored to a specific task, can't compare across machines

Types of Benchmarks
- Micro benchmarks
  - Pros: good for beginning design, can compile and simulate easily
  - Cons: can lead to misleading results, might not give actual performance

Types of Applications
- Suites of applications
  - Programs with specific input
  - Pros: good indicator of compiler technology and performance
  - Cons: needs updating to match current SW applications

Problems with Benchmarks
- Compiler optimizations
  - Vendors optimize specifically for benchmarks
  - Compilers optimized for the benchmark only
  - Special switches
- Little code, lots of execution
  - Modern compilers can optimize loops to almost nothing
  - Further, a small amount of code will reside in cache

What IBM Did
- PowerPC 550 in 1991
[Figure: bar chart of SPEC performance ratio (0 to 800) per benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the compiler with the enhanced compiler]

Select a suite
- Industry has settled on SPEC
  - System Performance Evaluation Cooperative
  - Created by a group of companies in 1989
- SPEC95
  - The latest release of the SPEC benchmark
  - Eighteen application benchmarks (with inputs) reflecting a technical computing workload
    - Eight integer
    - Ten floating-point intensive
  - Must run with standard compiler flags

Comparing results
- SPEC ratio
  - Normalized results
  - Divide the execution time on a SPARCstation by the time on the measured machine
[Figure: SPECint (0 to 10) versus clock rate (50 to 250 MHz) for the Pentium and Pentium Pro]

Now, to comparing performance
- Not quite
  - We have benchmarks
  - We decide on response time or throughput
- How do we summarize performance?
$$\text{Performance} = \frac{1}{\text{Execution time}}$$
  - Run the suite a number of times
  - Take the arithmetic mean of all of the runs
  - The ratio of the two machines' means is the relative performance
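The summarizing procedure (run several times, average, then take the ratio) can be sketched as follows. The run times are made-up numbers for two hypothetical machines X and Y, and the function names are assumptions for illustration.

```python
from statistics import mean


def mean_execution_time(run_times_s: list) -> float:
    """Arithmetic mean of the execution times of repeated runs."""
    return mean(run_times_s)


def relative_performance(runs_x: list, runs_y: list) -> float:
    """Performance_X / Performance_Y, using mean execution times."""
    return mean_execution_time(runs_y) / mean_execution_time(runs_x)


# Hypothetical measurements (seconds) of the same suite on two machines.
runs_x = [10.2, 9.8, 10.0]
runs_y = [15.1, 14.9, 15.0]

print(f"mean X = {mean_execution_time(runs_x):.2f} s")
print(f"mean Y = {mean_execution_time(runs_y):.2f} s")
print(f"X is {relative_performance(runs_x, runs_y):.2f} times faster than Y")
```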

Increasing Performance
- How do we increase performance?
- Make improvements to
  - Implementation
  - Architecture
- Start from scratch

Increasing Performance
- Implementation improvements
  - Faster clock with unchanged architecture
  - Advantage: old programs can run on the new machine, a major selling point
- Architectural improvements
  - Add new instructions and new registers
  - Advantage: old programs should continue to run
  - Disadvantage: software must be recompiled to take advantage of the new features

Starting over
- RISC architecture (1980s)
- IA-64 (now)
  - Really just RISC
- Advantage: freedom to change and to design for current needs
- Disadvantage: everything must be done from scratch
  - Old programs can't be used

Increasing Performance: Pitfalls
- Amdahl's Law
  - Improving one aspect by a percentage does not improve the entire machine by that percentage
- Using MIPS (not that one) as a performance metric
  - Instructions are implemented differently
  - Can vary inversely with performance
- Using the arithmetic mean to predict performance
  - Normalization causes skewing of information
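Amdahl's Law can be made concrete with its usual form: time after improvement = (time affected by the improvement / improvement factor) + time unaffected. The sketch below applies it to a hypothetical program that runs for 100 seconds, 80 of which are spent in the part being improved; these numbers and the function names are assumptions for illustration.

```python
def time_after_improvement(total_time_s: float, affected_time_s: float,
                           improvement_factor: float) -> float:
    """Amdahl's Law: only the affected part of the execution time shrinks."""
    unaffected_time_s = total_time_s - affected_time_s
    return affected_time_s / improvement_factor + unaffected_time_s


def overall_speedup(total_time_s: float, affected_time_s: float,
                    improvement_factor: float) -> float:
    """Speedup of the whole program, not just of the improved part."""
    new_time_s = time_after_improvement(total_time_s, affected_time_s, improvement_factor)
    return total_time_s / new_time_s


# Hypothetical: 100 s program, 80 s in the improved part, which becomes 4x faster.
print(time_after_improvement(100.0, 80.0, 4.0))  # prints 40.0
print(overall_speedup(100.0, 80.0, 4.0))         # prints 2.5
```

Making 80% of the program 4 times faster yields only a 2.5 times overall speedup, and no improvement factor, however large, can push this program below the 20 seconds that are unaffected.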

Final thoughts
- Execution time is the only valid metric
  - Performance measurements should reflect execution time
- Can't design hardware for performance without considering cost
  - Exception: high-performance computers for scientific computing
  - Crays or the Virginia Tech computer cluster

Virginia Tech G5

Final thoughts
- The other extreme is low-cost design
  - Cost takes precedence over performance
  - IBM PC, embedded computers
- Cost/performance design, in which the designer balances cost against performance
- Cost is determined by
  - Components
  - Labor
  - Research and development
  - Sales and marketing
  - Profit margin

Final thoughts
- Performance increases come from:
  - Increases in clock rate, without adverse CPI effects
  - Improvements in processor organization that lower CPI
  - Compiler enhancements that lower CPI and/or instruction count