Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

Similar documents
Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

The Computer Revolution. Classes of Computers. Chapter 1

Chapter 1. The Computer Revolution

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

Response Time and Throughput

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Performance, Power, Die Yield. CS301 Prof Szajda

TDT4255 Computer Design. Lecture 1. Magnus Jahre

ECE369: Fundamentals of Computer Architecture

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

COSC3330 Computer Architecture Lecture 7. Datapath and Performance

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016

An Introduction to Parallel Architectures

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

ECE331: Hardware Organization and Design

Computer Organization & Assembly Language Programming (CSE 2312)

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

EECS2021E EECS2021E. The Computer Revolution. Morgan Kaufmann Publishers September 12, Chapter 1 Computer Abstractions and Technology 1

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

COMPUTER ORGANIZATION AND DESIGN

Rechnerstrukturen

CSE2021 Computer Organization. Computer Abstractions and Technology

The Role of Performance

Course web site: teaching/courses/car. Piazza discussion forum:

Chapter 1. and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 1. Computer Abstractions and Technology

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

1.6 Computer Performance

Chapter 1. Computer Abstractions and Technology

CPE300: Digital System Architecture and Design

Performance Analysis

Quiz for Chapter 1 Computer Abstractions and Technology

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

Designing for Performance. Patrick Happ Raul Feitosa

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

EIE/ENE 334 Microprocessors

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

MEASURING COMPUTER TIME. A computer faster than another? Necessity of evaluation computer performance

ECE 486/586. Computer Architecture. Lecture # 3

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

Computer Architecture. Chapter 1 Part 2 Performance Measures

Syllabus. Chapter 1. Course Goals. Course Information. Course Goals. Course Information

Outline Marquette University

TDT 4260 lecture 2 spring semester 2015

Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy

Computer System. Performance

Chapter 1. Computer Abstractions and Technology. Jiang Jiang

Computer Performance Evaluation: Cycles Per Instruction (CPI)

CpE 442 Introduction to Computer Architecture. The Role of Performance

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

CSE 141 Summer 2016 Homework 2

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

CS 2506 Computer Organization II Test 1

CMSC 611: Advanced Computer Architecture

COMPUTER ARCHITECTURE AND OPERATING SYSTEMS (CS31702)

Performance. February 12, Howard Huang 1

Chapter 1. Computer Abstractions and Technology. Lesson 3: Understanding Performance

Homework 4 - Solutions (Floating point representation, Performance, Recursion and Stacks) Maximum points: 80 points

Lecture 3: Evaluating Computer Architectures. How to design something:

LECTURE 1. Introduction

CS430 Computer Architecture

Three major components of Embedded Systems

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

Evaluating Computers: Bigger, better, faster, more?

Review: latency vs. throughput

CS3350B Computer Architecture CPU Performance and Profiling

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

CS61C - Machine Structures. Week 6 - Performance. Oct 3, 2003 John Wawrzynek.

CSE2021 Computer Organization

CSE2021 Computer Organization

Quantifying Performance EEC 170 Fall 2005 Chapter 4

University of Toronto Faculty of Applied Science and Engineering

Introduction to Pipelined Datapath

Lecture 2: Performance

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

Page 1. Program Performance Metrics. Program Performance Metrics. Amdahl s Law. 1 seq seq 1

Practice Assignment 1

Computer Performance. Relative Performance. Ways to measure Performance. Computer Architecture ELEC /1/17. Dr. Hayden Kwok-Hay So

Performance of computer systems

CS 351 Exam 1, Fall 2010

CS/COE1541: Introduction to Computer Architecture

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

CMSC 611: Advanced Computer Architecture

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Measuring Performance. Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks

Transcription:

Defining Which airplane has the best performance? 1 Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300 400 500 Passenger Capacity 0 2000 4000 6000 8000 10000 Cruising Range (miles) Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 500 1000 1500 Cruising Speed (mph) 0 100000 200000 300000 400000 Passengers x mph

Computer : TIME, TIME, TIME Response Time (latency): how long it takes to perform a task 2 Throughput: total work done per unit time For now, we will focus on response time = 1 Execution Time

Execution Time Elapsed Time - counts everything (disk and memory accesses, I/O, etc.) - a useful number, but often not good for comparison purposes 3 CPU time - doesn't count I/O or time spent running other programs - can be broken up into system time, and user time Our focus: user CPU time - time spent executing the lines of code that are "in" our program

Relative 1 = Execution Time 4 Relative = = X Execution Time Execution Time Y Y X Example: time taken to run a program 10s on A, 15s on B Execution Time B / Execution Time A = 15s / 10s = 1.5 So A is 1.5 times faster than B

CPU Clocking Operation of digital hardware are governed by a constant-rate clock. 5 Clock (cycles) Data transfer and computation Update state Clock period Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250 10 12 s Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz = 4.0 10 9 Hz

CPU Time 6 CPU Time = CPU Clock Cycles Clock Cycle Time CPU Clock Cycles = Clock Rate improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count

CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 clock cycles How fast must Computer B clock be? 7 Clock Rate B Clock CyclesB 1.2 Clock Cycles = = CPU Time 6s B A Clock Cycles = CPU Time Clock Rate A A A = 10s 2GHz = 20 10 9 Clock Rate B 9 9 1.2 20 10 24 10 = = = 4GHz 6s 6s

How many cycles are required for a program? Could assume that number of cycles equals number of instructions: 8 1st instruction 2nd instruction 3rd instruction 4th 5th 6th... time This assumption is incorrect, different instructions take different amounts of time on different machines. Why? hint: remember that these are machine instructions, not lines of C code

Now that we understand cycles A given program will require - some number of instructions (machine instructions) - some number of cycles - some number of seconds 9 We have a vocabulary that relates these quantities: - cycle time (seconds per cycle) - clock rate (cycles per second) - CPI (cycles per instruction) a floating point intensive application might have a higher CPI - MIPS (millions of instructions per second) this would be higher for a program using simple instructions

10 is determined by execution time Do any of the other variables equal performance? # of cycles to execute program? # of instructions in program? # of cycles per second? average # of cycles per instruction? average # of instructions per second? Common pitfall: thinking one of the variables is indicative of performance when it really isn t.

CPI Example 11 Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? CPU Time = Instruction Count CPI = Cycle Time A A A = I 2.0 250ps = I 500ps A is faster CPU Time = Instruction Count CPI = Cycle Time B B B = I 1.2 500ps = I 600ps CPU TimeB I 600ps = = 1.2 CPU Time I 500ps A by this much

CPI in More Detail 12 If different instruction classes take different numbers of cycles Clock Cycles = n i= 1 (CPIi Instruction Counti) Weighted average CPI CPI = Clock Cycles Instruction Count = n i= 1 CPI i Instruction Counti Instruction Count Relative frequency

CPI Example 13 Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Sequence 1: IC = 5 Clock Cycles = 2 1 + 1 2 + 2 3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4 1 + 1 2 + 1 3 = 9 Avg. CPI = 9/6 = 1.5

# of Instructions Example 14 A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? How much? What is the CPI for each sequence?

MIPS example 15 Two different compilers are being tested for a 4 GHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 2 million Class C instructions. The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. Which sequence will be faster according to MIPS? Which sequence will be faster according to execution time?

Amdahl's Law 16 Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement ) Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? How about making it 5 times faster? Principle: Make the common case fast

Example 17 Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floatingpoint instructions? We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

Remember 18 is specific to a particular program/s - Total execution time is a consistent summary of performance For a given architecture performance increases come from: - increases in clock rate (without adverse CPI affects) - improvements in processor organization that lower CPI - compiler enhancements that lower CPI and/or instruction count - algorithm/language choices that affect instruction count Pitfall: expecting improvement in one aspect of a machine s performance to affect the total performance

Summary 19 CPU Time Instructions Clock cycles Seconds = Program Instruction Clock cycle depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, T c