Performance, Power, Die Yield. CS301 Prof Szajda


Administrative
- HW #1 assigned
  - Due Wednesday, 9/3 at 5:00 pm

Performance Metrics (How do we compare two machines?)

What to Measure?
- Which airplane has the best performance?

Performance
- One size does not fit all
- Depends on application domain
  - Scientific computing
  - Graphics
  - Databases
  - General-purpose desktop
  - Beware of designing to the benchmark!
- Depends on technology characteristics
  - DRAM speed and capacity, chip size, etc.

Which Metric Do We Use?
- Response or execution time
  - Difference between start and end time
  - Individual user cares most about this
- Throughput
  - Total amount of work done in a given time
  - Frequently used for servers and clusters
- How are these affected by
  - Replacing the processor with a faster version?
  - Adding more processors?

Execution Time
- Shorter execution time is better
- Allows comparison between 2 machines

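Relative Performance
- "X is n times faster than Y" means
  $\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
- Example:
  - Machine A takes 10 s to run a program
  - Machine B takes 15 s to run the same program
  - What is the performance ratio?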
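A minimal Python sketch of the Machine A / Machine B example, assuming the usual definition that performance is the reciprocal of execution time. The function name `relative_performance` is illustrative and does not come from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Assumes performance = 1 / execution time, so
    performance_x / performance_y = time_y / time_x.
    """
    return time_y / time_x


time_a = 10.0  # seconds, Machine A
time_b = 15.0  # seconds, Machine B

n = relative_performance(time_a, time_b)
print(f"A is {n:.1f} times faster than B")  # A is 1.5 times faster than B
```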

Different Time Values
- Execution time
  - Wall-clock, response, or elapsed time
    - Includes everything (processing, I/O, OS overhead, etc.)!
  - Determines system performance
- CPU time
  - Time spent executing code for this task only
    - Does not include I/O or time-sharing
  - Comprises user CPU time and system CPU time
- Different programs are affected differently by CPU and system performance
  - man time
    - Sample output: 90.7u 12.9s 2:39 65%
    - User: 90.7 sec
    - System: 12.9 sec
    - Elapsed time: 2 min 39 sec
    - 65%: CPU time (user + system) as a share of elapsed time
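As a rough check on the sample `time` output, the following Python sketch recomputes the percentage field from the user, system, and elapsed values. It assumes the percentage is (user + system) / elapsed; the variable names are illustrative.

```python
user_s = 90.7             # user CPU time in seconds ("90.7u")
system_s = 12.9           # system CPU time in seconds ("12.9s")
elapsed_s = 2 * 60 + 39   # elapsed wall-clock time "2:39" in seconds

cpu_s = user_s + system_s          # total CPU time for this task
cpu_share = cpu_s / elapsed_s      # fraction of elapsed time spent on the CPU

print(f"CPU time: {cpu_s:.1f} s of {elapsed_s} s elapsed ({cpu_share:.0%})")
```

The remaining share of the elapsed time went to I/O, other processes, and other waiting, which is why elapsed time and CPU time can rank two machines differently.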

Clock Cycles
- Instead of expressing time in seconds, use clock cycles
- Clock
  - Determines when events take place
  - Runs at a constant rate (e.g., 1 GHz)
  - Easy to convert between clock rate and seconds
- Clock rate = 1 / Clock cycle time
  - 500 MHz = 1 / (2 ns)
  - 1 ns = $10^{-9}$ s
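The conversion between clock rate and cycle time can be sketched in a few lines of Python. The helper names below are made up for illustration.

```python
def cycle_time_seconds(clock_rate_hz: float) -> float:
    """Clock cycle time is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def clock_rate_hertz(cycle_time_s: float) -> float:
    """Clock rate is the reciprocal of the clock cycle time."""
    return 1.0 / cycle_time_s


# 500 MHz corresponds to a 2 ns cycle; a 1 ns cycle corresponds to 1 GHz.
print(f"{cycle_time_seconds(500e6) * 1e9:.1f} ns")
print(f"{clock_rate_hertz(1e-9) / 1e9:.1f} GHz")
```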

CPU Clocking
- Operation of digital hardware is governed by a constant-rate clock
- [Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]
- Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
- Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
Chapter 1 Computer Abstractions and Technology

CPU Time
- $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$
- Performance improved by
  - Reducing the number of clock cycles
  - Increasing the clock rate
  - Hardware designer must often trade off clock rate against cycle count

CPU Time Example
- Computer A: 2 GHz clock, 10 s CPU time
- Designing Computer B
  - Aim for 6 s CPU time
  - Can do a faster clock, but this causes 1.2 × as many clock cycles
- How fast must Computer B's clock be?
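One way to work this example is sketched below in Python, assuming CPU time = clock cycles / clock rate. The variable names are illustrative.

```python
clock_rate_a = 2e9     # Hz, Computer A
cpu_time_a = 10.0      # seconds, Computer A

# Clock cycles the program takes on A: time * rate.
cycles_a = cpu_time_a * clock_rate_a

# The faster clock on B costs 1.2x as many cycles for the same program.
cycles_b = 1.2 * cycles_a

cpu_time_b = 6.0       # seconds, design target for Computer B

# Required clock rate for B: cycles / time.
clock_rate_b = cycles_b / cpu_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
```

Under these assumptions B needs a 4 GHz clock, twice A's rate, to get a speedup of only 10/6 because of the extra cycles.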

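Instruction Count and CPI
- $\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}$
- $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$
- Instruction count (IC) for a program
  - Determined by program, ISA and compiler
- Average cycles per instruction (CPI)
  - Determined by CPU hardware
  - If different instructions have different CPI, the average CPI is affected by the instruction mix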

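CPI Example
- Computer A: Cycle Time = 250 ps, CPI = 2.0
- Computer B: Cycle Time = 500 ps, CPI = 1.2
- Same ISA
- Which is faster, and by how much?
  - $\text{CPU Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$
  - $\text{CPU Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$
  - A is faster, by a factor of $600 / 500 = 1.2$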
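Because both computers implement the same ISA, the instruction count is the same and cancels out, so it is enough to compare time per instruction. A small Python sketch of that comparison follows; the function name is an assumption for illustration.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI * clock cycle time."""
    return cpi * cycle_time_ps


tpi_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)  # Computer A
tpi_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)  # Computer B

# Same instruction count on both, so the CPU time ratio equals this ratio.
ratio = tpi_b / tpi_a

print(f"A: {tpi_a:.0f} ps/instruction, B: {tpi_b:.0f} ps/instruction")
print(f"A is faster by a factor of {ratio:.1f}")
```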

Application Characteristics
- Determine the mix of different instruction types
  - Integer arithmetic
  - Logical operations
  - Floating-point arithmetic
  - Loads and stores
- Different applications have different CPI because of different instruction mixes

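CPI in More Detail
- If different instruction classes take different numbers of cycles:
  $\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$
- Weighted average CPI:
  $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
  - $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$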

CPI Example
- Alternative compiled code sequences using instructions in classes A, B, C

  Class               A   B   C
  CPI for class       1   2   3
  IC in sequence 1    2   1   2
  IC in sequence 2    4   1   1

- Sequence 1: IC = 5
  - Clock Cycles = $2 \times 1 + 1 \times 2 + 2 \times 3 = 10$
  - Avg. CPI = 10/5 = 2.0
- Sequence 2: IC = 6
  - Clock Cycles = $4 \times 1 + 1 \times 2 + 1 \times 3 = 9$
  - Avg. CPI = 9/6 = 1.5
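The two code sequences can be checked with a short Python sketch. The dictionary layout and the `summarize` helper are illustrative choices, not something from the slides.

```python
# Cycles per instruction for each instruction class.
cpi_by_class = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled code sequences.
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}


def summarize(ic_by_class):
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    instruction_count = sum(ic_by_class.values())
    clock_cycles = sum(cpi_by_class[cls] * n for cls, n in ic_by_class.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    ic, cycles, avg_cpi = summarize(seq)
    print(f"{name}: IC = {ic}, cycles = {cycles}, avg CPI = {avg_cpi:.1f}")
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is a poor performance measure.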

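Performance Summary: The BIG Picture
- $\text{CPU Time} = \text{IC} \times \text{CPI} \times T_c$, where IC is the instruction count, CPI is the average cycles per instruction, and $T_c$ is the clock cycle time
- Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, $T_c$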

Amdahl's Law
- How much speedup do you get from an enhancement?
  $\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$
- Based on
  - Fraction of time the enhancement is used
  - Improvement in enhanced mode
  $\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$

Pitfall: Amdahl's Law (Section 1.10, Fallacies and Pitfalls)
- Improving an aspect of a computer and expecting a proportional improvement in overall performance
  $T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$
- Example: multiply accounts for 80 s out of 100 s
  - How much improvement in multiply performance is needed to get 5× overall?
    $20 = \frac{80}{n} + 20$
  - Can't be done!
- Corollary: make the common case fast
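A quick Python sketch of why the 5× target is unreachable, assuming improved time = affected time / improvement factor + unaffected time. The names and the sample improvement factors are illustrative.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Execution time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected


T_TOTAL = 100.0                  # seconds, whole program
T_MULTIPLY = 80.0                # seconds spent in multiply
T_OTHER = T_TOTAL - T_MULTIPLY   # seconds that the enhancement cannot touch
target = T_TOTAL / 5             # a 5x overall speedup would need 20 s

for factor in (2, 10, 100, 1_000_000):
    t = improved_time(T_MULTIPLY, T_OTHER, factor)
    print(f"multiply {factor:>9,}x faster -> {t:8.4f} s, "
          f"overall speedup {T_TOTAL / t:.3f}x")

# The unaffected 20 s bounds the overall speedup from above.
speedup_limit = T_TOTAL / T_OTHER
print(f"overall speedup approaches but never reaches {speedup_limit:.0f}x")
```

The target time equals the unaffected 20 s, so the multiply time would have to drop to exactly zero, which no finite improvement factor achieves.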

Review Question
- Your machine has a clock rate of 2.4 GHz. How long is the clock cycle?

Review Questions
- Suppose you are given the following:
  - Machine A: 1 GHz, average CPI = 1.6, instructions = 1.7 billion
  - Machine B: 3.3 GHz, average CPI = 6.1, instructions = 2 billion
- Which machine is faster? By how much?
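One way to check an answer to this question is the Python sketch below, assuming CPU time = instruction count × CPI / clock rate. The function name is illustrative.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


time_a = cpu_time_seconds(instruction_count=1.7e9, cpi=1.6, clock_rate_hz=1e9)
time_b = cpu_time_seconds(instruction_count=2e9, cpi=6.1, clock_rate_hz=3.3e9)

print(f"Machine A: {time_a:.2f} s")
print(f"Machine B: {time_b:.2f} s")
print(f"A is faster by a factor of {time_b / time_a:.2f}")
```

Machine B has the higher clock rate but still loses, because its CPI and instruction count are both larger.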

Review Questions
- What is the average CPI for a machine with the following CPIs on an application with the following instruction frequencies?

  Type         Frequency   CPI
  Arithmetic   0.45        1
  Memory       0.3         8
  Control      0.2         3
  Mult/Div     0.05        5
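The frequency-weighted average can be sketched in Python as follows. The dictionary layout is an illustrative choice.

```python
import math

# Instruction type -> (relative frequency, CPI), taken from the table.
instruction_mix = {
    "Arithmetic": (0.45, 1),
    "Memory": (0.30, 8),
    "Control": (0.20, 3),
    "Mult/Div": (0.05, 5),
}

# Sanity check: the relative frequencies should cover the whole program.
total_frequency = sum(freq for freq, _ in instruction_mix.values())
if not math.isclose(total_frequency, 1.0):
    raise ValueError("instruction frequencies must sum to 1")

# Average CPI is the sum of each class CPI weighted by its frequency.
average_cpi = sum(freq * cpi for freq, cpi in instruction_mix.values())
print(f"Average CPI = {average_cpi:.2f}")
```

Memory instructions are only 30% of the mix but contribute 2.4 of the 3.7 cycles per instruction, so they dominate the average.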

Review Questions
- What factors must be included when comparing the relative performance of two machines?

Amdahl's Law
  $\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$
- Suppose you have an enhancement that makes a function 10x faster.
  - Speedup if used 5% of the time?
  - Speedup if used 40% of the time?
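A small Python sketch for checking these two cases, assuming overall speedup = Exec old / Exec new = 1 / ((1 − fraction) + fraction / speedup of the enhanced part). The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup when `fraction_enh` of the time runs `speedup_enh`x faster."""
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


for fraction in (0.05, 0.40):
    overall = amdahl_speedup(fraction, speedup_enh=10.0)
    print(f"used {fraction:.0%} of the time -> overall speedup {overall:.3f}x")
```

Even a 10x enhancement yields under 5% overall improvement when it applies to only 5% of the execution time.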

Review Questions
- What is the equation for execution time?
- What does Amdahl's Law say?

Benchmarks
- Programs specifically used to measure performance
- Hope is that they are representative of how the computer will be used
- Examples
  - SPEC Integer and Floating Point
  - MediaBench
  - MineBench
  - TPC

SPEC CPU Benchmark
- Programs used to measure performance
  - Supposedly typical of actual workload
- Standard Performance Evaluation Corp (SPEC)
  - Develops benchmarks for CPU, I/O, Web, ...
- SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)
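The summarizing step can be sketched in Python as below. The ratios are made-up placeholder values, not SPEC results, and `geometric_mean` is an illustrative helper; each ratio is assumed to be reference-machine time divided by measured time for one benchmark program.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n performance ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical per-program ratios (reference time / measured time).
hypothetical_ratios = [10.0, 20.0, 40.0]

summary = geometric_mean(hypothetical_ratios)
print(f"Geometric mean of ratios: {summary:.2f}")
```

With the geometric mean, comparing two machines gives the same relative result regardless of which reference machine was used for normalization, which is not true of the arithmetic mean of ratios.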

CINT2006 for Intel Core i7 920

Recent Concern: Power Trends (Section 1.7, The Power Wall)
- In CMOS IC technology:
  $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$
  - Power: ×30
  - Voltage: 5 V → 1 V
  - Frequency: ×1000
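To illustrate how voltage dominates, here is a hedged Python sketch. It assumes the standard CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency, and the 0.85 scaling factors are hypothetical values chosen only for illustration.

```python
def relative_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """New power / old power under the assumed relation P = C * V^2 * f."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# Hypothetical new design: 85% of the capacitive load, voltage, and frequency.
ratio = relative_power(cap_ratio=0.85, voltage_ratio=0.85, freq_ratio=0.85)
print(f"New design uses {ratio:.0%} of the old design's power")
```

Because voltage enters squared, lowering it is the most effective lever, and the move from 5 V to 1 V is what let frequency grow by ×1000 while power grew by only ×30.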

Tricks to Increase Power
- Attach large cooling devices
- Turn off parts of chips not used in a given clock cycle
  - Can increase power to 300 watts...
  - ...but these and other ways are all prohibitively expensive for desktop computers. So...

More Recent Approaches: Chip Multiprocessors
- Reasons for change
  - Limited opportunities to improve single-thread performance
  - Power
  - On-chip communication latencies

Tapering Processor Performance

Uniprocessor Performance (Section 1.8, The Sea Change: The Switch to Multiprocessors)
- Constrained by power, instruction-level parallelism, memory latency

Multiprocessors
- Multicore microprocessors
  - More than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction-level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization

Concluding Remarks (Section 1.9)
- Cost/performance is improving
  - Due to underlying technology development
- Hierarchical layers of abstraction
  - In both hardware and software
- Instruction set architecture
  - The hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
  - Use parallelism to improve performance