PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Similar documents
Advanced Computer Architecture (CS620)

Copyright 2012, Elsevier Inc. All rights reserved.

Transistors and Wires

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Lecture: Benchmarks, Pipelining Intro. Topics: Performance equations wrap-up, Intro to pipelining

EECS4201 Computer Architecture

Fundamentals of Quantitative Design and Analysis

Performance of computer systems

Course web site: teaching/courses/car. Piazza discussion forum:

Review: latency vs. throughput

CS/EE 6810: Computer Architecture

Lecture 1: Introduction

Lecture 2: Performance

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

ECE 486/586. Computer Architecture. Lecture # 2

TDT 4260 lecture 2 spring semester 2015

Computer Architecture

CSE 502 Graduate Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

The Computer Revolution. Classes of Computers. Chapter 1

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

LECTURE 1. Introduction

PERFORMANCE MEASUREMENT

Fundamentals of Computer Design

Performance, Power, Die Yield. CS301 Prof Szajda

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Computer Architecture s Changing Definition

ECE 154A. Architecture. Dmitri Strukov

LECTURE 1. Introduction

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

Response Time and Throughput

Outline Marquette University

PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

CACHE OPTIMIZATION. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Exercise 1 Due 02.November 2010, 12:15pm

TDT4255 Computer Design. Lecture 1. Magnus Jahre

Practice Assignment 1

Lecture 1: CS/ECE 3810 Introduction

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

Introduction to Computer Architecture II

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

MEMORY HIERARCHY DESIGN

Chapter 1. The Computer Revolution

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

Computer Organization & Assembly Language Programming (CSE 2312)

1.13 Historical Perspectives and References

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

CS 152 Computer Architecture and Engineering

CS3350B Computer Architecture CPU Performance and Profiling

Tutorial 11. Final Exam Review

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

DEPARTMENT OF ECE IV YEAR ECE EC6009 ADVANCED COMPUTER ARCHITECTURE LECTURE NOTES

Computer Architecture Lecture 1: Fundamentals of Quantitative Design and Analysis (Chapter 1)

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

PARALLEL MEMORY ARCHITECTURE

GRE Architecture Session

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

EECS2021E EECS2021E. The Computer Revolution. Morgan Kaufmann Publishers September 12, Chapter 1 Computer Abstractions and Technology 1

B649 Graduate Computer Architecture. Lec 2 - Introduction. Slides derived from David Patterson

Performance Measurement (as seen by the customer)

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 2: Figures of Merit and Evaluation Methodologies

THREAD LEVEL PARALLELISM

ELE 455/555 Computer System Engineering. Section 1 Review and Foundations Class 5 Computer System Performance

Lecture 4: Instruction Set Architectures. Review: latency vs. throughput

Designing for Performance. Patrick Happ Raul Feitosa

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

Instructor Information

ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING

ECE 486/586. Computer Architecture. Lecture # 3

Computer Architecture. What is it?

Lec 25: Parallel Processors. Announcements

EITF20: Computer Architecture Part1.1.1: Introduction

Parallelism: The Real Y2K Crisis. Darek Mihocka August 14, 2008

1.3 Data processing; data storage; data movement; and control.

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics

5008: Computer Architecture

Introduction. Summary. Why computer architecture? Technology trends Cost issues

CACHE OPTIMIZATION. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Course overview Computer system structure and operation

Multi-Core Microprocessor Chips: Motivation & Challenges

ILP: COMPILER-BASED TECHNIQUES

Computer Architecture

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101

Chapter 1. Computer Abstractions and Technology

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

ILP: CONTROL FLOW. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Computer Architecture. R. Poss

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

Computer Architecture. Introduction. Lynn Choi Korea University

Chapter 1. and Technology

Computer Organization. 8 th Edition. Chapter 2 p Computer Evolution and Performance

This Unit. CIS 501 Computer Architecture. As You Get Settled. Readings. Metrics Latency and throughput. Reporting performance

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

Transcription:

PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture

Overview Announcement Sept. 5 th : Homework 1 release (due on Sept. 12 th ) This lecture Technology trends Measuring performance Principles of computer design Power and energy Cost and reliability

Technology Trends (Historical Data) IC logic Technology: on-chip transistor count doubles every 18-24 months (Moore s Law) Transistor density increases by 35% per year Die size increases 10-20% per year DRAM Technology Chip capacity increases 25-40% per year Flash Storage Chip capacity increases 50-60% per year

Technology Trends (Historical Data) Recent Microprocessor Trends Transistor count (1.43x/yr) Core count (1.2-1.43x/yr) Frequency (1.05x/yr) Power (1.04x/yr) 2004 2010 Source: Micron University Symposium

Technology Trends (Historical Data) Recent Microprocessor Trends Transistor count (1.43x/yr) Core count (1.2-1.43x/yr) Performance (1.15x/yr) Frequency (1.05x/yr) Power (1.04x/yr) 2004 2010 Source: Micron University Symposium

Measuring Performance How to measure performance? Latency or response time n The time between start and completion of an event (e.g., milliseconds for disk access) Bandwidth or throughput n The total amount of work done in a given time (e.g., megabytes per second for disk transfer)

Measuring Performance How to measure performance? Latency or response time n The time between start and completion of an event (e.g., milliseconds for disk access) Bandwidth or throughput n The total amount of work done in a given time (e.g., megabytes per second for disk transfer) Which one is better? latency or throughput?

Measuring Performance Which one is better (faster)? Car Bus Delay=10m Capacity=4p Delay=30m Capacity=30p

Measuring Performance Which one is better (faster)? Car Bus Delay=10m Delay=30m Capacity=4p Capacity=30p Throughput=0.4PPM Throughput=1PPM It really depends on your needs (goals).

Measuring Performance What program to use for measuring performance? Benchmarks Suites A set of representative programs that are likely relevant to the user Examples: n SPEC CPU 2017: CPU-oriented programs (for desktops) n SPECweb: throughput-oriented (for servers) n EEMBC: embedded processors/workloads

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10 5 25 Prog-2 5 10 20 Prog-3 25 10 25

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10 5 25 Prog-2 5 10 20 Prog-3 25 10 25 AM: Arithmetic Mean (good for times and latencies)

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 1/10 1/5 1/25 Prog-2 1/5 1/10 1/20 Prog-3 1/25 1/10 1/25

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 1/10 1/5 1/25 Prog-2 1/5 1/10 1/20 Prog-3 1/25 1/10 1/25 HM: Harmonic Mean (good for rates and throughput)

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10/10 10/5 10/25 Prog-2 5/5 5/10 5/20 Prog-3 25/25 25/10 25/25

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10/10 10/5 10/25 Prog-2 5/5 5/10 5/20 Prog-3 25/25 25/10 25/25 GM: Geometric Mean (good for speedups)

The Processor Performance Clock cycle time (CT = 1/clock frequency) Influenced by technology and pipeline Cycles per instruction (CPI) Influenced by architecture IPC may be used instead (IPC = 1/CPI) Instruction count (IC) Influenced by ISA and compiler CPU time = IC x CPI x CT

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 20% 2 Store 20% 2 Branch 20% 2 ALU 40% 1

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 20% 2 Store 20% 2 Branch 20% 2 ALU 40% 1 CPI = 0.2x2 + 0.2x2 + 0.2x2 + 0.4x1 = 1.6

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 20% 2 Store 20% 2 Branch 20% 2 ALU 40% 1 50% of the branches can be combined with ALU instructions and executed as Branch-ALU fused in 2 cycles. What is the new average CPI?

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 22% 2 Store 22% 2 Branch 11% 2 ALU 33% 1 Branch-ALU 12% 2 80% of the branches can be combined with ALU instructions and executed as Branch-ALU fused in 2 cycles. What is the new average CPI? CPI = 1.67

The Processor Performance Points to note Performance = 1 / execution time AM(IPCs) = 1 / HM(CPIs) GM(IPCs) = 1 / GM(CPIs)

Speedup vs. Percentage Speedup = old execution time / new execution time Improvement = (new performance - old performance)/old performance My old and new computers run a particular program in 80 and 60 seconds; compute the followings speedup percentage increase in performance reduction in execution time

Speedup vs. Percentage Speedup = old execution time / new execution time Improvement = (new performance - old performance)/old performance My old and new computers run a particular program in 80 and 60 seconds; compute the followings speedup = 80/60 percentage increase in performance reduction in execution time = 33% = 20/80 = 25%

Example Problem A new computer has an IPC that is 20% worse than the old one. However, it has a clock speed that is 30% higher than the old one. If running the same binaries on both machines. What speedup is the new computer providing?

Example Problem A new computer has an IPC that is 20% worse than the old one. However, it has a clock speed that is 30% higher than the old one. If running the same binaries on both machines. What speedup is the new computer providing? OLD NEW IPC 1 0.8 Frequency 1 1.3 IC 1 1 CPI?? CT?? CPU Time??

Example Problem A new computer has an IPC that is 20% worse than the old one. However, it has a clock speed that is 30% higher than the old one. If running the same binaries on both machines. What speedup is the new computer providing? Speedup = 1/0.96 = 1.04 OLD NEW IPC 1 0.8 Frequency 1 1.3 IC 1 1 CPI 1/1 1/0.8 = 1.25 CT 1/1 1/1.3 ~ 0.77 CPU Time 1 ~0.96

Principles of Computer Design Designing better computer systems requires better utilization of resources Parallelism n Multiple units for executing partial or complete tasks Principle of locality (temporal and spatial) n Reuse data and functional units Common Case n Use additional resources to improve the common case

Amdahl s Law The law of diminishing returns

Example Problem Our new processor is 10x faster on computation than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for IO 60% of the time, what is the overall speedup?

Example Problem Our new processor is 10x faster on computation than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for IO 60% of the time, what is the overall speedup? f=0.4 s=10 Speedup = 1 / (0.6 + 0.4/10) = 1/0.64 = 1.5625

Power and Energy

Power and Energy Power = Voltage x Current (P = VI) Instantaneous rate of energy transfer (Watt) Energy = Power x Time (E = PT) The cost of performing a task (Joule)

Power and Energy Power = Voltage x Current (P = VI) Instantaneous rate of energy transfer (Watt) Energy = Power x Time (E = PT) The cost of performing a task (Joule) Peak Power = 3W Average Power = 1.66W Total Energy = 5J

CPU Power and Energy All consumed energy is converted to heat CPU power is the rate of heat generation Excessive peak power may result in burning the chip Static and dynamic energy components n Energy = (Power Static + Power Dynamic ) x Time n Power Static = Voltage x Current Static n Power Dynamic Capacitance x Voltage 2 x (Activity x Frequency)

Power Reduction Techniques Reducing capacitance (C) Reducing voltage (V) Reducing frequency (f).

Power Reduction Techniques Reducing capacitance (C) Requires changes to physical layout and technology Reducing voltage (V) Reducing frequency (f).

Power Reduction Techniques Reducing capacitance (C) Requires changes to physical layout and technology Reducing voltage (V) Negative effect on frequency Opportunistically power gating (wakeup time) Dynamic voltage and frequency scaling Reducing frequency (f).

Power Reduction Techniques Reducing capacitance (C) Requires changes to physical layout and technology Reducing voltage (V) Negative effect on frequency Opportunistically power gating (wakeup time) Dynamic voltage and frequency scaling Reducing frequency (f) Negative effect on CPU time Clock gating in unused resources.

Power Reduction Techniques Reducing capacitance (C) Requires changes to physical layout and technology Reducing voltage (V) Negative effect on frequency Opportunistically power gating (wakeup time) Dynamic voltage and frequency scaling Reducing frequency (f) Negative effect on CPU time Clock gating in unused resources Points to note Utilization directly effects dynamic power Lowering power does NOT mean lowering energy

Example Problem For a processor running at 100% utilization and consuming 60W, 30% of the power is attributed to leakage. What is the total power dissipation when the processor is running at 50% utilization?

Example Problem For a processor running at 100% utilization and consuming 60W, 30% of the power is attributed to leakage. What is the total power dissipation when the processor is running at 50% utilization? @100% Power = 18W + 42W = 60W @50% Power = 18W + 21W = 39W

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%?

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? @3GHz Energy = (80W + 20W) x 20s = 2000J @2.4GHz Energy = (0.8x80W + 20W) x 20/0.8 = 2100J

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? What is the energy consumption if voltage and frequency scale down by 20%?

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? What is the energy consumption if voltage and frequency scale down by 20%? @ 80%V and 80%f Energy = (80x0.8 2 x0.8+20x0.8) x 20/0.8 = 1424J

Cost and Reliability

Cost of Integrated Circuit Cost of die!"#$% '()* +,$) -$%!"#$% +,$ /,$0+ Example wafer Yield of die!"#$% /,$0+ (23+$#$'* -$% 45,* "%$" +,$ "%$") 7 N: process-complexity factor Die n Specified by chip manufacturer

Example Problem Defect rate for a 144mm 2 die is 0.5 per cm 2. Assuming that we use a 40nm technology node (N=11) with 100% wafer yield, find the die yield.

Example Problem Defect rate for a 144mm 2 die is 0.5 per cm 2. Assuming that we use a 40nm technology node (N=11) with 100% wafer yield, find the die yield. Die yield = 1/(1 + 0.5x1.44) 11

Dependability A measure of system's reliability and availability System reliability n A measure of continuous service (time-to-failure) n Mean Time To Failure (MTTF) n Mean Time To Repair (MTTR) System availability

Dependability A measure of system's reliability and availability System reliability n A measure of continuous service (time-to-failure) n Mean Time To Failure (MTTF) = (3+2+1)/3 = 2 n Mean Time To Repair (MTTR) = (0.75+1+1.25)/3 = 1 System availability MTTF MTTF + MTTR = 2/(2+1) = 0.67