Performance of computer systems

Many different factors affect performance, among which:
  - Technology
      Raw speed of the circuits (clock, switching time)
      Process technology (how many transistors on a chip)
  - Organization
      What type of processor (e.g., RISC vs. CISC)
      What type of memory hierarchy
      What types of I/O devices
      How many processors in the system
  - Software
      O.S., compilers, database drivers, etc.
CSE378 Performance

What are some possible metrics?
  Traditional measures:
  - Raw speed (peak performance = clock rate)
  - Execution time (or response time): time to execute a program from beginning to end
      Need benchmarks for integer-dominated programs, scientific, graphical interfaces, multimedia tasks, desktop apps, utilities, etc.
  - Throughput (total amount of work in a given time)
      Measures utilization of resources (good metric when there are many users: e.g., large database queries, Web servers)
  Improving (decreasing) execution time will improve (increase) throughput.
  Most of the time, improving throughput will decrease execution time.

Recently: What are some possible metrics?
  Measures that concern power:
  - Watts = joules / second
  - Energy per instruction = joules / instruction executed
  Why be concerned about power?
  - Battery life in portable devices
  - Heat dissipation issues
  - Server rooms are most constrained by their cooling capacity
  - Dense clusters can be constrained by the ability to route enough power into the installation and/or to the individual processors

CPU Execution Time
  [Figure not reproduced.]
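As a rough illustration of the two power metrics listed under "Measures that concern power", the sketch below computes average power and energy per instruction. The function names and the measurement values (30 J, 2 s, 6 billion instructions) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def average_power_watts(energy_joules: float, seconds: float) -> float:
    """Watts = joules / second."""
    return energy_joules / seconds


def energy_per_instruction(energy_joules: float, instructions: float) -> float:
    """Energy per instruction = joules / instruction executed."""
    return energy_joules / instructions


# Hypothetical measurement: a program consumes 30 J over 2 s
# while executing 6 billion instructions.
energy = 30.0
elapsed = 2.0
instructions = 6e9

power = average_power_watts(energy, elapsed)
epi = energy_per_instruction(energy, instructions)

print(f"average power: {power:.1f} W")
print(f"energy per instruction: {epi * 1e9:.1f} nJ")
```

Note that the two metrics can move independently: a slower processor may draw fewer watts yet spend more joules per instruction if the program runs much longer.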

Moore's Law
  [Figure not reproduced. Courtesy Intel Corp.]

Processor-Memory Performance Gap
  [Figure not reproduced: relative performance (log scale, 1 to 1000) versus year (1989 to 2001), with x86 data points for the 386, Pentium, Pentium Pro, Pentium III, and Pentium IV, and the "memory gap" / "memory wall" between the CPU and memory curves.]
  - x86 CPU speed: 100x over 10 years
  - Memory latency decrease: 10x over 8 years (but densities have increased 100x over the same period)

Comparing Processors Isn't Straightforward
  - Different architectures have different instruction sets
      Can't run the same set of (machine) instructions on both
  - Even different models in the same architecture may have a complicated relationship
      Model A's multiply is 6 times faster than model B's
      Model A's add is 3 times faster than model B's
      Model A's memory system is 8 times faster than model B's
  - But we really want to compare performance across processors

Comparing Performance
  - The right measure is execution time
      Take some C program, compile, link, and run it on both processors
      Measure the time it takes from start to end of the execution
  - Notice that this means we are evaluating the compilers as well as the processors
      Is that reasonable?
  - If we're not careful, we might be measuring other things as well
      E.g., speed of I/O devices

Execution time Metric
  - Execution time: inverse of performance
      Performance_A = 1 / Execution_time_A
  - Processor A is faster than Processor B means:
      Execution_time_A < Execution_time_B
      Performance_A > Performance_B
  - Relative performance (a computer is n times faster than another one):
      n = Performance_A / Performance_B = Execution_time_B / Execution_time_A

Definition of CPU execution time
  CPU execution_time = (#cycles) * (time per cycle)
  - (#cycles) depends on program, compiler, and input
  - (time per cycle) is the inverse of clock rate
      Depends on the processor's implementation
      Clock rate measured in MHz or GHz
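The definitions of relative performance and CPU execution time can be checked with a few lines of Python. This is a minimal sketch; the function names and the sample timings, cycle count, and clock rate are hypothetical and not taken from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the inverse of execution time."""
    return 1.0 / execution_time_s


def relative_performance(time_a_s: float, time_b_s: float) -> float:
    """How many times faster A is than B: Execution_time_B / Execution_time_A."""
    return time_b_s / time_a_s


def cpu_execution_time(cycles: float, clock_rate_hz: float) -> float:
    """CPU execution_time = (#cycles) * (time per cycle)."""
    time_per_cycle_s = 1.0 / clock_rate_hz  # inverse of clock rate
    return cycles * time_per_cycle_s


# Hypothetical timings: the same program takes 10 s on A and 15 s on B.
time_a, time_b = 10.0, 15.0
n = relative_performance(time_a, time_b)
print(f"A is {n:.2f} times faster than B")

# Hypothetical run: 4 billion cycles on a 2 GHz processor.
t = cpu_execution_time(4e9, 2e9)
print(f"CPU execution time: {t:.2f} s")
```

The ratio of performances gives the same n as the ratio of execution times taken in the opposite order, which is why the slide writes both forms.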

Another form of the equation
  CPU execution_time = (#insts executed) * (cycles / instruction) * (time / cycle)
  - (cycles / instruction) is called CPI
  - CPI depends on the processor's implementation:
      CPI = 1: single cycle
      CPI > 1: some instructions require more than one cycle
      CPI < 1: some form of parallel execution

How to Improve Performance?
  CPU execution_time = (#insts executed) * (cycles / instruction) * (time / cycle)
  - Reduce (#insts executed): better compilers
  - Reduce (time / cycle): higher clock rates or better processor implementations
  - Reduce CPI: more internal parallelism in processor implementation
      Pipelining, superscalar, multi-threaded
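To see how each of the three factors affects CPU execution time, the sketch below evaluates the equation for a baseline and for three variations, each changing one factor. All numbers (instruction counts, CPI values, clock rates) are hypothetical and serve only to illustrate the equation.

```python
def cpu_execution_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution_time = (#insts executed) * (cycles / instruction) * (time / cycle)."""
    time_per_cycle_s = 1.0 / clock_rate_hz
    return instructions * cpi * time_per_cycle_s


# Hypothetical baseline: 1 billion instructions, CPI of 2.0, 1 GHz clock.
baseline = cpu_execution_time(1e9, 2.0, 1e9)

# Change one factor at a time (all values are illustrative assumptions).
fewer_insts = cpu_execution_time(0.8e9, 2.0, 1e9)   # better compiler
lower_cpi = cpu_execution_time(1e9, 1.5, 1e9)       # more internal parallelism
faster_clock = cpu_execution_time(1e9, 2.0, 2e9)    # higher clock rate

for label, t in [
    ("baseline", baseline),
    ("fewer instructions", fewer_insts),
    ("lower CPI", lower_cpi),
    ("faster clock", faster_clock),
]:
    print(f"{label:>20}: {t:.2f} s (speedup {baseline / t:.2f})")
```

In practice the three factors are not independent: for instance, a change that lowers the instruction count can raise CPI, so the product is what has to be compared.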

Benchmarks
  - Benchmark: workload representative of what a system will be used for
  - Industry benchmarks
      SPECint and SPECfp industry benchmarks, updated every few years; currently SPEC CPU2006
      Linpack (Lapack), NASA kernel: scientific benchmarks
      TPC-A, TPC-B, TPC-C and TPC-D used for databases and data mining
      Other specialized benchmarks (Olden for list processing, SPECweb, SPEC JVM98, etc.)
  - Benchmarks for desktop applications and web applications are not as standard
  - Beware! Compilers (command lines) are super optimized for the benchmarks

How to summarize benchmark performance
  n programs in the benchmark suite. What is the relative performance overall?
  A number of alternatives:
  - Arithmetic mean of execution times: (Σ exec_time_i) / n
  - Harmonic mean of rates: n / (Σ 1/rate_i)
  - Geometric mean of rates: (Π rate_i)^(1/n)

Power
  [Figure not reproduced.]
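The three ways of summarizing a benchmark suite (arithmetic mean of execution times, harmonic mean of rates, geometric mean of rates) can be sketched as follows. The function names and the sample times and rates are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def arithmetic_mean(times: Sequence[float]) -> float:
    """(sum of exec_time_i) / n, for execution times."""
    return sum(times) / len(times)


def harmonic_mean(rates: Sequence[float]) -> float:
    """n / (sum of 1/rate_i), for rates such as instructions per second."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(rates: Sequence[float]) -> float:
    """(product of rate_i) ** (1/n), for rates or normalized ratios."""
    return math.prod(rates) ** (1.0 / len(rates))


# Hypothetical suite of three programs.
exec_times = [2.0, 4.0, 6.0]   # seconds
rates = [1.0, 2.0, 4.0]        # e.g., work units per second

print(f"arithmetic mean of times: {arithmetic_mean(exec_times):.3f}")
print(f"harmonic mean of rates:   {harmonic_mean(rates):.3f}")
print(f"geometric mean of rates:  {geometric_mean(rates):.3f}")
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, so the choice of summary can change how a suite's overall result looks.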

http://www.intel.com/pressroom/kits/core2duo/pdf/epi-trends-final2.pdf

http://www.sun.com/processors/whitepapers/ust1_pwrsav_v1.0.pdf

Computer design: Make the common case fast
  Amdahl's law (speedup)
  - Speedup = (performance with enhancement) / (performance base case)
  - Or equivalently, Speedup = (exec. time base case) / (exec. time with enhancement)
  For example, application to parallel processing:
  - s = fraction of the program that is sequential
  - Speedup S is at most 1/s
  - That is, if 20% of your program is sequential, the maximum speedup with an infinite number of processors is at most 5
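The 1/s bound can be checked numerically. The slides state only the limit; the finite-processor formula used below, 1 / (s + (1 - s) / p), is the standard form of Amdahl's law under the assumption that the parallel part divides perfectly across p processors. The function names are hypothetical.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Speedup with p processors, assuming the parallel part scales perfectly."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)


def amdahl_limit(sequential_fraction: float) -> float:
    """Upper bound on speedup as the number of processors grows: 1/s."""
    return 1.0 / sequential_fraction


s = 0.20  # 20% of the program is sequential, as in the slide's example
for p in (1, 4, 16, 1024):
    print(f"{p:>5} processors: speedup {amdahl_speedup(s, p):.3f}")
print(f"limit (infinite processors): {amdahl_limit(s):.1f}")
```

The speedup approaches, but never reaches, the limit of 5: most of the benefit arrives with the first few processors, after which the sequential 20% dominates.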