Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John


Evolution of Single-Chip Microprocessors

                     1970s       1980s       1990s           2010s
Transistor Count     10K-100K    100K-1M     1M-100M         100M-10B
Clock Frequency      0.2-2 MHz   2-20 MHz    20 MHz-1 GHz    0.1-4 GHz
Instructions/Cycle   < 0.1       0.1-0.9     0.9-2.0         1-100
MIPS/MFLOPS          < 0.2       0.2-20      20-2,000        100-10,000

[Figure slides from Hot Chips 2014 (August 2014), including AMD Kaveri and NVIDIA]

Power Density in Microprocessors
[Figure: power density (W/cm^2, log scale from 1 to 10,000) versus year (1970 to 2010) for Intel processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium processors, and Core 2. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun's surface. Source: Intel]

Why Performance Evaluation?
- For better processor designs
- For better code on existing designs
- For better compilers
- For better OS and runtimes
[Figure label: Design Analysis]

Lord Kelvin
"To measure is to know."
"If you cannot measure it, you cannot improve it."
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be." [PLA, vol. 1, "Electrical Units of Measurement", 1883-05-03]

Designs evolve based on Analysis
- Good designs are impossible without good analysis
- Workload Analysis
- Processor Analysis
- Design Analysis

Performance Evaluation: an integral part of good computer architecture
[Figure: "Five Classic Components of a Computer", the graphic in the first edition of Patterson & Hennessy's Computer Organization book]

Metrics
- Latency: time to completely execute a certain task
- Throughput: amount of work that can be done over a period of time
- Power: instantaneous power during execution of a program
- Energy: total energy consumption during the execution of the whole program
- Reliability: failure rate
- CPI, IPC, MIPS, MFLOPS, MTTF, MTBF, AVF, Transactions/minute, Transactions/hour, MIPS/watt, Watts, Joules, Joules/instr, etc.
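As a rough illustration of how several of these metrics relate, the sketch below derives CPI, IPC, MIPS, energy, and MIPS/watt from raw counts for one run. The function name `summarize_run` and all input numbers are hypothetical, and energy is approximated as average power times execution time.

```python
def summarize_run(instructions, cycles, seconds, avg_power_watts):
    """Derive common metrics from raw counts for one program run.

    All inputs are illustrative; energy is approximated as
    average power multiplied by execution time.
    """
    mips = instructions / seconds / 1e6
    energy = avg_power_watts * seconds
    return {
        "CPI": cycles / instructions,           # cycles per instruction
        "IPC": instructions / cycles,           # instructions per cycle
        "MIPS": mips,                           # millions of instructions per second
        "energy_joules": energy,                # total energy for the whole run
        "joules_per_instr": energy / instructions,
        "MIPS_per_watt": mips / avg_power_watts,
    }


# Hypothetical run: 2 billion instructions, 3 billion cycles, 1.5 s, 20 W average.
m = summarize_run(instructions=2_000_000_000, cycles=3_000_000_000,
                  seconds=1.5, avg_power_watts=20.0)
for name, value in m.items():
    print(f"{name}: {value:.4g}")
```

Note that power and energy answer different questions: a design can draw less power yet consume more energy if the program runs longer.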

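Iron Law of Processor Performance

Processor performance is expressed as execution time:

\[
\text{Execution Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}
\]

The three factors are code size (instructions per program), CPI (cycles per instruction), and cycle time (time per cycle).

CPI is often used for single-core processors when code size and cycle time are the same between the cases being compared.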
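A minimal sketch of the Iron Law as arithmetic, using made-up numbers: when instruction count and cycle time are held equal, the speedup between two designs reduces to the ratio of their CPIs. The function name `execution_time` is an illustrative choice.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Iron Law: time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical comparison: same program and same 1 ns cycle time,
# only the CPI differs between the two designs.
base = execution_time(1_000_000, cpi=2.0, cycle_time_s=1e-9)
improved = execution_time(1_000_000, cpi=1.6, cycle_time_s=1e-9)

speedup = base / improved  # equals 2.0 / 1.6 because the other factors cancel
print(f"base = {base * 1e3:.3f} ms, improved = {improved * 1e3:.3f} ms, "
      f"speedup = {speedup:.2f}x")
```

If code size or cycle time also changes between the designs, CPI alone is no longer a fair comparison and the full product must be used.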

Challenges in Performance Evaluation
- Complexity of processors
- Complexity of modern workloads

Performance Evaluation of Early Non-pipelined Processors
- Simple non-pipelined processors/microcontrollers
- Attached is a datasheet from the Motorola 68HC11
- Non-overlapped operations
- Fixed number of cycles
- Add up the cycles according to the addressing mode of each instruction
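The "add up the cycles" method can be sketched as a table lookup. The mnemonics, cycle counts, and the 2 MHz clock below are hypothetical placeholders, not values taken from the 68HC11 datasheet; a real evaluation would copy the counts from the datasheet's instruction table.

```python
# Hypothetical cycle table: (mnemonic, addressing mode) -> cycles.
# These numbers are placeholders, NOT copied from any datasheet.
CYCLES = {
    ("LDA", "immediate"): 2,
    ("LDA", "direct"): 3,
    ("LDA", "extended"): 4,
    ("ADD", "immediate"): 2,
    ("ADD", "direct"): 3,
    ("STA", "direct"): 3,
    ("STA", "extended"): 4,
}


def total_cycles(program):
    """Sum fixed per-instruction cycle counts; nothing overlaps in a
    non-pipelined processor, so the total is a plain sum."""
    return sum(CYCLES[(op, mode)] for op, mode in program)


program = [("LDA", "immediate"), ("ADD", "direct"), ("STA", "extended")]
cycles = total_cycles(program)

CLOCK_MHZ = 2.0  # assumed clock rate, for illustration only
time_us = cycles / CLOCK_MHZ
print(f"{cycles} cycles, {time_us} us at {CLOCK_MHZ} MHz")
```

Because operations do not overlap and each instruction takes a fixed number of cycles, no simulator is needed for this class of processor.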

Early Pipelined Processors
[Figure: five-stage datapath with PC, +4 adder, instruction memory, registers (rs, rt, rd), immediate (imm), ALU, and data memory]
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Write Back
Use the datapath figure to represent the pipeline: IFtch (I$), Dcd (Reg), Exec (ALU), Mem (D$), WB (Reg).

Pipelined Execution Representation

Time ->
IFtch Dcd   Exec  Mem   WB
      IFtch Dcd   Exec  Mem   WB
            IFtch Dcd   Exec  Mem   WB
                  IFtch Dcd   Exec  Mem   WB
                        IFtch Dcd   Exec  Mem   WB
                              IFtch Dcd   Exec  Mem   WB

Evaluate by creating a simulator that mimics this process. The handling of instruction dependencies, data forwarding, etc. is modeled in the simulator.
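A toy version of such a simulator is sketched below. It assumes an in-order five-stage pipeline with full data forwarding, so the only modeled hazard is a one-cycle load-use stall; the instruction encoding `(op, dest, srcs)` and the function name `simulate` are illustrative assumptions, and branches, caches, and structural hazards are ignored.

```python
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def simulate(program):
    """Return (total_cycles, starts) for a list of (op, dest, srcs) tuples.

    starts[i] is the cycle in which instruction i effectively enters the
    pipeline. A load-use stall is charged by delaying that entry by one
    cycle, which gives the same completion time as holding the dependent
    instruction in decode for one cycle.
    """
    if not program:
        return 0, []
    starts = []
    next_start = 1
    for i, (op, dest, srcs) in enumerate(program):
        if i > 0:
            prev_op, prev_dest, _ = program[i - 1]
            # Loaded data is available only after Mem, so an immediately
            # following consumer needs one bubble even with forwarding.
            if prev_op == "load" and prev_dest in srcs:
                next_start += 1
        starts.append(next_start)
        next_start += 1
    total = starts[-1] + len(STAGES) - 1
    return total, starts


program = [
    ("load", "r1", ["r2"]),
    ("add", "r3", ["r1", "r4"]),   # uses r1 right after the load: one stall
    ("sub", "r5", ["r3", "r1"]),   # add -> sub is covered by forwarding
    ("store", None, ["r5", "r6"]),
]
total, starts = simulate(program)
print(f"cycles = {total}, CPI = {total / len(program):.2f}, starts = {starts}")
```

Without the stall, four instructions would finish in 4 + 5 - 1 = 8 cycles; the single load-use bubble raises that to 9.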

Processor Challenges
- Superscalar Processors
- Simultaneously Multithreaded Processors (SMT), also called Hyperthreading
- Multicore Processors
  - Each core can be single-threaded
  - Each core can be hyperthreaded

Superscalar Processors

Multicore Processors
- Efficient utilization of big transistor budgets
- Wide superscalars are power hungry
- Have several cores, albeit simple
- Operate at a lower energy point
- Run in parallel to recoup lost performance

Heterogeneous Architectures
- Single-ISA heterogeneous: cores with the same ISA, but with different microarchitectures
- Multiple-ISA heterogeneous: one or more ISAs and accelerators (main ISA, DSP processor ISA, hardware accelerators)
- GPGPUs
[Figure: static single-ISA heterogeneous multi-core; inter-program diversity]

Workload Challenges
- Virtualized workloads
- Multiple non-parallelizable applications may be running on multiple cores
- Parallelizable applications
- Operating systems and runtimes
- Dynamic mapping, scheduling
- Compiler optimizations

Complex Workloads - Heterogeneous Architectures
- Multiprogrammed workloads: e.g. SPEC CPU
- Multithreaded workloads: e.g. PARSEC
- Inter-program diversity
- Intra-program diversity: diversity inside programs
[Figure: static single-ISA heterogeneous multi-core]


Simulation Methods

Classification of Techniques

Performance Modeling
  Simulation
    - Trace-Driven Simulation
    - Execution-Driven Simulation
    - Complete System Simulation
    - Event-Driven Simulation
    - Statistical Simulation
  Analytical Modeling
    - Probabilistic Models
    - Queuing Models
    - Markov Models
    - Petri Net Models

Performance Measurement
  - On-Chip Hardware Monitoring
  - Off-Chip Hardware Monitoring
  - Software Monitoring
  - Microcoded Instrumentation
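To make one entry in this classification concrete, here is a minimal sketch of trace-driven simulation: a recorded sequence of memory addresses is replayed through a model of a direct-mapped cache. The cache parameters, the trace, and the function name `simulate_direct_mapped` are all hypothetical; production simulators model far more detail.

```python
def simulate_direct_mapped(trace, num_lines=4, line_bytes=16):
    """Replay an address trace through a direct-mapped cache model.

    Returns (hits, misses). Only tags are tracked; data is not stored.
    """
    tags = [None] * num_lines
    hits = misses = 0
    for addr in trace:
        block = addr // line_bytes   # which memory block the address is in
        index = block % num_lines    # the single cache line it can occupy
        tag = block // num_lines     # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag        # replace whatever was there
    return hits, misses


# Hypothetical trace: 0x00 and 0x40 map to the same line and evict each other.
trace = [0x00, 0x04, 0x10, 0x40, 0x00, 0x44]
hits, misses = simulate_direct_mapped(trace)
print(f"hits = {hits}, misses = {misses}, "
      f"miss rate = {misses / len(trace):.2f}")
```

The trace is fixed input, so the same recording can be replayed against many cache configurations. Execution-driven simulation differs in that the simulator executes the program itself rather than replaying a recording.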

PRESILICON EVALUATION
- Required in early design stages, before prototypes can be built
- Pre-silicon evaluation is very important because many design decisions are based on it
- Timeliness of products is important in today's competitive world

POST-SILICON EVALUATION
- To improve current generation compilers
- To improve current generation operating systems and runtimes
- To improve current generation hardware
- To improve next generation of products

Evaluation of Modern and Future Processors: Huge Challenge
- Evaluating one processor is hard enough
- Evaluating all the software and hardware layers involved
- The design process, including the tradeoff evaluation, depends largely on the performance evaluation.
- Your company's future depends on the performance (P, P, E) estimates you project for potential designs.