Chapter 1. Instructor: Josep Torrellas, CS433. Copyright Josep Torrellas 1999, 2001, 2002, 2013


Course Goals
- Introduce you to design principles, analysis techniques, and design options in computer architecture: instruction set design, memory-hierarchy design, pipelining, I/O
- Use cost/performance as a basis for making decisions about computer architecture
- Computer architecture is exciting: get you to ask interesting questions about it

Background
Assume you have taken:
- A basic computer organization course
- A logic design course
- Assembly language programming
Assume that you know:
- What an instruction set looks like
- How to program in C

The Parts of a Computer
- CPU: integer datapath, FP datapath, control unit, registers
- Memory system: cache, TLB, main memory
- I/O system: disks, graphics display, network

Why Study Computer Architecture?

Performance?
What do we mean when we say computer A is faster than computer B?
- Response time: time from start to completion of an event (also called execution time or latency)
- Throughput: amount of work done per unit time (also called bandwidth)

Performance
Program execution time is the measure of performance (seconds/program).
Definitions of execution time:
- Wall clock (elapsed) time, as seen by the user: includes everything (disk, OS overhead, competition with other jobs)
- CPU time: time the CPU spends computing on your program, excluding I/O wait time
  1. User CPU time
  2. System CPU time
System performance: elapsed time on an unloaded system
CPU performance: user CPU time
Example: 90.7s user CPU time; 12.9s system CPU time; 2:39 elapsed time; 65% = CPU time / elapsed time
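
The 65% figure can be checked with a quick sketch (numbers taken from the example above):

```python
# Checking the example: CPU time as a fraction of elapsed time.
user_cpu = 90.7        # user CPU time, seconds
system_cpu = 12.9      # system CPU time, seconds
elapsed = 2 * 60 + 39  # elapsed time of 2:39, in seconds

cpu_fraction = (user_cpu + system_cpu) / elapsed
print(f"CPU/elapsed = {cpu_fraction:.0%}")  # CPU/elapsed = 65%
```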

Evaluating Performance
- Use real programs: CAD, text processing, business applications, scientific applications, with real input, output, and options. Drawback: we may not know what programs users will run.
- Kernels: small key pieces (inner loops) of scientific programs where the program spends most of its time, e.g. Livermore Loops, LINPACK. Amenable to hand analysis.
- Toy benchmarks: e.g. Quicksort, Puzzle. Easy to type, predictable results; may be used to check the correctness of a machine, but not as performance benchmarks.

Summarizing Performance
- Model a real job mix with a smaller set of representative programs.
- Total execution time is the ultimate measure of performance.
- Weight benchmarks according to the time spent on them in a real job mix.
How do you summarize performance?
- A single-number performance summary expressed in units of time should be directly proportional to (weighted) execution time.
- A single-number performance summary expressed as a rate should be inversely proportional to (weighted) execution time.

Summarizing Performance
Given n programs:
- Average execution time: arithmetic mean = (1/n)*(Time_1 + Time_2 + ... + Time_n)
- If performance is expressed as a rate, the average that tracks execution time is the harmonic mean = n / (1/Rate_1 + 1/Rate_2 + ... + 1/Rate_n), where Rate_j is proportional to 1/Time_j
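
A minimal sketch of the two averages (the example times are made up for illustration):

```python
def arithmetic_mean(times):
    # Average of execution times: tracks total execution time.
    return sum(times) / len(times)

def harmonic_mean(rates):
    # Average of rates that tracks total execution time.
    return len(rates) / sum(1 / r for r in rates)

times = [2.0, 4.0, 8.0]          # seconds per program (hypothetical)
rates = [1 / t for t in times]   # rate proportional to 1/time
print(arithmetic_mean(times))
# When rates are exactly 1/time, the harmonic mean of the rates is
# the reciprocal of the arithmetic mean of the times.
print(harmonic_mean(rates))
```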

Summarizing Performance
- Weighted arithmetic mean: Time_1*Weight_1 + Time_2*Weight_2 + ... + Time_n*Weight_n, where Weight_1 + Weight_2 + ... + Weight_n = 1 and each Weight_j > 0
- Weighted harmonic mean: 1 / (Weight_1/Rate_1 + Weight_2/Rate_2 + ... + Weight_n/Rate_n)
For example, Rate_j could be the MIPS rate of program j.
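
The weighted forms can be sketched the same way (the job-mix weights below are hypothetical):

```python
def weighted_arithmetic_mean(times, weights):
    # Weights must be positive and sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * t for w, t in zip(weights, times))

def weighted_harmonic_mean(rates, weights):
    assert abs(sum(weights) - 1.0) < 1e-9
    return 1 / sum(w / r for w, r in zip(weights, rates))

# Hypothetical job mix: program 1 runs 70% of the time, program 2 runs 30%.
times = [10.0, 2.0]
weights = [0.7, 0.3]
print(weighted_arithmetic_mean(times, weights))  # 7.6
```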

Geometric Mean for Normalized Execution Times
Normalize the execution time of program j to its execution time on a reference machine ==> Execution time ratio_j
Geometric mean: nth root of (Execution time ratio_1 * ... * Execution time ratio_n)
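
A short sketch of the geometric mean of normalized ratios (the example ratios are hypothetical):

```python
import math

def geometric_mean(ratios):
    # nth root of the product of the normalized execution time ratios.
    product = math.prod(ratios)
    return product ** (1 / len(ratios))

# Hypothetical ratios vs. a reference machine: 2x slower on one program,
# 2x faster on another -- the geometric mean says they cancel out.
print(geometric_mean([2.0, 0.5]))  # 1.0
```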

Make the Common Case Fast
- Very important; sort of obvious, but often overlooked: designs sometimes make the common case slower in order to make a less common case faster.
- The frequent case is often simpler and can be done faster (e.g. addition rarely overflows).
- Not following this principle can increase design time.
- Complex problems should be handled in software; hardware should provide fast primitives, not complete solutions.

Amdahl's Law
Performance gain from improving some portion of a computer.
Original observation: the speedup from parallel processing is limited by the fraction that cannot be parallelized.
In general:
Speedup = ET without enh / ET with enh = ET_old / ET_new   (enh = enhancement)
ET_new = ET_old * ((1 - Fraction_enh) + Fraction_enh / Speedup_enh)
Speedup = 1 / ((1 - Fraction_enh) + Fraction_enh / Speedup_enh)

Application of Amdahl's Law
A parallel application is 90% parallel; what is the speedup on 10, 100, and 1000 processors?
10% I/O and initialization: s; 90% parallel: p
S_P = 1/(s + p/P) = 1/(0.1 + 0.9/P)
S_10 = 1/(0.1 + 0.9/10) = 1/0.19 = 5.26
S_100 = 1/(0.1 + 0.9/100) = 1/0.109 = 9.17
S_1000 = 1/(0.1 + 0.9/1000) = 1/0.1009 = 9.91
Diminishing returns in performance.
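
The numbers above can be reproduced with a short sketch of the speedup formula (the function name is ours):

```python
def amdahl_speedup(fraction_enh, speedup_enh):
    # Speedup = 1 / ((1 - f) + f / s)
    return 1 / ((1 - fraction_enh) + fraction_enh / speedup_enh)

# 90% parallel fraction, sped up by the processor count P.
for p in (10, 100, 1000):
    print(p, round(amdahl_speedup(0.9, p), 2))  # 5.26, 9.17, 9.91
```

Even with 1000 processors, the 10% serial fraction caps the speedup below 10.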

Increasing Parallelism
9% I/O and initialization: s; 91% parallel: p
S_P = 1/(s + p/P) = 1/(0.09 + 0.91/P)
S_100 = 1/(0.09 + 0.91/100) = 1/0.0991 = 10.09

Using Amdahl's Law
Making cost/performance trade-offs:
- An application spends 50% of its time in the CPU and 50% waiting for I/O.
- Cost of the CPU is 1/3 of the system; cost of I/O is 2/3.
- A new CPU increases CPU performance 5 times, and CPU cost 5 times.
Is the new CPU a good idea from a cost/performance standpoint?
Speedup = 1/(0.5 + 0.5/5) = 1/0.6 = 1.67
Cost increase = 2/3 * 1 + 1/3 * 5 = 2.33
Lesson: spend resources proportionately to where time is spent.
How much should we increase CPU speed for equal speedup and cost increase?
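
The trade-off can be checked numerically; since cost grows faster than speedup (2.33 > 1.67), the new CPU is a poor deal by this metric:

```python
def amdahl_speedup(fraction_enh, speedup_enh):
    return 1 / ((1 - fraction_enh) + fraction_enh / speedup_enh)

speedup = amdahl_speedup(0.5, 5)        # 1/(0.5 + 0.1) = 1.67
cost_increase = (2/3) * 1 + (1/3) * 5   # I/O cost unchanged, CPU cost 5x
print(round(speedup, 2), round(cost_increase, 2))  # 1.67 2.33
```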

CPU Performance
CPU time = CPU clock cycles for a program * Clock cycle time
CPI = CPU clock cycles for a program / Instruction count
CPU time = Instruction count * CPI * Clock cycle time
(Instructions/Program) * (Cycles/Instruction) * (Seconds/Clock cycle) = Seconds/Program = CPU time
These three factors are influenced by the compiler, the ISA, the organization, and the technology.

CPU Performance
Another way of looking at CPU time:
CPU time = (CPI_1*I_1 + CPI_2*I_2 + ... + CPI_n*I_n) * Clock cycle time
CPI is now:
CPI = CPI_1*(I_1/Instruction count) + CPI_2*(I_2/Instruction count) + ... + CPI_n*(I_n/Instruction count)
Frequency of instruction class j = I_j / Instruction count
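
Both formulations can be sketched as follows (the example machine's numbers are hypothetical):

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    # CPU time = IC * CPI * clock cycle time (seconds)
    return instruction_count * cpi * clock_cycle_time

def overall_cpi(class_cpis, class_counts):
    # CPI weighted by each instruction class's frequency I_j / IC.
    total = sum(class_counts)
    return sum(cpi * n / total for cpi, n in zip(class_cpis, class_counts))

# Hypothetical machine: 1e9 instructions, CPI 1.5, 1 ns cycle time.
print(cpu_time(1e9, 1.5, 1e-9))  # 1.5 (seconds)
```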

CPU Performance Example
Instruction frequencies for a load/store machine:
  Instruction type   Frequency   Cycles
  Loads              25%         2
  Stores             15%         2
  Branches           20%         2
  ALU                40%         1
- All conditional branches in this machine use simple tests of equality with zero (BEQZ, BNEZ).
- Consider adding complex comparisons to conditional branches.
- 25% of branches can use the complex scheme --> no need for the preceding ALU instruction.
- The CPU cycle time of the original machine is 10% faster (i.e., the new machine's cycle time is 10% longer).
- Will this increase CPU performance?

CPU Performance Example
Old CPU performance:
CPI_old = 0.25*2 + 0.15*2 + 0.2*2 + 0.4*1 = 1.6
CPU time_old = 1.6 * IC_old * CCT_old
New CPU performance:
CPI_new = (0.25*2 + 0.15*2 + 0.2*2 + (0.4 - 0.25*0.2)*1) / (1 - 0.25*0.2) = 1.55/0.95 = 1.63
IC_new = 0.95 * IC_old
CCT_new = 1.1 * CCT_old
CPU time_new = 1.63 * (0.95*IC_old) * (1.1*CCT_old) = 1.71 * IC_old * CCT_old
So the change does not increase performance: the new machine is slower.
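
The example's arithmetic can be reproduced directly from the frequency table:

```python
freqs = {"load": 0.25, "store": 0.15, "branch": 0.20, "alu": 0.40}
cycles = {"load": 2, "store": 2, "branch": 2, "alu": 1}

cpi_old = sum(freqs[c] * cycles[c] for c in freqs)      # 1.6

removed = 0.25 * freqs["branch"]  # ALU instructions eliminated: 5% of IC
ic_new = 1 - removed              # 0.95 * IC_old
cpi_new = (cpi_old - removed * cycles["alu"]) / ic_new  # 1.55/0.95 = 1.63
# New CPU time relative to old: CPI_new * IC_new * 1.1*CCT vs CPI_old * IC * CCT
time_ratio = (cpi_new * ic_new * 1.1) / (cpi_old * 1.0)
print(round(cpi_old, 2), round(cpi_new, 2), round(time_ratio, 3))
```

Since `time_ratio` is above 1 (about 1.07, i.e. 1.71/1.6), the new machine takes longer per program.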

Locality of Reference
A fundamental observation about programs: it is possible to predict with high accuracy what a program will reference next, based on what it has referenced in the recent past.
- Temporal locality: recently referenced locations are likely to be referenced again (e.g. code loops, stack accesses).
- Spatial locality: nearby locations tend to be referenced together (e.g. array accesses, code accesses).
The memory hierarchy is used to exploit locality.

Memory Hierarchy
Basic principle of hardware design: smaller is faster.
- Small memories have less signal propagation delay and simpler decoding.
- Small memories can use more power per cell for speed.
- Small, fast memories cost more.
Use a memory hierarchy to improve the cost/performance of the computer:
CPU registers --> Cache --> Main memory --> Disk
(downward: denser and cheaper; upward: smaller and faster)

MIPS
An indirect measure of performance (millions of instructions per second):
MIPS = Instruction count / (Execution time * 10^6)
     = Instruction count / (#cycles * seconds/cycle * 10^6)
     = Clock rate / (CPI * 10^6)
Execution time = Instruction count / (MIPS * 10^6)
Problems with MIPS:
- Difficult to compare different ISAs
- No indication of the program or program input
- MIPS can vary inversely with performance (a higher-MIPS machine can be slower if it executes more, simpler instructions)
- Native MIPS
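
The two equivalent formulations can be sketched as follows (the example machine's numbers are hypothetical):

```python
def mips(instruction_count, execution_time):
    # MIPS = instruction count / (execution time * 10^6)
    return instruction_count / (execution_time * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    # Equivalent form: clock rate / (CPI * 10^6)
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical: 200 million instructions executed in 2 seconds.
print(mips(200e6, 2.0))          # 100.0
# Hypothetical: a 1 GHz machine with CPI 2.
print(mips_from_cpi(1e9, 2.0))   # 500.0
```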

Megaflops
MFLOPS = Number of floating-point operations in a program / (Execution time * 10^6)
Problems with MFLOPS:
- Does not measure integer performance
- Assumes that the same number and type of FP operations are executed on all machines
- Changes with the mixture of fast and slow FP operations
- A function of the instruction mix (% of floating-point operations)

Wafer (figure)

Die (figure)

Die Floorplan (figure)