Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)


What is Architecture? Instruction Set Design. Old definition from way back when computers had bitslices and microcode. Determine the high level attributes that optimize the performance, cost, power consumption, etc, of a computer. Interact with compiler developers, other software and hardware developers and manufacturers.

Instruction Set Architecture. Class of ISA: register-memory, load-store. Memory addressing: byte addressing; aligned, non-aligned. Addressing modes: register, immediate, displacement.

Instruction Set Architecture. Types and sizes of operands: byte; short (also Unicode characters); integers and long integers; floating point (single, double, extended double); signed, unsigned. Operators: data transfer, arithmetic, multimedia.

Instruction Set Architecture. Control flow: jumps, branches, procedure calls; PC-relative addressing. Encoding the ISA: fixed length, variable length, VLIW.

What Else is Architecture? Organization: choice and interconnection of subsystems; multi-core and hyperthreaded designs; interaction with peripherals. Hardware: digital logic design; underlying technology.

Trends. Transistor count goes up by 40-55% per year: about 35% due to density increase and 10-20% due to the increase in die size. DRAM capacity doubles every two years. Disk capacity grew 60-100% per year until 2004, and 30% per year since then. Flash memory capacity doubles every couple of years. Network speed makes tenfold jumps every few years.

Bandwidth vs Latency. Bandwidth: how long it takes to download a movie. Latency: how long it takes to read a character from the disk or the network. Both matter for: CPU-memory communication, the floating point unit, the network, audio recording and playback, disk, graphics.

Bandwidth vs Latency. In the past 20 years bandwidth improved 1000-2000X, while latency improved only 20-40X. Bandwidth improves with parallelism and clock speed. Latency improves with clock speed.

Transistors and Wires. Transistor density improves quadratically as feature size shrinks. Transistor speed improves linearly as feature size shrinks. Wire delays do not change much with feature size: length really depends on die size, R*C changes little and the speed of light is constant. VLSI chips have a lot of wire!

Trends in Power. Power is one of the hottest issues: cost of running server farms, cost of running a home computer, battery life, size and weight of the heat sink, negative publicity.

Trends in Power. Power depends on capacitance, voltage and frequency: $P_d = \frac{1}{2} C V^2 f$
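To see why voltage is the lever that matters most in the dynamic power formula, here is a minimal sketch. The function name and the chip parameters (1 nF of switched capacitance, 1 GHz) are assumptions chosen for illustration, not figures from the notes.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts: P_d = 1/2 * C * V^2 * f."""
    return 0.5 * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical chip: 1 nF of switched capacitance clocked at 1 GHz.
base = dynamic_power(1e-9, 1.0, 1e9)      # supply at 1.0 V
lower_v = dynamic_power(1e-9, 0.8, 1e9)   # same chip at 0.8 V

# Power scales with the square of the voltage, so a 20% voltage cut
# removes about 36% of the dynamic power.
print(base, lower_v, lower_v / base)
```

Frequency and capacitance enter only linearly, so the same 20% reduction applied to either of them would save only 20%.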

Trends in Power. Voltage does not go down fast enough: it went from 5V to 1V in twenty years. Power consumption went from less than 1W to over 100W in the same period. Current leakage is another factor in power consumption.

Trends in Cost. Learning curve: as time goes by, yield increases. Volume: economies of scale in purchasing and manufacturing. Commodification: increased competition, expired patents.

Cost of a Chip. The cost of a chip is the cost of the die, testing and packaging, divided by the final test yield. The cost of a die is the cost of the wafer divided by the dies per wafer and the die yield. A chip can be rejected in multiple stages, increasing the cost. Yield can affect the cost of a chip significantly.

Cost of a Chip. The cost of a die is proportional to its size (before testing and possibly rejection). The faults on the wafer are scattered randomly. The probability of having a fault on the die increases with its size, so the yield drops. The cost of a chip after testing increases much faster than its size (area).
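The two cost rules above can be turned into a short calculation. This is only a sketch: the function names and every dollar figure, die count and yield below are assumptions made up to keep the arithmetic easy to follow.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * die_yield)


def chip_cost(die, test, package, final_test_yield):
    """Cost of one shippable chip: die, testing and packaging over final yield."""
    return (die + test + package) / final_test_yield


# Hypothetical wafer: the large die has twice the area (half as many dies
# per wafer) and, being bigger, a lower assumed yield.
small = die_cost(wafer_cost=5000.0, dies_per_wafer=200, die_yield=0.8)
large = die_cost(wafer_cost=5000.0, dies_per_wafer=100, die_yield=0.5)
print(small, large, large / small)

# Testing, packaging and the final test yield are added on top of the die.
print(chip_cost(large, test=10.0, package=10.0, final_test_yield=0.95))
```

With these made-up numbers, doubling the area more than triples the cost per good die, which is the point of the last bullet.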

Probabilistic Yield. Yield (the probability of a chip being good) is always positive and less than one. It depends on the expected number of defects per chip, $N$, and a parameter $\alpha$ which is a function of the complexity of the process: $Y_d = \left(1 + \frac{N}{\alpha}\right)^{-\alpha}$

Probabilistic Yield Every time the number of defects increases by one the yield goes down by 50-60% The moral of the story is that one should keep the die size small.
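The 50-60% figure can be checked numerically. The sketch below assumes a yield model of the form (1 + N/alpha)^(-alpha), where N is the expected number of defects per die, and assumes alpha = 4; both the model and the value of alpha are assumptions made for this illustration, and the function name is hypothetical.

```python
def die_yield(defects_per_die, alpha=4.0):
    """Assumed yield model: (1 + N/alpha) ** -alpha."""
    return (1.0 + defects_per_die / alpha) ** (-alpha)


# How much yield is lost each time the expected defect count grows by one.
for n in range(4):
    y = die_yield(n)
    drop = 1.0 - die_yield(n + 1) / y
    print(f"N={n}: yield={y:.3f}, drop for one more defect={drop:.0%}")
```

Since the expected number of defects grows with die area, this is the quantitative reason for keeping the die small.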

Fixed Costs. Intellectual property. Masks: can cost over a million. Fixed costs make small runs unprofitable. FPGAs are popular mainly for this reason.

Cost vs Price. Apple vs Linux: the price of a Linux-based system is cost plus profit margin. The price of an Apple system is cost plus profit plus brand premium. Commodity items have no brand premium.


Dependability. Module reliability: Mean Time To Failure (MTTF). Failures In Time (FIT): the inverse of MTTF, usually quoted as failures per billion hours. E.g., if MTTF is a million hours, FIT is 1000 failures per billion hours. How is MTTF or FIT estimated? Wait a million hours to get MTTF? Wait a billion hours to get FIT? Ask a statistician.


Dependability. Estimate MTTF: put 10,000 disks in a room, run them for 1000 hours (a month and a half) and count the failures. If you have 10 failures in 10 million disk-hours, that is 1 failure every million hours of operation: a million hours MTTF, or a FIT of 1000. Hidden assumption? There are lies, damn lies and statistics. Statistics is the art of deceiving less intelligent people.
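The estimate above is a single division, sketched here with the slide's own numbers. The function names are hypothetical, and the calculation silently assumes a constant failure rate over the test period.

```python
def estimate_mttf(units, hours_per_unit, failures):
    """MTTF in hours: total device-hours observed divided by failures seen."""
    device_hours = units * hours_per_unit
    return device_hours / failures


def fit_from_mttf(mttf_hours):
    """FIT: expected failures per billion device-hours."""
    return 1e9 / mttf_hours


mttf = estimate_mttf(units=10_000, hours_per_unit=1_000, failures=10)
fit = fit_from_mttf(mttf)
print(mttf, fit)
```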

Dependability. The hidden (unstated) assumption is that the probability distribution of the time to failure is exponential. To be even more impressive: disk failures are a Poisson process. It means that the failure rate is independent of the age of the disks. This is a strong assumption: there are many failures in the first weeks, few failures in the next five years, and the failure rate increases after that.
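A small sketch of what "independent of age" means under the exponential assumption: a disk that has already run for five years is exactly as likely to survive the next five as a brand-new one. The function name and the five-year horizon are assumptions for illustration.

```python
import math


def survival(t_hours, mttf_hours):
    """P(lifetime > t) for an exponential lifetime with the given mean."""
    return math.exp(-t_hours / mttf_hours)


mttf = 1_000_000
five_years = 5 * 365 * 24  # 43,800 hours

# Probability that a new disk survives its first five years.
p_new = survival(five_years, mttf)

# Probability that a five-year-old disk survives five more years:
# P(T > 10y | T > 5y) = P(T > 10y) / P(T > 5y).
p_old = survival(2 * five_years, mttf) / survival(five_years, mttf)

print(p_new, p_old)  # the two values agree: the model has no memory of age
```

Real disks do not behave this way, which is why the million-hour figure should not be read as a lifetime of a century.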

Dependability. Mean Time To Repair (MTTR). Mean Time Between Failures (MTBF): MTBF = MTTF + MTTR. Availability = MTTF/MTBF.
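A minimal sketch of the two formulas above. The function name and the 24-hour repair time are assumptions made for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the module is in service: MTTF / (MTTF + MTTR)."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical module: a million hours MTTF, one day to repair or replace.
a = availability(1_000_000, 24)
print(f"{a:.6f}")
```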


Dependability Example. 10 disks, each with a million hours MTTF; 1 power supply with 200,000 hours MTTF. What is the overall MTTF? 1 disk has one millionth of a failure per hour, so 10 disks have 10 millionths of a failure per hour. 1 PS has five millionths of a failure per hour. The system has 15 millionths of a failure per hour. Overall MTTF is about 66,667 hours (7.6 years).
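The example adds failure rates, which is valid when any single failure brings the system down and the components fail independently at constant rates. A sketch under those assumptions, with a hypothetical function name:

```python
def system_mttf(component_mttfs):
    """MTTF of a system that fails as soon as any one component fails."""
    total_rate = sum(1.0 / m for m in component_mttfs)  # failures per hour
    return 1.0 / total_rate


# Ten disks at a million hours each, one power supply at 200,000 hours.
parts = [1_000_000] * 10 + [200_000]
mttf = system_mttf(parts)
years = mttf / (24 * 365)
print(round(mttf), round(years, 1))
```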

Performance. Performance is the inverse of execution time. We care about both response time and throughput. We care about actual computation time or tasks per unit time. Various indicators mean nothing to professionals: forget MIPS, GHz, etc. Do not pay more for multimedia extensions (useless unless your movie editor knows about them).

Measuring Performance. Performance is central to architecture, yet hard to measure! We need a good definition, and the definition depends on your point of view. Consider this: a web server administrator sleeps peacefully knowing that 90% of the time his servers are lightly loaded. Unfortunately, light load means few users, and few users means few people witness his great system at its best.

Measuring Performance. How to deal with I/O? It affects response time. But if it overlaps with CPU work, it does not affect throughput, and maybe not even response time.


Measuring Performance. We need to test our systems with representative software. There is no ideal benchmark. Even if one appeared, it would soon be defeated: vendors optimize the hardware/compiler for the benchmark. Existing benchmarks are: chimerical, fake, toy, make-believe, made up.

Benchmarks. Real applications: chimerical (portability, multitude of versions, too many of them). Modified applications: an honest attempt to fake reality. Kernels: OK to test particular features. Toy benchmarks: good choice of name. Synthetic benchmarks: their unrealistic nature makes them easy to defeat.

Benchmark Suites. SPEC is the de facto standard. It includes many suites targeting desktops, servers, etc. Most programs in the suite are dropped or modified with every edition of the standard. There is always discussion on how to weight the results.

Quantitative Performance. Despite the problems, numbers are king. Basic principle: make the common case fast. Amdahl's Law takes into account two factors: the fraction of the computation that is affected, and the enhancement of the affected computation. It does not take into account many details.

Pitfalls of Amdahl's Law. It does not account for overlaps in computation, e.g. if FP happened mostly during delays in memory. Nor does it account for an improvement that leads to changes in the style of optimization (re-do the computation vs. save and retrieve). It remains a good guide if used properly.

Amdahl's Law. Gives the new execution time given the old, the fraction of the enhanced computation and the speedup of the enhanced computation: $T_{new} = T_{old} \left[ (1 - F_{enh}) + \frac{F_{enh}}{S_{enh}} \right]$

Amdahl's Law. Common sense, yet very frequently ignored. Renders many grandiose architectures futile.

Example. We speed up the FP operations tenfold. In our benchmark FP accounts for 40% of execution time. $S_{overall} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$
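The example can be reproduced with a one-line function. This is a sketch; the function and parameter names are hypothetical, and the infinite-speedup case is added only to show the ceiling that the untouched 60% imposes.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP is 40% of execution time and becomes ten times faster.
print(round(amdahl_speedup(0.4, 10), 2))  # 1.56

# Even an infinitely fast FP unit cannot beat 1 / 0.6.
print(round(amdahl_speedup(0.4, float("inf")), 2))  # 1.67
```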

Example. A system has 5 disks with 1 million hours MTTF each and a power supply with 100,000 hours MTTF. What is the improvement if we make the disks 10 times more reliable? The failure rate before the improvement was 15 millionths of a failure per hour. We reduced it to 10.5 millionths, so the MTTF improves by a factor of 15/10.5, about 43%.
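The same arithmetic written out, followed by the Amdahl's Law view of it. This is a sketch that assumes independent components with constant failure rates; the symbols $\lambda_{old}$ and $\lambda_{new}$ (system failure rates before and after) are introduced here for illustration.

$$\lambda_{old} = \frac{5}{10^{6}} + \frac{1}{10^{5}} = 15 \times 10^{-6} \text{ failures per hour}$$

$$\lambda_{new} = \frac{5}{10^{7}} + \frac{1}{10^{5}} = 10.5 \times 10^{-6} \text{ failures per hour}$$

$$\frac{MTTF_{new}}{MTTF_{old}} = \frac{\lambda_{old}}{\lambda_{new}} = \frac{15}{10.5} \approx 1.43$$

The disks account for $F_{enh} = 5/15 = 1/3$ of the failure rate and improve by $S_{enh} = 10$, so Amdahl's Law gives the same answer:

$$\frac{1}{\left(1 - \frac{1}{3}\right) + \frac{1/3}{10}} = \frac{1}{0.7} \approx 1.43$$

The power supply is the bronze chain here: a tenfold improvement in the disks buys well under a twofold improvement overall.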

Performance indices. CPI (clock cycles per instruction). IPC (instructions per cycle). One is the inverse of the other. IPC is more convenient for superscalar processors.

Performance Equation Instructions per program TIMES Clock cycles per instruction TIMES Seconds per Clock cycle EQUALS Seconds per program.
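The performance equation as a short calculation. This is a sketch: the function name, the instruction count, the CPI values and the 3 GHz clock are all assumptions chosen for illustration.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Seconds per program = instructions * cycles/instruction * seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 2 billion instructions at CPI 1.5 on a 3 GHz clock.
t = cpu_time(2_000_000_000, 1.5, 3e9)

# Halving CPI (doubling IPC) halves the time with the clock unchanged.
t_better = cpu_time(2_000_000_000, 0.75, 3e9)

print(t, t_better)
```

The three factors are not independent in practice: a change that lowers the instruction count may raise CPI, which is why only seconds per program is a trustworthy measure.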


Fallacies and Pitfalls. Amdahl's Law: do not include a golden link in a bronze chain. Fallacy: the cost of the processor dominates the cost of the system. For high-end systems it is about a quarter; for economy systems it is less than 10%. Fallacy: benchmarks remain valid. No, vendors find ways around them. Fallacy: disks have 1 million hours MTTF, about a century. No, the pdf is not exponential.

Fallacies and Pitfalls. Peak performance is a good indicator: yes, sure. Too much fault detection gives you trouble: if you never use the FP unit, who cares if it is broken? Balance the cost of checking against the benefit of early detection. Beware of screening tests.