COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 1. Computer Abstractions and Technology

Similar documents
The Computer Revolution. Classes of Computers. Chapter 1

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Chapter 1. The Computer Revolution

EECS2021E EECS2021E. The Computer Revolution. Morgan Kaufmann Publishers September 12, Chapter 1 Computer Abstractions and Technology 1

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

Response Time and Throughput

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016

COMPUTER ARCHITECTURE AND OPERATING SYSTEMS (CS31702)

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Chapter 1. and Technology

TDT4255 Computer Design. Lecture 1. Magnus Jahre

The Computer Revolution. Chapter 1. The Processor Market. Classes of Computers. Morgan Kaufmann Publishers August 28, 2013

COMPUTER ORGANIZATION AND DESIGN

Performance, Power, Die Yield. CS301 Prof Szajda

Rechnerstrukturen

Syllabus. Chapter 1. Course Goals. Course Information. Course Goals. Course Information

Chapter 1. Computer Abstractions and Technology

CSE2021 Computer Organization. Computer Abstractions and Technology

Chapter 1. Computer Abstractions and Technology

CSE2021 Computer Organization

CSE2021 Computer Organization

Outline Marquette University

Computer Organization & Assembly Language Programming (CSE 2312)

EIE/ENE 334 Microprocessors

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

CS3350B Computer Architecture CPU Performance and Profiling

Computer Architecture. Fall Dongkun Shin, SKKU

COMPUTER ORGANIZATION AND. Editio. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

Thomas Polzer Institut für Technische Informatik

CS3350B Computer Architecture. Introduction

An Introduction to Parallel Architectures

Chapter 1. Computer Abstractions and Technology. Jiang Jiang

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

CHAPTER 1 Computer Abstractions and Technology

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

Transistors and Wires

Fundamentals of Quantitative Design and Analysis

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CSCI 402: Computer Architectures. Computer Abstractions and Technology. Recall. What is computer architecture

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

FUNDAMENTOS DEL DISEÑO DE

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

LECTURE 1. Introduction

ECE331: Hardware Organization and Design

ECE 154A. Architecture. Dmitri Strukov

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

TDT 4260 lecture 2 spring semester 2015

Copyright 2012, Elsevier Inc. All rights reserved.

EECS4201 Computer Architecture

Performance of computer systems

Lecture 2: Performance

Computer Organization and Design

ECE 486/586. Computer Architecture. Lecture # 2

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Course web site: teaching/courses/car. Piazza discussion forum:

Advanced Computer Architecture (CS620)

CS Computer Architecture Spring Lecture 01: Introduction

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Concepts Introduced. Classes of Computers. Classes of Computers (cont.) Great Architecture Ideas. personal computers (PCs)

LECTURE 1. Introduction

ECE 486/586. Computer Architecture. Lecture # 3

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Lecture 1: Introduction

ECE369: Fundamentals of Computer Architecture

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

1.13 Historical Perspectives and References

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

Computer Architecture s Changing Definition

Designing for Performance. Patrick Happ Raul Feitosa

Computer Architecture Review. Jo, Heeseung

Lecture 1: CS/ECE 3810 Introduction

Computer Performance. Relative Performance. Ways to measure Performance. Computer Architecture ELEC /1/17. Dr. Hayden Kwok-Hay So

Computer Architecture. What is it?

1.6 Computer Performance

Review: latency vs. throughput

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

Computer Architecture

CpE 442 Introduction to Computer Architecture. The Role of Performance

CPE300: Digital System Architecture and Design

ELE 455/555 Computer System Engineering. Section 1 Review and Foundations Class 5 Computer System Performance

CS 110 Computer Architecture

Computer Organization and Design, 5th Edition: The Hardware/Software Interface

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

Introduction. EE380, Fall Hank Dietz.

Introduction. EE380, Spring Hank Dietz.

The Role of Performance

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

Transcription:

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology

The Computer Revolution Progress in computer technology Underpinned by Moore s Law 1.1 Introduction Makes novel applications feasible Computers in automobiles Cell phones Human genome project World Wide Web Search Engines Computers are pervasive Chapter 1 Computer Abstractions and Technology 2

Classes of Computers Personal computers Delivery of good performance to single users at low cost, a general purpose computer Subject to cost/performance tradeoff Servers Network based and much larger computers for high capacity, performance, and reliability Widest range in cost and capability Low-end servers for file storage vs. High-end servers (or supercomputers) for scientific and engineering calculations Embedded computers Hidden as components of a system Stringent energy/performance/cost constraints and lower tolerance for failure Chapter 1 Computer Abstractions and Technology 3

The PostPC Era Battery operated personal mobile device (PMD) Million Chapter 1 Computer Abstractions and Technology 4

The PostPC Era Personal Mobile Device (PMD) Battery operated Connects to the Internet Hundreds of US dollars Smart phones, tablets, electronic glasses Cloud computing Warehouse Scale Computers (WSC) Amazon, google,. Software as a Service (SaaS) Portion of software run on a PMD and a portion run in the Cloud Chapter 1 Computer Abstractions and Technology 5

What You Will Learn You can answer the following questions: How high-level C/Java programs are translated into the machine language And how the hardware executes them The hardware/software interface What determines the performance of program And how it can be improved How hardware designers improve performance What is parallel processing Chapter 1 Computer Abstractions and Technology 6

Understanding Performance Algorithm Determine number of operations executed Programming language, compiler, architecture Determine number of machine instructions executed per algorithm/operation Processor and memory system Determine how fast instructions are executed I/O system (including OS) Determine how fast I/O operations are executed Chapter 1 Computer Abstractions and Technology 7

Eight Great Ideas in CA Design for Moore s Law (1965) Use abstraction to simplify design Make the common case fast Performance via parallelism Performance via pipelining Performance via prediction Hierarchy of memories 1.2 Eight Great Ideas in Computer Architecture Dependability via redundancy Chapter 1 Computer Abstractions and Technology 8

Hierarchy Layers in Programs Application software Written in high-level language System software Compiler: 1.3 Below Your Program Translate HLL code to machine code Operating System: service code Hardware Handling basic input/output Managing memory and storage Scheduling tasks & sharing resources Processor, memory, I/O controllers Chapter 1 Computer Abstractions and Technology 9

Levels of Program Code High-level language Level of abstraction closer to problem domain Provide for productivity and portability Assembly language Textual representation of instructions Machine language Encoded instructions and data in binary digits (bits) Chapter 1 Computer Abstractions and Technology 10

Components of a Computer The BIG Picture Same components for all kinds of computer Inputting/outputting data, processing data, and storing data Input/output includes User-interface devices Display, keyboard, mouse Storage devices Hard disk, CD/DVD, flash Network adapters For communicating with other computers 1.4 Under the Covers Chapter 1 Computer Abstractions and Technology 11

Through the Looking Glass LCD screen: picture elements (pixels) Mirrors content of frame buffer memory Chapter 1 Computer Abstractions and Technology 12

Touchscreen PostPC device Supersedes keyboard and mouse Resistive and Capacitive types Most tablets, smart phones use capacitive Capacitive allows multiple touches simultaneously Chapter 1 Computer Abstractions and Technology 13

Opening the Box Apple ipad 2 Capacitive multitouch LCD screen 3.8 V, 25 Watt-hour battery Computer board Chapter 1 Computer Abstractions and Technology 14

Inside the Processor Apple A5 Chapter 1 Computer Abstractions and Technology 15

Inside the Processor (CPU) Datapath: performs arithmetic operations on data Control: tells datapath, memory, and I/O devices what to do according to the wishes of the instruction Memory: data memory and instruction memory DRAM (dynamic random access memory) Cache memory Small fast SRAM memory for immediate access to data Chapter 1 Computer Abstractions and Technology 16

Abstractions The BIG Picture Abstraction helps us deal with complexity Hide lower-level detail Instruction set architecture (ISA) The hardware/software interface Application binary interface The ISA plus system software interface Implementation The details underlying and interface Chapter 1 Computer Abstractions and Technology 17

A Safe Place for Data Volatile main memory Loses instructions and data when power off Non-volatile secondary memory Magnetic disk Flash memory Optical disk (CDROM, DVD) Chapter 1 Computer Abstractions and Technology 18

Networks (Communicating with Others) Communication, resource sharing, nonlocal access Local area network (LAN): Ethernet Wide area network (WAN): the Internet Wireless network: WiFi, Bluetooth Chapter 1 Computer Abstractions and Technology 19

Technology Trends Electronics technology continues to evolve Increased capacity and performance Reduced cost DRAM capacity Year Technology Relative performance/cost 1951 Vacuum tube 1 1965 Transistor 35 1975 Integrated circuit (IC) 900 1995 Very large scale IC (VLSI) 2,400,000 2013 Ultra large scale IC 250,000,000,000 1.5 Technologies for Building Processors and Memory Chapter 1 Computer Abstractions and Technology 20

Chip Manufacturing Process Yield: proportion of working dies per wafer Chapter 1 Computer Abstractions and Technology 21

Intel Core i7 Wafer 300mm wafer, 280 chips, 32nm technology Each chip is 20.7 x 10.5 mm 2 Chapter 1 Computer Abstractions and Technology 22

Cost of a Chip Includes... Die cost affected by wafer cost, number of dies per wafer, and die yield (#good dies/#total dies) goes roughly with the cube of the die area Testing cost Packaging cost depends on pins, heat dissipation,... Chapter 1 Computer Abstractions and Technology 23

Cost of an IC A wafer is tested and chopped into dies C die Cwafer Dieper wafer Die yield Die per wafer (Wafer diameter/2) Die area 2 Wafer diameter 2 Die area The die is still tested and packaged into IC C IC C die C testing die C Final test yield packaging and final test Chapter 1 Computer Abstractions and Technology 24

Three Equations for IC Cost Cost per die Cost per wafer Dies per wafer Yield Exactly derived eq. Dies per wafer Wafer area Die area Approximation eq. Yield (1 (Defects per 1 areadie area/2)) 2 Statistical eq. Nonlinear relation to area and defect rate Wafer cost and area are fixed Defect rate determined by manufacturing process Die area determined by architecture and circuit design Chapter 1 Computer Abstractions and Technology 25

Defining Performance Which airplane has the best performance? 1.6 Performance Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300 400 500 Passenger Capacity 0 2000 4000 6000 8000 10000 Cruising Range (miles) Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 500 1000 1500 Cruising Speed (mph) 0 100000 200000 300000 400000 Passengers x mph Chapter 1 Computer Abstractions and Technology 26

Response Time and Throughput Response time (aka execution time) How long it takes to do a task Throughput (aka bandwidth) Total work done per unit time PMDs are more focused on response time, while servers are more focused on throughput. Example: How are response time and throughput affected by Replacing the processor with a faster version? Adding more processors? We ll focus on response time for now Chapter 1 Computer Abstractions and Technology 27

Relative Performance Define Performance = 1/Execution Time X is n time faster than Y Performance X Execution time Performance Y Y Execution time X n Example: time taken to run a program 10s on A, 15s on B Execution Time B / Execution Time A = 15s / 10s = 1.5 So A is 1.5 times faster than B Chapter 1 Computer Abstractions and Technology 28

Measuring Execution Time Elapsed time Total response time, including all aspects Processing, I/O, OS overhead, idle time, Determines system performance CPU time Time spent by CPU for processing a given job Discounts I/O time, other jobs shares Comprises user CPU time and system CPU time User CPU time: the CPU time spent in a program itself System CPU time: the CPU time spent in OS performing tasks on behalf of the program Chapter 1 Computer Abstractions and Technology 29

Execution Time and CPU Clocking Operation of digital hardware governed by a constant-rate clock Clock (cycles) Data transfer and computation Update state Clock period Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250 10 12 s Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz = 4.0 10 9 Hz Chapter 1 Computer Abstractions and Technology 30

CPU Time CPU Time CPU Clock Cycles Clock Cycle Time CPU Clock Cycles Clock Rate Performance improved by Reducing number of clock cycles (or cycle count) Increasing clock rate Hardware designer must often trade off clock rate against cycle count Chapter 1 Computer Abstractions and Technology 31

CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 clock cycles How fast must Computer B clock be? Clock Rate B Clock Cycles CPU Time B B 1.2Clock Cycles 6s A Clock Cycles A CPU Time A Clock Rate A 10s2GHz 2010 9 Clock Rate B 1.22010 6s 9 2410 6s 9 4GHz Chapter 1 Computer Abstractions and Technology 32

Instruction Count and CPI Clock Cycles Instruction Count Cycles per Instruction CPU Time Instruction Count CPI Clock Cycle Time Instruction Count CPI Clock Rate Instruction Count for a program Determined by program, ISA, and compiler Average cycles per instruction Determined by CPU hardware If different instructions have different CPI Average CPI affected by instruction mix Chapter 1 Computer Abstractions and Technology 33

CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? CPU Time A CPU Time B CPU Time B CPU Time A Instruction Count CPI A I 2.0 250ps I500ps Instruction Count CPI B I1.2500ps I 600ps I 600ps I500ps 1.2 Cycle Time A Cycle Time B A is faster by this much Chapter 1 Computer Abstractions and Technology 34

CPI in More Detail If different instruction classes take different numbers of cycles Average CPI affected by instruction mix Clock Cycles n i1 (CPIi Instruction Counti) Weighted average CPI CPI Clock Cycles Instruction Count n i1 CPI i Instruction Counti Instruction Count Relative frequency Chapter 1 Computer Abstractions and Technology 35

CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Sequence 1: IC = 5 Clock Cycles = 2 1 + 1 2 + 2 3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4 1 + 1 2 + 1 3 = 9 Avg. CPI = 9/6 = 1.5 Chapter 1 Computer Abstractions and Technology 36

Performance Summary The BIG Picture CPU Time Instructions Program Clock cycles Instruction Seconds Clock cycle Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, T c Chapter 1 Computer Abstractions and Technology 37

Clock Rate and Power Trends 1.7 The Power Wall In CMOS IC technology Power Capacitive load Voltage 2 Frequency 30 5V 1V 1000 Chapter 1 Computer Abstractions and Technology 38

Reducing Power Suppose a new CPU has 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction 2 Pnew Cold 0.85(Vold 0.85) Fold 0.85 4 0.85 2 P C V F old The power wall old We can t reduce voltage further We can t remove more heat old old 0.52 How else can we improve performance? Chapter 1 Computer Abstractions and Technology 39

Uniprocessor Performance 1.8 The Sea Change: The Switch to Multiprocessors Constrained by power, instruction-level parallelism, memory latency Chapter 1 Computer Abstractions and Technology 40

Multiprocessors Multicore microprocessors More than one processor per chip Requires explicitly parallel programming Compare with instruction level parallelism Hardware executes multiple instructions at once Hidden from the programmer Hard to do Programming for performance Load balancing Optimizing communication and synchronization Chapter 1 Computer Abstractions and Technology 41

SPEC CPU Benchmark Benchmark: Programs used to measure performance Supposedly typical of actual workload Standard Performance Evaluation Corp (SPEC) Develops benchmarks for CPU, I/O, Web, SPEC CPU2006 Elapsed time to execute a selection of programs Negligible I/O, so focuses on CPU performance Normalize relative to reference machine Summarize as geometric mean of performance ratios CINT2006 (integer) and CFP2006 (floating-point) n n Execution time ratio i i1 Chapter 1 Computer Abstractions and Technology 42

Remark e.g. 1.25 SPECRatio SPECRatio A B ExecutionTime ExecutionTime B A ExecutionTime ExecutionTime ExecutionTime ExecutionTime Performance Performance reference reference A B A B SPECRatio is just a ratio rather than an absolute execution time Note that when comparing 2 computers as a ratio, execution times on the reference computer drop out, so choice of reference computer is irrelevant Chapter 1 Computer Abstractions and Technology 43

CINT2006 for Intel Core i7 920 Chapter 1 Computer Abstractions and Technology 44

SPEC Power Benchmark Power consumption of server at different workload levels Performance: ssj_ops Power: Watts (Joules/sec) Overall ssj_ops per Watt 10 i0 10 ssj_ops i power i0 i server side Java operations per second per watt Chapter 1 Computer Abstractions and Technology 45

SPECpower_ssj2008 for Xeon X5650 Chapter 1 Computer Abstractions and Technology 46

有關效能的另一個公式 0.5 小時 從台北到高雄要多久? 4 小時 0.5 小時 如果改坐飛機, 台北到高雄只要 1 小時全程可以加快多少?

由台北到高雄 不能 enhance 的部份為在市區的時間 : 0.5 + 0.5 = 1 小時 可以 enhance 的部份為在高速公路上的 4 小時 => 佔總時間的 4/(4+1) = 0.8 = F 現在改用飛機, 可以 enhance 的部份縮短為 1 小時 => S = 4/1 = 4 走高速公路所需時間 4 + 1 speedup = ----------------------- = ---------- = 2.5 坐飛機所需時間 1 + 1 另一種算法 (Amdahl s Law): 1 1 speedup = ------------------------ = ------------------------- ((1-0.8) + 0.8/4) (1 0.8) + 0.8/4 When S ->, speedup -> 5 Chapter 1 Computer Abstractions and Technology 48

Amdahl's Law Speedup due to enhancement E: Speedup(E) Execution Time without E Execution Time with E Performance w/ E Performance w/o E Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected then, Execution Time(w/ E) ((1F) Speedup(w/ E) F ) S Execution Time(w/o E) 1 (1- F) F S S 1 1 F Chapter 1 Computer Abstractions and Technology 49

Pitfall: Amdahl s Law Improving an aspect of a computer and expecting a proportional improvement in overall performance T improved Taffected improvement factor T unaffected Example: multiply accounts for 80s/100s How much improvement in multiply performance to get 5 overall? 80 20 20 Can t be done! n 1.10 Fallacies and Pitfalls Corollary: make the common case fast Chapter 1 Computer Abstractions and Technology 50

Fallacy: Low Power at Idle Look back at i7 power benchmark At 100% load: 258W At 50% load: 170W (66%) At 10% load: 121W (47%) Google data center Mostly operates at 10% 50% load At 100% load less than 1% of the time Consider designing processors to make power proportional to load Chapter 1 Computer Abstractions and Technology 51

Pitfall: MIPS as a Performance Metric MIPS: Millions of Instructions Per Second MIPS Instruction count Execution time 10 Instruction count Instruction count CPI 10 Clock rate Doesn t account for Differences in ISAs between computers Differences in complexity between instructions 6 6 Clock rate 6 CPI10 CPI varies between programs on a given CPU Chapter 1 Computer Abstractions and Technology 52

Concluding Remarks Cost/performance is improving Due to underlying technology development Hierarchical layers of abstraction In both hardware and software Instruction set architecture The hardware/software interface Execution time: the best performance measure Power is a limiting factor Use parallelism to improve performance 1.9 Concluding Remarks Chapter 1 Computer Abstractions and Technology 53