EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

Similar documents
CS/ECE 752: Advanced Computer Architecture 1. Lecture 1: What is Computer Architecture?

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

Computer Architecture s Changing Definition

Advanced Computer Architecture (CS620)

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

Lecture 3: Evaluating Computer Architectures. How to design something:

EECS4201 Computer Architecture

Copyright 2012, Elsevier Inc. All rights reserved.

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Computer Architecture

Fundamentals of Quantitative Design and Analysis

Course web site: teaching/courses/car. Piazza discussion forum:

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

CpE 442 Introduction to Computer Architecture. The Role of Performance

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Instructor Information

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

Performance, Power, Die Yield. CS301 Prof Szajda

Lecture - 4. Measurement. Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1

Lecture 1: Introduction

CS/EE 6810: Computer Architecture

Computer Architecture. R. Poss

Lecture 4: Instruction Set Architectures. Review: latency vs. throughput

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Computer Architecture. Introduction. Lynn Choi Korea University

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

Introduction to Computer Architecture II

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

Designing for Performance. Patrick Happ Raul Feitosa

Quiz for Chapter 1 Computer Abstractions and Technology

ECE 486/586. Computer Architecture. Lecture # 2

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

Microarchitecture Overview. Performance

Review: latency vs. throughput

Performance of computer systems

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

Microarchitecture Overview. Performance

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Performance Measurement (as seen by the customer)

1.3 Data processing; data storage; data movement; and control.

Lecture 1: CS/ECE 3810 Introduction

Computer Architecture. Fall Dongkun Shin, SKKU

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

Engineering 9859 CoE Fundamentals Computer Architecture

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

PERFORMANCE MEASUREMENT

High Performance Computing

LECTURE 1. Introduction

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Lecture 2: Performance

Final Lecture. A few minutes to wrap up and add some perspective

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

Transistors and Wires

ECE 588/688 Advanced Computer Architecture II

URL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Computer Performance Evaluation: Cycles Per Instruction (CPI)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3

EITF20: Computer Architecture Part1.1.1: Introduction

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Outline Marquette University

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Basics of Performance Engineering

ECE 468 Computer Architecture and Organization Lecture 1

General Purpose Signal Processors

GRE Architecture Session

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

CSE 502 Graduate Computer Architecture

CPS104 Computer Organization Lecture 1

CPE300: Digital System Architecture and Design

ECE 588/688 Advanced Computer Architecture II

ELE 455/555 Computer System Engineering. Section 1 Review and Foundations Class 5 Computer System Performance

URL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture

Fundamentals of Computers Design

Introduction to Microprocessor

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Lecture 1: Course Introduction and Overview Prof. Randy H. Katz Computer Science 252 Spring 1996

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

CS 101, Mock Computer Architecture

Superscalar Processors

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

EE 3170 Microcontroller Applications

ECE 15B COMPUTER ORGANIZATION

10/19/17. You Are Here! Review: Direct-Mapped Cache. Typical Memory Hierarchy

Advanced Computer Architecture

EE382 Processor Design. Class Objectives

CPE300: Digital System Architecture and Design

Parts A and B both refer to the C-code and 6-instruction processor equivalent assembly shown below:

Transcription:

EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer systems are organized and why they are organized that way. instruction-set architecture system-level organization microarchitecture Be conversant with measures of computer performance and methods for increasing performance metrics benchmarks performance methods pipelining, multiple issue, branch prediction 2

Logistics Lectures Tues & Thur :5-2:30 Lecturers Marc Tremblay, Andrew Wolfe, Partha Ranganathan TAs Daniel Wang Other Grading Final Exam 35% Midterm 25% Homework 4+2 40% Text Hennessy & Patterson Computer Architecture: A Quantitative Approach (2nd Edition) See Policy Sheet for details 3 EE282 Online URL: http://www.stanford.edu/class/ee282 e-mail: ee282@lists.stanford.edu ee282tas@ogun.stanford.edu 4

Wanted: A Few Good Graders Grade a portion of every problem set In exchange: requirement to take problem sets is waived full credit on all problem sets 5 What is Computer Architecture? Technology API ISA Link I/O Chan Interfaces IR Regs Machine Organization Applications Computer Architect Measurement & Evaluation 6

Technology Constraints Yearly improvement Semiconductor technology 60% more devices per chip (doubles every 8 months) 5% faster devices (doubles every 5 years) Slower wires Magnetic Disks 60% increase in density Circuit boards 5% increase in wire density Cables no change 992 995 998 200 64x more devices since 992 4x faster devices 7 Changing Technology leads to Changing Architecture 970s multi-chip CPUs semiconductor memory very expensive microcoded control complex instruction sets (good code density) 980s single-chip CPUs, on-chip RAM feasible simple, hard-wired control simple instruction sets small on-chip caches 990s lots of transistors complex control to exploit instruction-level parallelism 2000s even more transistors slow wires limited cooling capacity several CPUs per chip thread level parallelism more to come 8

Application Constraints Applications drive machine balance Numerical simulations (technical computing) floating-point performance main memory bandwidth Transaction processing I/Os per second Memory bandwidth/latency important integer CPU performance Media processing low-precision pixel arithmetic multiply-accumulate rates bit manipulation Embedded control I/O timing 9 System-Level Organization Design at the level of processors, memories, and interconnect. More important to application performance than CPU design Feeds and speeds constrained by IC pin count, module pin count, and signaling rates System balance for a particular application Driven by performance/cost goals available components (cost/perf) technology constraints P SW 800MHz Dual Issue 6 Bytes x 200MHz I/O Display Net Disk M M M M 0

Microarchitecture Register-transfer-level (RTL) design Implement instruction set Exploit capabilities of technology locality and concurrency Iterative process generate proposed architecture estimate cost measure performance Current emphasis is on overcoming sequential nature of programs deep pipelining multiple issue dynamic scheduling branch prediction/speculation Regs Instr. Cache IP B A IR C The Architecture Process Estimate Cost & Performance Sort New concepts created Bad ideas Mediocre ideas Good ideas 2

Performance Measurement and Evaluation Many dimensions to computer performance CPU execution time by instruction or sequence floating point integer branch performance Cache bandwidth Main memory bandwidth I/O performance bandwidth seeks pixels or polygons per second Relative importance depends on applications P $ M 3 Evaluation Tools Benchmarks, traces, & mixes macrobenchmarks & suites application execution time microbenchmarks measure one aspect of performance traces replay recorded accesses cache, branch, register Simulation at many levels ISA, cycle accurate, RTL, gate, circuit trade fidelity for simulation rate Area and delay estimation Analysis e.g., queuing theory MOVE 39% BR 20% LOAD 20% STORE 0% ALU % LD 5EA3 ST 3FF. LD EA2. 4

Benchmarks Microbenchmarks measure one performance dimension cache bandwidth main memory bandwidth procedure call overhead FP performance weighted combination of microbenchmark performance is a good predictor of application performance gives insight into the cause of performance bottlenecks Macrobenchmarks application execution time measures overall performance, but on just one application Applications Micro Perf. Dimensions Macro 5 Some Warnings about Benchmarks Benchmarks measure the whole system application compiler operating system architecture implementation Popular benchmarks typically reflect yesterday s programs computers need to be designed for tomorrow s programs Benchmark timings often very sensitive to alignment in cache location of data on disk values of data Benchmarks can lead to inbreeding or positive feedback if you make an operation fast (slow) it will be used more (less) often so you make it faster (slower) and it gets used even more (less) and so on 6

Means Arithmetic mean Harmonic mean n T i n i= n i= n R i Can be weighted. Represents total execution time Geometric mean n i= Ti T ri n = exp n n i= Ti log Tri Consistent independent of reference 7 Performance Basics Amdahl s Law speedup fraction p by S Performance equation I - instructions executed CPI - clock cycles per instruction t cy - clock cycle Need to consider all three elements when making changes But remember workloads change depending on machine characteristics p T = T 0 ( p) + S T I CPI t 8

Amdahl s Law Speedup vs Vector Fraction 00 90 p T = T 0 ( p) + S Speedup 80 70 60 50 40 30 20 0 0 0 0. 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Fraction of Code Vectorizable 9 Amdahl s law assumes serial execution T T 2 T 3 T = T + T2 + T3 T T 2 T 3n T T 2 T 3 T T M ( T T ) 20

Cost Chip cost is primarily a function of die area increases faster than linearly due to yield a very powerful processor can be built on a small die (0mm) going larger gives diminishing performance returns but W afer Cost 2,500.00 W afer Diameter 200 mm W afer Area 346 mm^2 Chip Size Die Area Die/W aferyield Good Die Cost/Die 3097 0.98 3035 $0.08 5 25 67 0.96 22 $2.23 7.5 56.25 499 0.8 406 $6.6 0 00 269 0.62 66 $5.06 5 225 0 0.35 38 $65.79 7.5 306.25 77 0.27 2 $9.05 2 Chip cost is a function of size $250.00 $200.00 Unpackaged Cost ($) $50.00 $00.00 $50.00 $0.00 0 2 4 6 8 0 2 4 6 8 20 Chip Size (mm) 22

CPU cost is a small fraction of system cost (today) $40 $85 $30 $40 $64 $35 DRAM (28MB) Monitor (7") Disk (0GB) Other Chips Power Supply Processor Mechanical Processor 30 6% DRAM (28MB) 64 4% Monitor (7") 35 29% Disk (0GB) 70 5% Other Chips 85 8% Power Supply 40 9% Mechanical 40 9% Total 434 94% $70 23