CS61C : Machine Structures

inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture #27 Performance II & Summary
2005-12-07
CS61C L27 Performance II & Summary (1)

There is one handout today at the front and back of the room!

Lecturer PSOE, new dad Dan Garcia
www.cs.berkeley.edu/~ddgarcia

The ultimate gift under the tree?!
Since it's the holiday season, it's time to consider what would be the best gift for a CS61C student. Nothing says "I love you" like a $2,300 81-game retro system, available today @ Costco. Xbox360 who?
www.costco.com/browse/product.aspx?prodid=11098104

Review
- RAID
  - Motivation: in the 1980s there were 2 classes of drives: expensive, big drives for enterprises and small drives for PCs. The idea: make one big drive out of many small ones!
  - Higher performance with more disk arms per $, plus the option of a small # of extra disks (the "R")
  - Started @ Cal by CS Profs Katz & Patterson
- Latency v. Throughput
- Performance doesn't depend on any single factor: need Instruction Count, Clocks Per Instruction (CPI) and Clock Rate to get valid estimations
- User Time: time user waits for program to execute: depends heavily on how OS switches between tasks
- CPU Time: time spent executing a single program: depends solely on processor design (datapath, pipelining effectiveness, caches, etc.)

$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
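To make the CPU time relation concrete, here is a minimal Python sketch. The function name `cpu_time_seconds` and both example machines are hypothetical and are not taken from the lecture; the numbers are chosen only to show that a higher clock rate alone does not guarantee a shorter CPU time.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    total_cycles = instruction_count * cpi      # cycles needed by the program
    return total_cycles / clock_rate_hz         # seconds/cycle = 1 / clock rate


# Hypothetical machines running the same 2-billion-instruction program.
time_a = cpu_time_seconds(2_000_000_000, cpi=1.5, clock_rate_hz=1_000_000_000)
time_b = cpu_time_seconds(2_000_000_000, cpi=1.0, clock_rate_hz=800_000_000)

print(time_a)  # 3.0 seconds on the 1 GHz machine with CPI 1.5
print(time_b)  # 2.5 seconds on the 800 MHz machine with CPI 1.0
```

In this made-up case the slower-clocked machine finishes first because its CPI is lower, which is why all three factors are needed for a valid estimate.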

What Programs Measure for Comparison?
- Ideally run typical programs with typical input before purchase, or before even building the machine
  - Called a "workload"; for example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software
- In some situations it's hard to do
  - Don't have access to machine to "benchmark" before purchase
  - Don't know workload in future
- Next: benchmarks & PC-Mac showdown!

Benchmarks
- Obviously, apparent speed of processor depends on code used to test it
- Need industry standards so that different processors can be fairly compared
- Companies exist that create these benchmarks: "typical" code used to evaluate systems
- Need to be changed every 2 or 3 years since designers could (and do!) target these standard benchmarks

Example Standardized Benchmarks (1/2)
- Standard Performance Evaluation Corporation (SPEC) SPEC CPU2000
  - CINT2000: 12 integer (gzip, gcc, crafty, perl, ...)
  - CFP2000: 14 floating-point (swim, mesa, art, ...)
  - All relative to base machine, a Sun 300MHz 256MB-RAM Ultra5_10, which gets a score of 100
  - www.spec.org/osg/cpu2000/
  - They measure
    - System speed (SPECint2000)
    - System throughput (SPECint_rate2000)
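A score "relative to a base machine" can be sketched as follows. The helper names and all timings below are hypothetical. The use of a geometric mean to combine per-benchmark ratios is how SPEC summarizes its CPU suites, but it is not stated on the slide, so treat it as an added assumption here.

```python
import math


def relative_score(reference_seconds, measured_seconds, base_score=100.0):
    """Score of one benchmark relative to the base machine (base machine = 100)."""
    return base_score * reference_seconds / measured_seconds


def geometric_mean(values):
    """Combine per-benchmark ratios so that no single benchmark dominates."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) run times in seconds for three benchmarks.
timings = [(1400, 700), (1800, 1800), (1100, 2200)]
scores = [relative_score(ref, meas) for ref, meas in timings]

print(scores)                   # [200.0, 100.0, 50.0]
print(geometric_mean(scores))   # close to 100: the 2x win and 2x loss cancel
```

With an arithmetic mean the same three scores would average about 117, which is one reason the choice of summary statistic matters when comparing machines.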

Example Standardized Benchmarks (2/2)
- SPEC
  - Benchmarks distributed in source code
  - Members of consortium select workload
    - 30+ companies, 40+ universities
  - Compiler, machine designers target benchmarks, so try to change every 3 years
  - The last benchmark released was SPEC 2000
    - They are still finalizing SPEC 2005

CINT2000
  gzip      C          Compression
  vpr       C          FPGA Circuit Placement and Routing
  gcc       C          C Programming Language Compiler
  mcf       C          Combinatorial Optimization
  crafty    C          Game Playing: Chess
  parser    C          Word Processing
  eon       C++        Computer Visualization
  perlbmk   C          PERL Programming Language
  gap       C          Group Theory, Interpreter
  vortex    C          Object-oriented Database
  bzip2     C          Compression
  twolf     C          Place and Route Simulator

CFP2000
  wupwise   Fortran77  Physics / Quantum Chromodynamics
  swim      Fortran77  Shallow Water Modeling
  mgrid     Fortran77  Multi-grid Solver: 3D Potential Field
  applu     Fortran77  Parabolic / Elliptic Partial Diff Equations
  mesa      C          3-D Graphics Library
  galgel    Fortran90  Computational Fluid Dynamics
  art       C          Image Recognition / Neural Networks
  equake    C          Seismic Wave Propagation Simulation
  facerec   Fortran90  Image Processing: Face Recognition
  ammp      C          Computational Chemistry
  lucas     Fortran90  Number Theory / Primality Testing
  fma3d     Fortran90  Finite-element Crash Simulation
  sixtrack  Fortran77  High Energy Nuclear Physics Accelerator Design
  apsi      Fortran77  Meteorology: Pollutant Distribution

Example PC Workload Benchmark
- PCs: Ziff-Davis Benchmark Suite
  - Business Winstone is a system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - It doesn't mimic what these packages do; it runs real applications through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Also tests for CDs, Content-creation, Audio, 3D graphics, battery life
  - http://www.etestinglabs.com/benchmarks/

Performance Evaluation
- Good products are created when we have:
  - Good benchmarks
  - Good ways to summarize performance
- Given that sales is a function of performance relative to competition, should we invest in improving the product as reported by the performance summary?
- If benchmarks/summary are inadequate, then must choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!

Performance Evaluation: The Demo
If we're talking about performance, let's discuss the ways shady salespeople have fooled consumers (so that you don't get taken!)
5. Never let the user touch it
4. Only run the demo through a script
3. Run it on a stock machine in which "no expense was spared"
2. Preprocess all available data
1. Play a movie

PC / PC / Mac Showdown!!! (1/4)
- PC: 1 GHz Pentium III
  - 256 MB RAM
  - 512KB L2 Cache, no L3
  - 133 MHz Bus
  - 20 GB Disk
  - 16MB VRAM
- PC: 800MHz PIII
- Mac: 800 MHz PowerBook G4
  - 1 GB RAM (2 x 512MB SODIMMs)
  - 32KB L1Inst, L1Data
  - 256KB L2 Cache
  - 1MB L3 Cache
  - 133 MHz Bus
  - 40 GB Disk
  - 32MB VRAM
Let's take a look at SPEC2000 and a simulation of a real-world application.

PC / Mac Showdown!!! (2/4)
[Chart: CFP2000 (bigger is better), benchmarks ordered left-to-right by G4/PIII 800MHz ratio]

PC / Mac Showdown!!! (3/4)
[Chart: CINT2000 (bigger is better), benchmarks ordered left-to-right by G4/PIII 800MHz ratio]

PC / Mac Showdown!!! (4/4)
- Apple got in a heap of trouble when claiming the G5 was the "world's fastest personal computer"
- "Lies, damn lies, and statistics"
[Chart: Normalized Photoshop radial blur (bigger is better) [Amt=10, Zoom, Best] (PIII = 79 sec = 100, G4 = 69 sec)]
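The chart's normalization can be reproduced from the two run times given on the slide (PIII = 79 sec = 100, G4 = 69 sec). The sketch below is an illustration and not the original benchmark script; the function name is hypothetical. It also shows how the same pair of timings yields two different-sounding percentages.

```python
def normalized_score(baseline_seconds, measured_seconds, baseline_score=100.0):
    """Bigger-is-better score: the baseline run time maps to baseline_score."""
    return baseline_score * baseline_seconds / measured_seconds


piii_seconds, g4_seconds = 79, 69          # run times quoted on the slide

g4_score = normalized_score(piii_seconds, g4_seconds)
time_saved_pct = 100.0 * (piii_seconds - g4_seconds) / piii_seconds

print(round(g4_score, 1))        # 114.5  (G4 score when PIII = 100)
print(round(time_saved_pct, 1))  # 12.7   (percent less time than the PIII)
```

"14.5% higher score" and "12.7% less time" describe the same measurement, so a claim should always say which quantity it is reporting.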

Administrivia
- If you did well in CS3 or 61{A,B,C} (A- or above) and want to be on staff?
  - Usual path: Lab assistant -> Reader -> TA
  - Fill in form outside 367 Soda before first week of semester
  - I strongly encourage anyone who gets above a B+ in the class to follow this path
- Sp04 Final exam + solutions online!
- Final Review: 2005-12-11 @ 2pm in 10 Evans
- Final: 2005-12-17 @ 12:30pm in 2050 VLSB
  - Only bring pen{,cil}s, two 8.5"x11" handwritten sheets + green.
  - Leave backpacks, books, calculators, cells & pagers home!

Upcoming Calendar
Week #: #15 (Last Week o' Classes), #16
- Sun: 2pm Review, 10 Evans
- Mon: Performance
- Performance competition due tonight @ midnight
- Wed: LAST CLASS: Summary, Review, & HKN Evals
- Thu Lab: I/O Networking & 61C Feedback Survey
- Sat: FINAL EXAM, SAT 12-17 @ 12:30pm-3:30pm, 2050 VLSB
- Performance awards

CS61C: So what's in it for me? (1st lecture)
Learn some of the big ideas in CS & engineering:
- 5 Classic components of a Computer
- Principle of abstraction, systems built as layers
- Data can be anything (integers, floating point, characters): a program determines what it is
- Stored program concept: instructions just data
- Compilation v. interpretation thru system layers
- Principle of Locality, exploited via a memory hierarchy (cache)
- Greater performance by exploiting parallelism (pipelining)
- Principles/Pitfalls of Performance Measurement

Conventional Wisdom (CW) in Comp Arch
(Thanks to Dave Patterson for these)
- Old CW: Power free, Transistors expensive
- New CW: Power expensive, Transistors free
  - Can put more on chip than can afford to turn on
- Old CW: Chips reliable internally, errors at pins
- New CW: ≤ 65 nm -> high error rates
- Old CW: CPU manufacturers' minds closed
- New CW: Power wall + Memory gap = Brick wall
  - New idea-receptive environment
- Old CW: Uniprocessor performance 2X / 1.5 yrs
- New CW: 2X CPUs per socket / ~ 2 to 3 years
  - More, simpler processors are more power efficient

Massively Parallel Socket
- Processor = new transistor?
  - Does it only help power/cost/performance?
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 µm PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 µm NMOS, 60 mm² chip
  - 4004 shrinks to ~ 1 mm² at 3 micron
- 125 mm² chip, 65 nm CMOS = 2312 RISC IIs + Icache + Dcache
  - RISC II shrinks to ~ 0.02 mm² at 65 nm
  - Caches via DRAM or 1 transistor SRAM (www.t-ram.com)?
  - Proximity Communication at > 1 TB/s?
  - Ivan Sutherland @ Sun spending time in Berkeley!
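The "shrinks to" figures can be approximated with a simple area-scaling estimate. The sketch below assumes an ideal linear shrink, where die area scales with the square of the feature size; that assumption and the function name are added for illustration and are not stated on the slide.

```python
def scaled_area_mm2(area_mm2, old_feature_um, new_feature_um):
    """Ideal shrink: area scales with the square of the feature-size ratio."""
    return area_mm2 * (new_feature_um / old_feature_um) ** 2


# Intel 4004: 11 mm^2 at 10 um, shrunk to the 3 um process of RISC II.
area_4004_at_3um = scaled_area_mm2(11, old_feature_um=10, new_feature_um=3)

# RISC II: 60 mm^2 at 3 um, shrunk to 65 nm (0.065 um).
area_risc2_at_65nm = scaled_area_mm2(60, old_feature_um=3, new_feature_um=0.065)

# Die area taken by 2312 shrunken RISC II cores on a 125 mm^2 chip.
cores_area = 2312 * area_risc2_at_65nm

print(round(area_4004_at_3um, 2))    # 0.99, i.e. roughly 1 mm^2
print(round(area_risc2_at_65nm, 3))  # 0.028, same order as the slide's ~0.02
print(cores_area < 125)              # True: room is left for the caches
```

Under this idealized model the 2312 cores occupy roughly half of the 125 mm² die, which is consistent with the slide's point that the rest can go to instruction and data caches.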

20th vs. 21st Century IT Targets
- 20th Century Measure of Success
  - Performance (peak vs. delivered)
  - Cost (purchase cost vs. ownership cost, power)
- 21st Century Measure of Success? "SPUR"
  - Security
  - Privacy
  - Usability
  - Reliability
- Massive parallelism has a greater chance (this time) if
  - Measure of success is SPUR vs. only cost-perf
  - Uniprocessor performance improvement decelerates

Other Implications
- Need to revisit chronic unsolved problem
  - Parallel programming!!
- Implications for applications:
  - Computing power >>> CDC6600, Cray XMP (choose your favorite) on an economical die inside your watch, cell phone or PDA
    - On-your-body health monitoring
    - Google + Library of Congress on your PDA
- As devices continue to shrink
  - The need for great HCI is as critical as ever!

Taking advantage of Cal Opportunities
"The Godfather answers all of life's questions" (heard in "You've Got Mail")
- Why are we the #2 Univ in the WORLD? So says the 2004 ranking from the "Times Higher Education Supplement"
  - Research, research, research!
  - Whether you want to go to grad school or industry, you need someone to vouch for you! (as is the case with the Mob)
- Techniques
  - Find out what you like, do lots of web research (read published papers), hit OH of Prof, show enthusiasm & initiative
- http://research.berkeley.edu/

Dan's CS98/198 Opportunities Spring 2006
- GamesCrafters (Game Theory R & D)
  - We are developing SW, analysis on small 2-person games of no chance (e.g., achi, connect-4, dots-and-boxes, etc.)
  - Req: A- in CS61C, Game Theory Interest
  - http://gamescrafters.berkeley.edu
- MS-DOS X (Mac Student Developers)
  - Learn to program Macintoshes.
  - No requirements (other than Mac, interest)
  - http://msdosx.berkeley.edu
- UCBUGG (Recreational Graphics)
  - Develop computer-generated images and animations.
  - http://ucbugg.berkeley.edu

Penultimate slide: Thanks to the staff!
- TAs
  - Head TA Jeremy Huddleston
  - Zhangxi Tan
  - Michael Le
  - Navtej Sadhal
- Readers
  - Mario Tanev
  - Mark Whitney
Thanks to Dave Patterson for these CS61C notes

The Future for Future Cal Alumni
- What's The Future?
- New Millennium
  - Internet, Wireless, Nanotechnology, ...
  - Rapid Changes in Technology
  - World's (2nd) Best Education
  - Never Give Up!
"The best way to predict the future is to invent it" (Alan Kay)
The Future is up to you!