Computing s Energy Problem:

Size: px
Start display at page:

Download "Computing s Energy Problem:"

Transcription

1 Computing s Energy Problem: (and what we can do about it) Mark Horowitz Stanford University horowitz@ee.stanford.edu

2 Everything Has A Computer Inside 2

3 The Reason is Simple: Moore s Law Made Gates Cheap 3

4 Dennard s Scaling Made Them Fast & Low Energy The triple play: Get more gates, 1/L 2 1/ 2 Gates get faster, CV/i Energy per switch CV 2 3 Dennard, JSSC, pp , Oct

5 Our Expectation Cray-1: world s fastest computer Mb memory (50ns cycle time) 40Kb register (6ns cycle time) ~1 million gates (4/5 input NAND) 80MHz clock 115kW In 45nm (30 years later) < 3 mm 2 > 1 GHz ~ 1 W CRAY-1 5

6 Supporting Evidence 6

7 Houston, We Have A Problem 7

8 The Power Limit Watts/mm 2 8

9 Clever Power Increased Because We Were Greedy 10x too large 9

10 This Power Problem Is Not Going Away: P = C * Vdd 2 * f L

11 Think About It 11

12 Technology to the Rescue? 12

13 Problems w/ Replacing CMOS Pretty fundamental physics Avoiding this problem will be hard e- Its capability is pretty amazing fj/gate, 10ps delays, 10 9 working devices 13

14 Catch - 22 Very Different = High Risk Capital you need 14 Building Computers = Large $ Investment Risk

15 The Truth About Innovation Start by creating new markets 15

16 Our CMOS Future Will see tremendous innovative uses of computation Capability of today s technology is incredible Can add computing and communication for nearly $0 Key questions are what problems need to be solved? Most performance system will be energy limited These systems will be optimized for energy efficiency Power = Energy/Op * Ops/sec 16

17 Processor Energy Delay Trade-off 17

18 The Rise of Multi-Core Processors 18

19 The Stagnation of Multi-Core Processors 19

20 Aside: Throughput Optimized FP

21 Aside: Floating Point Optimization 180nm ITRS 10nm 21

22 Have A Shiny Ball, Now What? 22

23 Signal Processing ASICs Dedicated 1000 Energy Efficiency (MOPS/mW) CPUs CPUs+GPUs GP DSPs ~1000x Markovic, EE292 Class, Stanford,

24 The Push For Specialized Hardware 24

25 Before Talking About Specialization 25

26 Don t Forget Memory System Energy 26

27 Processor Energy w/ Corrected Cache Sizes 27

28 Processor Energy Breakdown 8 cores L1/reg/TLB L2 L3 28

29 Data Center Energy Specs Malladi, ISCA,

30 30

31 What Is Going On Here? Dedicated 1000 Energy Efficiency (MOPS/mW) CPUs CPUs+GPUs GP DSPs ~1000x

32 ASIC s Dirty Little Secret All the ASIC applications have absurd locality And work on short integer data Video Frames Inter Prediction Intra Prediction CABAC Entropy Encoder Compressed Bit Stream Integer Motion Estimation Fractional Motion Estimation H % of Execution time is here Hamid et al, ISCA,

33 Rough Energy Numbers (45nm) Integer FP Memory Add FAdd Cache (64bit) 8 bit 0.03pJ 16 bit 0.4pJ 8KB 10pJ 32 bit 0.1pJ 32 bit 0.9pJ 32KB 20pJ Mult FMult 1MB 100pJ 8 bit 0.2pJ 16 bit 1pJ DRAM nJ 32 bit 3 pj 32 bit 4pJ Instruction Energy Breakdown 25pJ 6pJ Control 70 pj I-Cache Access Register File Access Add 33

34 The Truth: It s More About the Algorithm then the Hardware All Algorithms GPU Alg 34

35 35

36 Highly Local Computation Model 36

37 Highly Local Computation Model 37

38 Highly Local Computation Model 38

39 Highly Local Computation Model 39

40 Compose These Cores into a Pipeline Program in space, not time Makes building programmable hardware more difficult

41 Working on System to Explore This Space Takes high-level program Graph of stencil kernels Maps to hardware level assembly Compute graph of operations for each kernel Currently we map the result to: FPGA, custom ASIC 41

42 Enabling Innovation You don t just compile applications to efficiency Need to tweak the application to fit constraints Need to enable application experts to play They know how to cheat and still get good results 42

43 Remember This Trade-off? Investment Risk Need to reduce cost to play Building constructors, not instances Capital you need 43

44 Chip Constructor Idea Software Simulator Per application parameters Generator +Optimizer RTL, Performance Verif Collateral, Power Firmware Usage

45 Not All Systems Are On The Bleeding Edge 45

46 App Store For Hardware 46

47 Challenge 47

48 A New Hope If technology is scaling more slowly We can incorporate current design knowledge into tools To create extensible system constructors If killer products are going to be application driven Application experts need to design them We can leverage the 1 st bullet to enable the 2 nd To usher in a new wave of innovative computing products 48

MOS High Performance Arithmetic

MOS High Performance Arithmetic MOS High Performance Arithmetic Mark Horowitz Stanford University horowitz@ee.stanford.edu Arithmetic Is Important Then Now (TegraK1) 2 What Is Hard? 9999999 + 1 3 Proof: 1 st Gen 2 nd Gen 4 And Getting

More information

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE Understanding Sources of Inefficiency in General-Purpose Chips Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE 1 Outline Motivation H.264 Basics Key ideas Implementation & Evaluation Summary

More information

Understanding Sources of Inefficiency in General-Purpose Chips

Understanding Sources of Inefficiency in General-Purpose Chips Understanding Sources of Inefficiency in General-Purpose Chips Rehan Hameed Wajahat Qadeer Megan Wachs Omid Azizi Alex Solomatnikov Benjamin Lee Stephen Richardson Christos Kozyrakis Mark Horowitz GP Processors

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

Computer Architecture s Changing Definition

Computer Architecture s Changing Definition Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction

More information

Computer Architecture. What is it?

Computer Architecture. What is it? Computer Architecture Venkatesh Akella EEC 270 Winter 2005 What is it? EEC270 Computer Architecture Basically a story of unprecedented improvement $1K buys you a machine that was 1-5 million dollars a

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Boris Grot and Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh General Information Instructors: Boris

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain. Defining Performance Performance 1 Which airplane has the best performance? Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300

More information

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming Systems Design and Programming Instructor: Professor Jim Plusquellic Text: Barry B. Brey, The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor Architecture,

More information

EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART

EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART INTRODUCTION Adding embedded processing to simple sensors can make them smart but that is just the beginning of the story. Fixed Sensor Design

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

ECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures

ECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures ECE 747 Digital Signal Processing Architecture DSP Implementation Architectures Spring 2006 W. Rhett Davis NC State University W. Rhett Davis NC State University ECE 406 Spring 2006 Slide 1 My Goal Challenge

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Classes of Computers Personal computers General purpose, variety of software

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

H.264 AVC 4k Decoder V.1.0, 2014

H.264 AVC 4k Decoder V.1.0, 2014 SOC H.264 AVC 4k Video Decoder Datasheet System-On-Chip (SOC) Technologies 1. Key Features 1. Profile: High profile 2. Resolution: 4k (3840x2160) 3. Frame Rate: up to 60fps 4. Chroma Format: 4:2:0 or 4:2:2

More information

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, ) Performance IC220 Slide Set #5B: Performance (Chapter 1: 1.6, 1.9-1.11) Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational

More information

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747 Defining Which airplane has the best performance? 1 Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300 400 500 Passenger Capacity

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

EE382 Processor Design. Class Objectives

EE382 Processor Design. Class Objectives EE382 Processor Design Stanford University Winter Quarter 1998-1999 Instructor: Michael Flynn Teaching Assistant: Steve Chou Administrative Assistant: Susan Gere Lecture 1 - Introduction Slide 1 Class

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Enabling the A.I. Era: From Materials to Systems

Enabling the A.I. Era: From Materials to Systems Enabling the A.I. Era: From Materials to Systems Sundeep Bajikar Head of Market Intelligence, Applied Materials New Street Research Conference May 30, 2018 External Use Key Message PART 1 PART 2 A.I. *

More information

The Memory Hierarchy 1

The Memory Hierarchy 1 The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow

More information

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers

More information

Memory Hierarchy Y. K. Malaiya

Memory Hierarchy Y. K. Malaiya Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview : Computer Architecture and Organization Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ : Computer Architecture and Organization -- Course Overview Goals»

More information

Quiz for Chapter 1 Computer Abstractions and Technology

Quiz for Chapter 1 Computer Abstractions and Technology Date: Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

ELCT 501: Digital System Design

ELCT 501: Digital System Design ELCT 501: Digital System Lecture 1: Introduction Dr. Mohamed Abd El Ghany, Mohamed.abdel-ghany@guc.edu.eg Administrative Rules Course components: Lecture: Thursday (fourth slot), 13:15-14:45 (H8) Office

More information

EE241 - Spring 2004 Advanced Digital Integrated Circuits

EE241 - Spring 2004 Advanced Digital Integrated Circuits EE24 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolić Lecture 2 Impact of Scaling Class Material Last lecture Class scope, organization Today s lecture Impact of scaling 2 Major Roadblocks.

More information

Performance, Power, Die Yield. CS301 Prof Szajda

Performance, Power, Die Yield. CS301 Prof Szajda Performance, Power, Die Yield CS301 Prof Szajda Administrative HW #1 assigned w Due Wednesday, 9/3 at 5:00 pm Performance Metrics (How do we compare two machines?) What to Measure? Which airplane has the

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

Software Defined Hardware

Software Defined Hardware Software Defined Hardware For data intensive computation Wade Shen DARPA I2O September 19, 2017 1 Goal Statement Build runtime reconfigurable hardware and software that enables near ASIC performance (within

More information

Computer Architecture Review. ICS332 - Spring 2016 Operating Systems

Computer Architecture Review. ICS332 - Spring 2016 Operating Systems Computer Architecture Review ICS332 - Spring 2016 Operating Systems ENIAC (1946) Electronic Numerical Integrator and Calculator Stored-Program Computer (instead of Fixed-Program) Vacuum tubes, punch cards

More information

Arm s First-Generation Machine Learning Processor

Arm s First-Generation Machine Learning Processor Arm s First-Generation Machine Learning Processor Ian Bratt 2018 Arm Limited Introducing the Arm Machine Learning (ML) Processor Optimized ground-up architecture for machine learning processing Massive

More information

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy Competitors using generic parts Performance benefits to be had for custom design Original PlayStation: no vector processing or floating point support Geometry issues Photorealism at the core of design

More information

CS 110 Computer Architecture

CS 110 Computer Architecture CS 110 Computer Architecture Performance and Floating Point Arithmetic Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of

More information

A236 Video DSP Chip Evaluation Board #2 with FingerPrint Option

A236 Video DSP Chip Evaluation Board #2 with FingerPrint Option A236 Video DSP Chip Evaluation Board #2 with FingerPrint Option Using Oxford Micro Devices, Inc.'s A236 Video DSP Chip, Rev. March 8, 1999 Features the System Designer's Parallel DSP Chip (TM) Data Sheet

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

The Role of Performance

The Role of Performance Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture The Role of Performance What is performance? A set of metrics that allow us to compare two different hardware

More information

Computer Architecture. R. Poss

Computer Architecture. R. Poss Computer Architecture R. Poss 1 ca01-10 september 2015 Course & organization 2 ca01-10 september 2015 Aims of this course The aims of this course are: to highlight current trends to introduce the notion

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

Systems Design and Programming. Instructor: Chintan Patel

Systems Design and Programming. Instructor: Chintan Patel Systems Design and Programming Instructor: Chintan Patel Text: Barry B. Brey, 'The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor, Pentium II, Pentium

More information

Specializing Hardware for Image Processing

Specializing Hardware for Image Processing Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 35 Caches IV / VM I 2004-11-19 Andy Carle inst.eecs.berkeley.edu/~cs61c-ta Google strikes back against recent encroachments into the Search

More information

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

Flexible Architecture Research Machine (FARM)

Flexible Architecture Research Machine (FARM) Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

Don t Think You Need an FPGA? Think Again!

Don t Think You Need an FPGA? Think Again! 1 Don t Think You Need an FPGA? Think Again! Arun Veeramani Senior Program Manager National Instruments Don t Think You Need an FPGA? Think Again! Goals for Today Define and explain FPGAs Address common

More information

CONSOLE ARCHITECTURE

CONSOLE ARCHITECTURE CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What

More information

EE5780 Advanced VLSI CAD

EE5780 Advanced VLSI CAD EE5780 Advanced VLSI CAD Lecture 1 Introduction Zhuo Feng 1.1 Prof. Zhuo Feng Office: EERC 513 Phone: 487-3116 Email: zhuofeng@mtu.edu Class Website http://www.ece.mtu.edu/~zhuofeng/ee5780fall2013.html

More information

Technology Platform Segmentation

Technology Platform Segmentation HOW TECHNOLOGY R&D LEADERSHIP BRINGS A COMPETITIVE ADVANTAGE FOR MULTIMEDIA CONVERGENCE Technology Platform Segmentation HP LP 2 1 Technology Platform KPIs Performance Design simplicity Power leakage Cost

More information

Hakam Zaidan Stephen Moore

Hakam Zaidan Stephen Moore Hakam Zaidan Stephen Moore Outline Vector Architectures Properties Applications History Westinghouse Solomon ILLIAC IV CDC STAR 100 Cray 1 Other Cray Vector Machines Vector Machines Today Introduction

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Lecture 1: Welcome, why are you here? James C. Hoe Department of ECE Carnegie Mellon University

Lecture 1: Welcome, why are you here? James C. Hoe Department of ECE Carnegie Mellon University 18 643 Lecture 1: Welcome, why are you here? James C. Hoe Department of ECE Carnegie Mellon University 18 643 F17 L01 S1, James C. Hoe, CMU/ECE/CALCM, 2017 18 643 F17 L01 S2, James C. Hoe, CMU/ECE/CALCM,

More information

LAB 9 The Performance of MIPS

LAB 9 The Performance of MIPS LAB 9 The Performance of MIPS Goals Learn how the performance of the processor is determined. Improve the processor performance by adding new instructions. To Do Determine the speed of the processor in

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications

More information

AMD Opteron 4200 Series Processor

AMD Opteron 4200 Series Processor What s new in the AMD Opteron 4200 Series Processor (Codenamed Valencia ) and the new Bulldozer Microarchitecture? Platform Processor Socket Chipset Opteron 4000 Opteron 4200 C32 56x0 / 5100 (codenamed

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2015 Lecture 15 LAST TIME! Discussed concepts of locality and stride Spatial locality: programs tend to access values near values they have already accessed

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your

More information

Parallelism: The Real Y2K Crisis. Darek Mihocka August 14, 2008

Parallelism: The Real Y2K Crisis. Darek Mihocka August 14, 2008 Parallelism: The Real Y2K Crisis Darek Mihocka August 14, 2008 The Free Ride For decades, Moore's Law allowed CPU vendors to rely on steady clock speed increases: late 1970's: 1 MHz (6502) mid 1980's:

More information

Scaling Throughput Processors for Machine Intelligence

Scaling Throughput Processors for Machine Intelligence Scaling Throughput Processors for Machine Intelligence ScaledML Stanford 24-Mar-18 simon@graphcore.ai 1 MI The impact on humanity of harnessing machine intelligence will be greater than the impact of harnessing

More information

Transistor: Digital Building Blocks

Transistor: Digital Building Blocks Final Exam Review Transistor: Digital Building Blocks Logically, each transistor acts as a switch Combined to implement logic functions (gates) AND, OR, NOT Combined to build higher-level structures Multiplexer,

More information

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs October 13, 2009 Overview Presenting: Alex Papakonstantinou, Karthik Gururaj, John Stratton, Jason Cong, Deming Chen, Wen-mei Hwu. FCUDA:

More information

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

On the Capability and Achievable Performance of FPGAs for HPC Applications "On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies

More information

CS 152, Spring 2011 Section 10

CS 152, Spring 2011 Section 10 CS 152, Spring 2011 Section 10 Christopher Celio University of California, Berkeley Agenda Stuff (Quiz 4 Prep) http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/ Intel

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs October 13, 2009 Overview Presenting: Alex Papakonstantinou, Karthik Gururaj, John Stratton, Jason Cong, Deming Chen, Wen-mei Hwu. FCUDA:

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important? 0.1 What is This Course About? 0.2 CS 356 Unit 0 Class Introduction Basic Hardware Organization Introduction to Computer Systems a.k.a. Computer Organization or Architecture Filling in the "systems" details

More information

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course

More information

Multi-Processors and GPU

Multi-Processors and GPU Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock

More information

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun The Stanford Hydra CMP Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun Computer Systems Laboratory Stanford University http://www-hydra.stanford.edu

More information

LECTURE 1. Introduction

LECTURE 1. Introduction LECTURE 1 Introduction CLASSES OF COMPUTERS When we think of a computer, most of us might first think of our laptop or maybe one of the desktop machines frequently used in the Majors Lab. Computers, however,

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

CS5222 Advanced Computer Architecture. Lecture 1 Introduction

CS5222 Advanced Computer Architecture. Lecture 1 Introduction CS5222 Advanced Computer Architecture Lecture 1 Introduction Overview Teaching Staff Introduction to Computer Architecture History Future / Trends Significance The course Content Workload Administrative

More information

Architectural Musings

Architectural Musings Architectural Musings Rethinking Computer Systems Architecture & Evaluation Christopher Vick cvick@qti.qualcomm.com March 23, 2014 1 Introduction Vision Talk How should we analyze, reason about and evaluate

More information

Computer Systems are Different! Robert Morris and Frans Kaashoek Spring 2009

Computer Systems are Different! Robert Morris and Frans Kaashoek Spring 2009 Computer Systems are Different! Robert Morris and Frans Kaashoek 6.033 Spring 2009 Outline hidden All systems are similar But computer systems are different Unbounded composability Easy to achieve complexity

More information

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF.

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems Liang-Gee Chen Distinguished Professor General Director, SOC Center National Taiwan University DSP/IC Design Lab, GIEE, NTU 1

More information

U-Boot Falcon Mode. Minimizing boot times using U-Boot "Falcon" mode. Stefano Babic / Wolfgang Denk. July 2012

U-Boot Falcon Mode. Minimizing boot times using U-Boot Falcon mode. Stefano Babic / Wolfgang Denk. July 2012 U-Boot Falcon Mode Minimizing boot times using U-Boot "Falcon" mode Stefano Babic / Wolfgang Denk July 2012 Overview Requirements for Boot Loaders Frequently Asked For Optimizations: Boot Time Hardware

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information