Approximate Overview of Approximate Computing
1. Approximate Overview of Approximate Computing. Luis Ceze, University of Washington (PL, Architecture). With thanks to many colleagues from whom I stole slides: Adrian Sampson, Hadi Esmaeilzadeh, Karin Strauss, Mark Wyse, James Bornholt, …
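2. Moore's law gives us lots of transistors on a chip, but it is Dennard scaling that lets us use them: 2x transistor count, 40% faster, 50% more efficient. Without Dennard scaling, a growing fraction of the chip must stay powered off (dark silicon). [Figure: dark silicon across technology nodes 45, 32, 22, 16, 11 and 8 nm over about 10 years; labeled values 1%, 17%, 36%, 40%, 51%.]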
3. Specialization to the rescue.
4. A storage gap? Is it inevitable? Disk cost per byte is not decreasing fast enough to keep up with information growth. [Figures: disk cost per byte; information growth. Credit: David Rosenthal (CMU) and Preeti Gupta (UCSC), 2014; EMC, 2012]
5. Modern applications: image, sound and video processing; image rendering; sensor data analysis and computer vision; simulations, games, search and machine learning. They share inexact input data, approximate/iterative algorithms and malleable output.
6. ASPLOS Wild and Crazy Ideas 2008.
7. What is approximate computing? Exploit inherent application-level resilience to build more efficient, better computer systems, trading output accuracy for efficiency and performance at every level from the application down to physics. In essence, the goal is to specialize computation, storage and communication to the properties of the data and the algorithm. This enables better use of the underlying substrate.
8. Wait, what about… :) Approximation is already common at the algorithm level: machine learning, iterative algorithms, lossy compression, floating point. Opportunities across the rest of the stack (language, compiler, ISA/architecture, circuits, physics):
- Reasoning about approximation in PL
- Approximate compiler optimizations: ~2-3x
- Approximate execution models: ~10x
- Non-deterministic near/sub-threshold hardware: ~5x
- Unsafe hardware operation (timing, Vdd): ~5x
- Analog hardware (closer to physics): ~10-100x+
Big opportunities when going non-deterministic.
9. HW/SW co-design is essential. Approximation just at the hardware level isn't safe. Approximation just at the algorithm level is suboptimal. Assuming reliable hardware for inherently robust algorithms is a waste.
10. Three important questions, spanning language, compiler, runtime and hardware: (1) What and how to approximate? (2) How good is my output? (3) How to take advantage of it?
11. What and how to approximate? (Language) Not all parts of a computation and its data are equal: some need to be precise, others can be approximate. How do we take advantage of approximation without compromising important system … Example code: int a int p =...; What are the language semantics? Data-centric or code-centric?
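One way to read the data-centric option is through type qualifiers, as in EnerJ (listed among the recent efforts): data is marked approximate, anything computed from it stays approximate, and it may reach precise code only through an explicit endorsement. The sketch below imitates that discipline dynamically in Python, as an assumption-laden illustration; `Approx`, `endorse` and `store_precise` are hypothetical names, and EnerJ itself checks these rules statically in Java's type system rather than at run time.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


def _raw_value(x: "Union[Approx, Number]") -> Number:
    """Unwrap an Approx, or pass a plain number through unchanged."""
    return x.value if isinstance(x, Approx) else x


@dataclass(frozen=True)
class Approx:
    """Hypothetical marker for data that may be stored/computed approximately."""
    value: float

    # Approximation is contagious: arithmetic touching Approx yields Approx.
    def __add__(self, other: "Union[Approx, Number]") -> "Approx":
        return Approx(self.value + _raw_value(other))

    __radd__ = __add__

    def __mul__(self, other: "Union[Approx, Number]") -> "Approx":
        return Approx(self.value * _raw_value(other))

    __rmul__ = __mul__

    def __bool__(self) -> bool:
        # Control flow stays precise: approximate data may not steer a branch.
        raise TypeError("approximate value used in a condition; endorse() it first")


def endorse(x: "Union[Approx, Number]") -> Number:
    """Explicit, programmer-visible escape hatch from approximate to precise."""
    return _raw_value(x)


def store_precise(x: "Union[Approx, Number]") -> Number:
    """Stand-in for a write to precise storage: rejects un-endorsed approximate data."""
    if isinstance(x, Approx):
        raise TypeError("approximate value flowing into precise storage")
    return x
```

For example, `pixel = Approx(0.5) * 2 + 1` stays approximate, `if pixel:` raises, and `store_precise(endorse(pixel))` is accepted. Making the flow from approximate to precise explicit is the point: the programmer, not the tool, decides where imprecision is acceptable.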
12. How good is my output? Metric: Quality of Result (QoR). It is application-dependent and provided by the programmer, e.g. % of bad pixels, deviation from the expected value, % of poorly classified images, car crashes, etc.
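The QoR examples above translate directly into small functions that compare an approximate output with the precise one. These are illustrative definitions only: the function names and the `tol` and `eps` parameters are assumptions, and a real application would choose its own metric and threshold.

```python
from typing import Hashable, Sequence


def _check(approx: Sequence, precise: Sequence) -> None:
    if len(approx) != len(precise) or len(precise) == 0:
        raise ValueError("outputs must be non-empty and of equal length")


def bad_pixel_fraction(approx: Sequence[int], precise: Sequence[int], tol: int = 0) -> float:
    """Fraction of pixels whose approximate value differs from the precise one by more than tol."""
    _check(approx, precise)
    bad = sum(1 for a, p in zip(approx, precise) if abs(a - p) > tol)
    return bad / len(precise)


def mean_relative_error(approx: Sequence[float], precise: Sequence[float], eps: float = 1e-12) -> float:
    """Average |approx - precise| / |precise|; eps guards against division by zero."""
    _check(approx, precise)
    return sum(abs(a - p) / max(abs(p), eps) for a, p in zip(approx, precise)) / len(precise)


def misclassification_rate(approx_labels: Sequence[Hashable], precise_labels: Sequence[Hashable]) -> float:
    """Fraction of inputs labeled differently from what the precise program outputs."""
    _check(approx_labels, precise_labels)
    return sum(a != p for a, p in zip(approx_labels, precise_labels)) / len(precise_labels)
```

For instance, `bad_pixel_fraction([10, 20, 30, 40], [10, 21, 30, 45], tol=1)` counts only the last pixel as bad and returns 0.25.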
13. Checking quality: res = computesomething(); assert diff(res, resʹ) < 0.1; where resʹ is the precise version of the result. The check involves the compiler, runtime and hardware. Check statically as much as possible; but, yes, it often needs a dynamic component, and that component needs to be cheap!
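A cheap dynamic check can be as simple as paying for the precise version resʹ on only a random sample of invocations. The sketch assumes a scalar computation, a threshold `tol` playing the role of the 0.1 in the slide's assertion, and a fall-back-to-precise policy on violations; `checked_map` and its parameters are hypothetical, not an API from any of the systems mentioned here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def checked_map(
    approx_fn: Callable[[float], float],
    precise_fn: Callable[[float], float],
    inputs: Sequence[float],
    tol: float,
    sample_rate: float,
    seed: int = 0,
) -> Tuple[List[float], int, int]:
    """Run approx_fn on every input, but re-run precise_fn on a random sample.

    Returns (outputs, number_checked, number_of_violations). On a violation
    the precise result replaces the approximate one.
    """
    rng = random.Random(seed)
    outputs: List[float] = []
    checked = violations = 0
    for x in inputs:
        y = approx_fn(x)
        if rng.random() < sample_rate:  # pay for the precise version only on a sample
            checked += 1
            reference = precise_fn(x)
            if abs(y - reference) >= tol:  # the check diff(res, res') < tol failed
                violations += 1
                y = reference
        outputs.append(y)
    return outputs, checked, violations
```

The `sample_rate` knob trades checking overhead for detection coverage: 1.0 checks everything (as expensive as running the precise code), while a small rate catches systematic quality problems at a fraction of the cost.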
14. How to take advantage of approximation? Compiler and runtime: precision tuning, loop perforation, synchronization elision, approximate parallelization.
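Loop perforation is the easiest of these to show in a few lines: skip a fixed fraction of loop iterations and accept the resulting error. The sketch below perforates a mean computation with a stride; `perforated_mean` and `stride` are illustrative names, and real perforating compilers choose which loops to perforate by measuring QoR on training inputs.

```python
from typing import Sequence


def perforated_mean(values: Sequence[float], stride: int = 2) -> float:
    """Mean of `values`, computed with loop perforation.

    Only every `stride`-th iteration runs, so the loop does about 1/stride
    of the work; stride=1 is the original, precise loop.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):  # skipped indices are the "perforations"
        total += values[i]
        count += 1
    return total / count if count else 0.0
```

On `list(range(100))`, the precise loop (`stride=1`) returns 49.5, while `stride=2` runs half the iterations and returns 49.0.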
15. How to take advantage of approximation? Hardware: approximate functional units, data path, registers, caches and memory in the CPU; approximate accelerators (CPU + Acc); approximate on-chip interconnect? Mixed-mode functional units?
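As one concrete example of an approximate functional unit, the approximate-arithmetic literature includes the lower-part OR adder, which replaces the low-order part of an adder with OR gates so no carry chain runs through it. The slide does not prescribe this design; the sketch below is a simplified software model of it (it drops the carry-in that the published design feeds to the upper part), and `approx_add` and `k` are hypothetical names.

```python
def approx_add(a: int, b: int, k: int) -> int:
    """Simplified lower-part-OR adder for non-negative ints.

    The bits above position k are added exactly; the lowest k bits are
    combined with a bitwise OR instead of an adder, so no carry is generated
    in (or propagated out of) the lower part.
    """
    if a < 0 or b < 0 or k < 0:
        raise ValueError("expects non-negative operands and k")
    mask = (1 << k) - 1
    upper = ((a >> k) + (b >> k)) << k  # exact addition of the high parts
    lower = (a | b) & mask              # carry-free approximation of the low parts
    return upper | lower                # low k bits of `upper` are zero, so OR == add
```

Because a + b equals (a | b) + (a & b), the error of this model is exactly `(a & b) & mask`: it never overestimates and is below 2**k. For example, `approx_add(0b1011, 0b0110, 2)` returns 15 instead of 17.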
16. Amdahl's law... damn! [Figure: processor pipeline (fetch, decode, register read, execute, memory, write-back) and its structures: branch predictor, instruction cache, ITLB, decoder, register files, integer FU, FP FU, data cache, DTLB.] The benefit is limited to what can be approximated, and instruction control cannot be approximated.
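Amdahl's law makes the limit on the slide quantitative: if only a fraction f of the work (or energy) can be approximated and that part improves by a factor s, the overall gain is 1 / ((1 - f) + f / s). A minimal helper, with hypothetical names:

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall gain when only `fraction` of the work is improved by `factor`.

    The remaining (1 - fraction), e.g. fetch, decode and other instruction
    control that must stay precise, runs unchanged.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)
```

With hypothetical numbers, if 40% of the work is approximable and approximation makes it 10x cheaper, `amdahl_speedup(0.4, 10.0)` is 1.5625; even an infinitely cheap approximate unit cannot exceed 1 / 0.6, about 1.67x. This is why the slides push approximation beyond individual functional units.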
17-18. Neural acceleration [Esmaeilzadeh et al.]: find an approximate program component, compile the program and train a neural network, then execute it on a fast Neural Processing Unit (NPU) alongside the CPU.
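The "observe a component, then train a network to mimic it" step can be illustrated without any NPU. The sketch below is a deliberate simplification, not the compilation flow from the papers: it fixes the hidden-layer weights of a one-hidden-layer tanh network and fits only the output layer by least squares instead of backpropagation. `train_mimic`, `hidden`, `samples` and `ridge` are hypothetical names and settings.

```python
import math
from typing import Callable, List


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b (A square) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_mimic(fn: Callable[[float], float], lo: float, hi: float,
                hidden: int = 16, samples: int = 128,
                ridge: float = 1e-9) -> Callable[[float], float]:
    """Fit a one-hidden-layer tanh network that mimics fn on [lo, hi].

    Simplification: hidden-layer weights are fixed (evenly spaced tanh units)
    and only the output layer is trained, by ridge-regularized least squares.
    """
    spacing = (hi - lo) / (hidden - 1)
    centers = [lo + i * spacing for i in range(hidden)]
    slope = 2.0 / spacing  # neighbouring units overlap smoothly

    def features(x: float) -> List[float]:
        return [1.0] + [math.tanh(slope * (x - c)) for c in centers]  # bias + hidden layer

    # Training set: observe the original component on sample inputs.
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    phi = [features(x) for x in xs]
    ys = [fn(x) for x in xs]

    # Normal equations (Phi^T Phi + ridge * I) w = Phi^T y for the output weights.
    m = hidden + 1
    A = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(m)]
    w = _solve(A, b)

    def network(x: float) -> float:
        return sum(wi * fi for wi, fi in zip(w, features(x)))

    return network
```

`net = train_mimic(math.sin, 0.0, math.pi)` returns a function that can stand in for `math.sin` on that interval; in the accelerated setting, the network evaluation is what would run on the NPU instead of the original code.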
19. Summary of NPU results.
Benchmarks (application: domain, error metric):
- blackscholes: option pricing, MSE
- fft: DSP, MSE
- inversek2j: robotics, MSE
- jmeint: 3D modeling, miss rate
- jpeg: compression, image diff
- kmeans: ML, image diff
- sobel: vision, image diff
Results (figure labels: CPU + NPU with pipeline stages F D X I M C; CPU + FPGA):
- 0.9x-24x (3.7x mean) speedup; 1.5x-51x (6.8x mean) energy reduction
- 0.8x-…x (3x mean) speedup; 1.1x-21x (3x mean) energy reduction
- 1.3x-38x (3.8x mean) speedup; 0.9x-28x (2.8x mean) energy reduction
20. A taxonomy of approximation techniques (not exhaustive :)), organized along two axes: nondeterministic vs. deterministic, and fine-grained vs. coarse-grained.
- Fine-grained: DRAM refresh rate, SRAM soft error exposure, approximate storage (PCM), synchronization elision, voltage overscaling, mixed-mode functional units, bit-width reduction, precision scaling ALU, hierarchical FPU, float-to-fixed conversion, reduced-precision FPU, underdesigned multiplier, lossy compression and data packing, load value approximation.
- Coarse-grained: error-prone processors, neural acceleration (analog), code perforation, fuzzy/interpolated memoization, neural acceleration (ASIC, FPGA, GPU), parallel pattern replacement.
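Two of the deterministic fine-grained entries, float-to-fixed conversion and bit-width reduction, come down to choosing how many fractional bits to keep. The sketch assumes a signed fixed-point format with `frac_bits` fractional bits and relies on Python's unbounded integers, so overflow of a real hardware word is not modeled; the function names are hypothetical.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Float-to-fixed conversion: round x to the nearest multiple of 2**-frac_bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(qa: int, qb: int, frac_bits: int) -> int:
    """Multiply two fixed-point values. The raw product has 2*frac_bits
    fractional bits, so shift right (rounding toward -inf) to rescale."""
    return (qa * qb) >> frac_bits
```

Rounding to the nearest step bounds the conversion error by half a step, 2**-(frac_bits + 1), so every bit removed doubles the worst-case error while shrinking the data path. For example, 1.5 and 2.25 are exact with 8 fractional bits, and `from_fixed(fixed_mul(to_fixed(1.5, 8), to_fixed(2.25, 8), 8), 8)` gives 3.375.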
21. Approximation beyond the CPU [MICRO 13]: compute, storage and memory, wireless network, disk, display and I/O. [Figure: multi-level solid-state cells, showing value probability distributions (high/low probability, level 00) and the trade-off between fast, dense and accurate storage.]
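To experiment with the fast/dense/accurate trade-off in software, approximate storage can be modeled as memory that occasionally corrupts only the low-order bits of a word, so errors stay small in magnitude. This is a hypothetical error model chosen for illustration (`ApproxMemory`, `approx_bits` and `flip_prob` are made-up names); it is not the multi-level-cell design from the MICRO 13 paper.

```python
import random
from typing import Dict


class ApproxMemory:
    """Toy approximate storage for non-negative integer words.

    On every write, each of the lowest `approx_bits` bits flips independently
    with probability `flip_prob`; all higher bits are stored exactly, so the
    stored value is off by at most 2**approx_bits - 1.
    """

    def __init__(self, approx_bits: int, flip_prob: float, seed: int = 0) -> None:
        self.approx_bits = approx_bits
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)
        self.cells: Dict[int, int] = {}

    def write(self, addr: int, value: int) -> None:
        if value < 0:
            raise ValueError("toy model stores non-negative words only")
        stored = value
        for bit in range(self.approx_bits):
            if self.rng.random() < self.flip_prob:
                stored ^= 1 << bit  # imprecise write corrupts this low-order bit
        self.cells[addr] = stored

    def read(self, addr: int) -> int:
        return self.cells[addr]
```

Keeping the high-order bits precise is the design choice that makes such storage usable: the application sees bounded numeric noise rather than arbitrary corruption, which is the kind of error a QoR metric can absorb.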
22. Code with approximation specs + quality metric.
23. Challenges at 10,000 feet:
- Abstractions for hardware and software
- Specifying and guaranteeing QoR: the subjective nature of quality; programmer cognitive load
- Composability of hardware and software
- Debugging and testing for correctness and performance
- Algorithmic transformations to enable effective approximation
- Avoiding the Amdahl's law effect: e.g., applying approximation only to the data path, or only to the processor, is not sufficient
24. Recent efforts across tools/HCI, PL, compiler, runtime, OS/DB, architecture and hardware: Relyzer (UIUC); debugging (UW); user perception assessment (GATech, Cornell, UW); EnerJ (UW); Passert (MSR/UW); Rely/Chisel (MIT); Relax (Wisconsin); Uncertain<T> (MSR); Eon (UMass); FlexJava (GATech); approximate HDLs (GATech); approximate synthesis (UW); variability-aware software (UCSD); unsound transformations (MIT, UW); synchronization elision (IBM, UW); Green (MSR); Topaz (MIT); PowerDial (MIT); soft error control (UCLA); SAGE and Paraprox (Michigan); Swat (UIUC); JouleGuard (Chicago); approximate parallelization (Harvard); task-based models (MIT); BlinkDB (Berkeley/MIT); approximate Paxos (UW); sensor device drivers (MIT); CMOS resilience awareness (Stanford); ANNs (UW, MSR, INRIA, Wisconsin, Qualcomm, IBM); using neural nets for code approximation (GATech/UW/MSR); decoupled control/data plane (Minnesota); stochastic processors (UIUC); ERSA (Stanford); Flikker (MSR/UBC); QUORA (Purdue); approximate storage (MSR, UW); probabilistic CMOS (Rice); approximate components (Purdue); approximate functional units (Wisc/UIUC).
26. Lots to learn from other communities:
- DSP/embedded systems: signals are all about approximation.
- Machine learning: deals with quality issues inherently.
- Numerical analysis: deterministic approximation is at its heart.
27. Approximate vs. probabilistic computing. Approximate: relaxing accuracy. Probabilistic: computing over probabilities/distributions. They are orthogonal but synergistic: reasoning about uncertainty in approximate programs, and approximate evaluation of probabilistic models.
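Reasoning about uncertainty in approximate programs can start with a simple question: how likely is the approximate computation to meet its quality bound? The sketch estimates that probability by sampling, assuming the approximate execution can be simulated with a random number generator. `estimate_pass_probability` is a hypothetical helper; tools such as the probabilistic assertions work listed among the recent efforts (Passert) use more sophisticated analysis than plain sampling.

```python
import random
from typing import Callable


def estimate_pass_probability(
    approx_fn: Callable[[float, random.Random], float],
    precise_fn: Callable[[float], float],
    sample_input: Callable[[random.Random], float],
    tol: float,
    trials: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of Pr[|approx_fn(x) - precise_fn(x)| < tol].

    Both the input x and the approximate execution are random: sample_input
    draws x, and approx_fn receives the RNG to model nondeterministic hardware.
    """
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        x = sample_input(rng)
        if abs(approx_fn(x, rng) - precise_fn(x)) < tol:
            passes += 1
    return passes / trials
```

For instance, with a hypothetical unit that adds Gaussian noise of standard deviation 0.01 to its result and a tolerance of 0.05, the estimate comes out close to 1, so a probabilistic quality assertion such as `p >= 0.95` would pass; the number of trials controls how much confidence the estimate deserves.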
28. [Figure: approximate computing vs. probabilistic programming. Labels: "Verifies?", "Verifies against model", probabilistic program analysis, probabilistic model checking.]
29. Super-relevant to exciting substrates: DNA storage ("gattaca": DNA synthesis, DNA sequencing).
- Hyper-dense: 1 ZB/cm³ (~1E8 denser than flash)
- Hyper-durable: we find readable ~100k-year-old DNA
- Eternally relevant: as long as there is DNA-based intelligent life, there will be reasons to read/write DNA
30. Questions?
31. How will approximate computing fail?
- Applications can't take advantage of approximation opportunities.
- Programmers aren't able to write/debug/test approximate code.
- Quality assurance problems.
- Marketing reasons: buy my flaky system!
32. Projects across the stack (language, compiler, OS & networking, architecture, circuits):
- EnerJ/DECAF: safe approximate programming [PLDI 2011, OOPSLA 15]
- Quality assurance: monitoring, testing, verification and debugging [PLDI 2014, ASPLOS 15]
- ACCEPT: an approximate compiler
- Approximate wireless: recovering waste from communication errors [arXiv]
- Approximate ISA and microarchitecture [ASPLOS 2012]
- Neural acceleration [MICRO 2012, ISCA 2014, HPCA 15]
- Approximate storage: exploiting analog properties of PCM [MICRO 2013]
More informationEE382 Processor Design. Class Objectives
EE382 Processor Design Stanford University Winter Quarter 1998-1999 Instructor: Michael Flynn Teaching Assistant: Steve Chou Administrative Assistant: Susan Gere Lecture 1 - Introduction Slide 1 Class
More informationCIT 668: System Architecture
CIT 668: System Architecture Computer Systems Architecture I 1. System Components 2. Processor 3. Memory 4. Storage 5. Network 6. Operating System Topics Images courtesy of Majd F. Sakr or from Wikipedia
More informationDNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses
DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough 1,2 S. K. Lee 2, N. Mulholland 2, P. Hansen 2, S. Kodali 3, D. Brooks 2, G.-Y. Wei 2 1 ARM Research, Boston,
More informationEfficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford
Efficiency and Programmability: Enablers for ExaScale Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford Scientific Discovery and Business Analytics Driving an Insatiable
More informationHardware-Software Codesign. 1. Introduction
Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2
More informationGRE Architecture Session
GRE Architecture Session Session 2: Saturday 23, 1995 Young H. Cho e-mail: youngc@cs.berkeley.edu www: http://http.cs.berkeley/~youngc Y. H. Cho Page 1 Review n Homework n Basic Gate Arithmetics n Bubble
More informationIntroducing Multi-core Computing / Hyperthreading
Introducing Multi-core Computing / Hyperthreading Clock Frequency with Time 3/9/2017 2 Why multi-core/hyperthreading? Difficult to make single-core clock frequencies even higher Deeply pipelined circuits:
More informationCpE 442. Memory System
CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)
More informationVector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks
Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor
More informationPower dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.
The VLSI Interconnect Challenge Avinoam Kolodny Electrical Engineering Department Technion Israel Institute of Technology VLSI Challenges System complexity Performance Tolerance to digital noise and faults
More information