Approximate Overview of Approximate Computing




1 Approximate Overview of Approximate Computing
Luis Ceze, University of Washington
PL, Architecture
With thanks to many colleagues from whom I stole slides: Adrian Sampson, Hadi Esmaeilzadeh, Karin Strauss, Mark Wyse, James Bornholt,

2 Moore's law gives us lots of transistors on a chip. But it is Dennard scaling that lets us use them:
2x transistor count, 40% faster, 50% more efficient
[Figure: Dark Silicon over 10 years; process nodes 45 nm, 32 nm, 22 nm, 16 nm, 11 nm, 8 nm; values 1%, 17%, 36%, 40%, 51%]

3 Specialization to the rescue

4 A storage gap? Is it inevitable?
Disk cost-per-byte is not decreasing fast enough [Credit: David Rosenthal (CMU) and Preeti Gupta (UCSC), 2014]
Information growth [Credit: EMC 2012]


5 Modern Applications
image, sound and video processing
image rendering
sensor data analysis, computer vision
simulations, games, search, machine learning
Inexact input data
Approximate/iterative algorithms
Malleable output

6 ASPLOS Wild and Crazy Ideas 2008


7 What is approximate computing?
Exploit inherent application-level resilience to build more efficient/better computer systems.
[Diagram labels: Application; Efficiency and performance; Output accuracy; Physics]
In essence, the goal is to specialize computation, storage and communication to properties of the data and the algorithm. This enables better use of the underlying substrate.


8 Wait, what about :)
Machine learning, iterative algorithms, lossy compression, floating point
Layers: Algorithms, Language, Compiler, ISA/Architecture, Circuits, Physics
Reasoning about approximation in PL
Approximate compiler optimizations: ~2-3X
Approximate execution models: ~10X
Non-deterministic near/sub threshold HW: ~5X
Unsafe HW operation (timing, Vdd): ~5X
Analog hardware (closer to physics): ~10-100X+
Big opportunities when going non-deterministic.

9 HW/SW co-design is essential
Approximation just at the hardware level isn't safe.
Approximation just at the algorithm level is suboptimal.
Assuming reliable hardware for inherently robust algorithms is a waste.

10 Three important questions
1. What and how to approximate?
2. How good is my output?
3. How to take advantage of it?
[Diagram labels: Language, Compiler, Runtime, Hardware]

11 What and how to approximate?
Language
Not all pieces of a computation and its data are equivalent (some aspects need to be precise, others can be approximate)
How to take advantage of approximation without compromising important system
int a int p =...;
What are the language semantics? Data-centric or code-centric?
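One way to picture the data-centric option is a wrapper type that marks data as approximable and refuses to let it reach precise-only operations without an explicit hand-off. The sketch below is only an illustration under that assumption, loosely inspired by type-qualifier systems such as EnerJ (listed among the recent efforts in this deck); the names `Approx`, `endorse` and `precise_index` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approx:
    """Hypothetical marker: this value may be stored or computed approximately."""
    value: float

    def __add__(self, other):
        # Arithmetic that touches approximate data yields approximate data.
        other_value = other.value if isinstance(other, Approx) else other
        return Approx(self.value + other_value)

    # Addition is commutative, so precise + approximate reuses the same rule.
    __radd__ = __add__


def endorse(x: Approx) -> float:
    """Explicit, visible point where approximate data is accepted as precise."""
    return x.value


def precise_index(i) -> int:
    """A precise-only operation: rejects approximate data that was not endorsed."""
    if isinstance(i, Approx):
        raise TypeError("approximate value used where precision is required")
    return int(i)


a = Approx(3.0)                    # data the programmer allows to be approximate
p = 2                              # data that must stay precise
s = a + p                          # the result is approximate as well
index = precise_index(endorse(s))  # allowed only because of the explicit endorse
```

A code-centric design would instead mark regions of code (a function or a loop) as approximable, leaving the data types untouched.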

12 How good is my output?
Metric: Quality-of-Result (QoR)
Application dependent, provided by programmer
e.g., % of bad pixels, deviation from expected value, % of poorly classified images, car crashes, etc.
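Two of the example metrics above can be written down in a few lines. This is a minimal sketch, assuming images are flat sequences of pixel intensities and that a precise reference output is available; the function names and the `tolerance` parameter are illustrative choices, not something the slides define.

```python
def bad_pixel_fraction(approx, precise, tolerance=8):
    """Fraction of pixels that differ from the reference by more than `tolerance`."""
    if len(approx) != len(precise):
        raise ValueError("images must have the same number of pixels")
    bad = sum(1 for a, p in zip(approx, precise) if abs(a - p) > tolerance)
    return bad / len(precise)


def mean_relative_deviation(approx, precise):
    """Mean of |approx - precise| / |precise|, skipping zero reference values."""
    terms = [abs(a - p) / abs(p) for a, p in zip(approx, precise) if p != 0]
    return sum(terms) / len(terms) if terms else 0.0


# Two of the four pixels are off by more than the tolerance of 8.
pixel_qor = bad_pixel_fraction([10, 200, 30, 40], [12, 100, 30, 60])
```

Which metric is appropriate, and what threshold counts as acceptable, remains an application-level decision made by the programmer.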

13 Checking quality
res = computesomething();
assert diff(res, resʹ) < 0.1;
resʹ: precise version of the result
Compiler, Runtime, Hardware
Check statically as much as possible
But, yes, it often needs a dynamic component. Needs to be cheap!
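One possible way to keep a dynamic check cheap is to re-run the precise version on only a small random sample of calls. The sketch below assumes that strategy; the class `SampledQualityMonitor` and its parameters are hypothetical, and the 0.1 threshold simply mirrors the assertion on the slide.

```python
import random


class SampledQualityMonitor:
    """Hypothetical helper: checks the approximate result on a sample of calls."""

    def __init__(self, approx_fn, precise_fn, diff, threshold=0.1,
                 sample_rate=0.01, seed=0):
        self.approx_fn = approx_fn
        self.precise_fn = precise_fn
        self.diff = diff
        self.threshold = threshold
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)
        self.checked = 0
        self.violations = 0

    def __call__(self, *args):
        res = self.approx_fn(*args)
        # Only a small fraction of calls pay for the precise version.
        if self.rng.random() < self.sample_rate:
            self.checked += 1
            if self.diff(res, self.precise_fn(*args)) >= self.threshold:
                self.violations += 1
        return res


# Example: an "approximate" square that truncates its input to one decimal.
monitored = SampledQualityMonitor(
    approx_fn=lambda x: (int(x * 10) / 10) ** 2,
    precise_fn=lambda x: x * x,
    diff=lambda a, b: abs(a - b),
    sample_rate=0.05,
)
results = [monitored(i / 1000) for i in range(1000)]
```

The ratio `violations / checked` then gives a sampled estimate of how often the quality bound is missed, at a cost that scales with `sample_rate`.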

14 How to take advantage of approximation?
Compiler, Runtime
Precision tuning.
Loop perforation.
Synchronization elision.
Approximate parallelization.
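Loop perforation, one of the techniques listed above, skips a fraction of loop iterations and accepts the resulting error. A minimal sketch, assuming a simple averaging loop over a non-empty list and a fixed skip pattern (the function names and the `stride` parameter are illustrative):

```python
def mean_precise(values):
    """Reference version: visits every element."""
    return sum(values) / len(values)


def mean_perforated(values, stride=2):
    """Perforated version: visits only every `stride`-th element."""
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):
        total += values[i]
        count += 1
    return total / count


data = list(range(100))
exact = mean_precise(data)                 # visits 100 elements
approx = mean_perforated(data, stride=2)   # visits 50 elements
error = abs(exact - approx)
```

Here half of the work is skipped and the result moves from 49.5 to 49.0. Whether that error is acceptable is exactly the quality-of-result question raised earlier in the deck.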

15 How to take advantage of approximation?
Hardware
Approximate functional units, data path, registers, caches, memory. [Diagram: CPU]
Approximate accelerators. [Diagram: CPU, Acc]
Approximate on-chip interconnect?
Mixed-mode functional units?

16 Amdahl's law... damn!
[Pipeline diagram: Fetch, Decode, Reg Read, Execute, Memory, Write Back; Branch Predictor, Instruction Cache, ITLB, Decoder, Register File, Integer FU, FP FU, Data Cache, DTLB, Register File]
Benefit limited to what can be approximated
Instruction control cannot be approximated
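The limit this slide points at follows directly from Amdahl's law: if only a fraction f of execution can be approximated and that part becomes s times cheaper, the overall gain is 1 / ((1 - f) + f / s). A small sketch (the function and parameter names are illustrative):

```python
def overall_speedup(approximable_fraction, local_speedup):
    """Amdahl's law: only `approximable_fraction` of the run time gets faster."""
    if not 0.0 <= approximable_fraction <= 1.0:
        raise ValueError("approximable_fraction must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - approximable_fraction)
                  + approximable_fraction / local_speedup)


# Even an infinitely fast approximate unit cannot beat 2x overall
# when half of the work (e.g. instruction control) must stay precise.
bound = overall_speedup(0.5, float("inf"))
modest = overall_speedup(0.5, 10.0)
```

With f = 0.5, a 10x faster approximate part yields only about 1.8x overall, which is why approximating functional units alone leaves most of the pipeline cost in place.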

17 Neural acceleration [Esmaeilzadeh et al.]
Find an approximate program component [Diagram: Program]
Compile the program and train a neural network

18 Neural acceleration [Esmaeilzadeh et al.]
Find an approximate program component [Diagram: Program]
Compile the program and train a neural network
Execute on a fast Neural Processing Unit (NPU) [Diagram: CPU, NPU]
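The three steps above can be mimicked in software on a toy scale: pick a pure function, record its input/output behavior, and train a small network to stand in for it. The sketch below is not the authors' toolchain or the NPU itself. It assumes a one-input function (`target`, a hypothetical stand-in for the approximable component), a tiny hand-written multilayer perceptron, and plain full-batch gradient descent.

```python
import math
import random


def target(x):
    """Hypothetical approximable program component."""
    return math.sin(math.pi * x)


class TinyMLP:
    """1 input -> `hidden` tanh units -> 1 linear output."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def hidden_activations(self, x):
        return [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]

    def __call__(self, x):
        h = self.hidden_activations(x)
        return sum(w * a for w, a in zip(self.w2, h)) + self.b2

    def train_epoch(self, xs, ys, lr):
        """One full-batch gradient-descent step on half the mean squared error."""
        n = len(xs)
        hidden = len(self.w1)
        g_w1, g_b1, g_w2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            h = self.hidden_activations(x)
            err = sum(w * a for w, a in zip(self.w2, h)) + self.b2 - y
            g_b2 += err / n
            for j in range(hidden):
                g_w2[j] += err * h[j] / n
                # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
                back = err * self.w2[j] * (1.0 - h[j] ** 2) / n
                g_w1[j] += back * x
                g_b1[j] += back
        for j in range(hidden):
            self.w1[j] -= lr * g_w1[j]
            self.b1[j] -= lr * g_b1[j]
            self.w2[j] -= lr * g_w2[j]
        self.b2 -= lr * g_b2


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 20.0 - 1.0 for i in range(41)]  # training inputs in [-1, 1]
ys = [target(x) for x in xs]              # observed outputs of the precise code
net = TinyMLP()
initial_mse = mse(net, xs, ys)
for _ in range(1000):
    net.train_epoch(xs, ys, lr=0.02)
final_mse = mse(net, xs, ys)
```

After training, calls to `target(x)` could be replaced by `net(x)`. How close `final_mse` gets to zero depends on the network size, learning rate and number of epochs, all of which are arbitrary choices here.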


19 Summary of NPU results
application   domain          error metric
blackscholes  option pricing  MSE
fft           DSP             MSE
inversek2j    robotics        MSE
jmeint        3D-modeling     miss rate
jpeg          compression     image diff
kmeans        ML              image diff
sobel         vision          image diff
[Diagram labels: CPU, NPU; F D X I M C; CPU, FPGA]
0.9x - 24x (3.7x mean) speedup, 1.5x - 51x (6.8x mean) energy red.
0.8x x (3x mean) speedup, 1.1x - 21x (3x mean) energy red.
1.3x - 38x (3.8x mean) speedup, 0.9x - 28x (2.8x mean) energy red.

20 A taxonomy of approximation techniques (not exhaustive :)
Axes: Nondeterministic vs. Deterministic; Fine Grained vs. Coarse Grained
Fine Grained: DRAM Refresh Rate, SRAM Soft Error Exposure, Approximate Storage (PCM), Synchronization Elision, Voltage Overscaling, Mixed-mode functional units, Bit-Width Reduction, Precision Scaling ALU, Hierarchical FPU, Float-to-Fixed Conversion, Reduced-Precision FPU, Underdesigned Multiplier, Lossy Compression and Data Packing, Load Value Approximation
Coarse Grained: Error Prone Processors, Neural Acceleration (Analog), Code Perforation, Fuzzy/Interpolated Memoization, Neural Acceleration (ASIC, FPGA, GPU), Parallel Pattern Replacement
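As one concrete instance from the coarse-grained group, fuzzy memoization reuses a cached result for inputs that are merely close to a previously seen input. The sketch below is a minimal illustration, assuming a single numeric argument and a fixed quantization grid; `fuzzy_memoize` and `step` are hypothetical names, and the interpolated variant is not shown.

```python
import math


def fuzzy_memoize(fn, step=0.01):
    """Cache results keyed by the input rounded to a grid of width `step`."""
    cache = {}

    def wrapper(x):
        key = round(x / step)
        if key not in cache:
            # Evaluate once at the grid point and reuse it for nearby inputs.
            cache[key] = fn(key * step)
        return cache[key]

    wrapper.cache = cache  # exposed so callers can inspect the hit pattern
    return wrapper


fast_sin = fuzzy_memoize(math.sin, step=0.01)
a = fast_sin(0.5001)   # computed at the grid point 0.5
b = fast_sin(0.4999)   # served from the cache, same grid point
```

The technique is deterministic: the same input always maps to the same grid point, and `step` bounds how far the evaluated point can be from the requested one.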

21 Approximation beyond the CPU [MICRO 13]
Compute, Memory, Storage, I/O (Wireless, Network, Disk, Display)
Multi-level solid state cells: Fast, Dense, Accurate
[Figure: probability distributions over cell levels; labels: high, low, 00, probability]

22 Code with Approx Specs + quality metric

23 10k-feet challenges
Abstractions for hardware and software
Specifying and guaranteeing QoR
Subjective nature of quality
Programmer cognitive load
Composability of hardware and software
Debugging and testing correctness and performance
Algorithmic transformations to enable effective approximation
Avoiding the Amdahl's law effect, e.g., applying approximation to the data-path or the processor only is not sufficient

24 Recent efforts
Tools/HCI: Relyzer (UIUC), Debugging (UW), User perception assessment (GATech, Cornell, UW)
PL: EnerJ (UW), Passert (MSR/UW), Rely/Chisel (MIT), Relax (Wisconsin), Uncertain<T> (MSR), Eon (UMass), FlexJava (GATech), Approx HDLs (GATech), Approx synthesis (UW), Variability-aware software (UCSD)
Compiler, Runtime, OS/DB, Architecture, Hardware:
Unsound transformations (MIT, UW), Synchronization Elision (IBM, UW)
Green (MSR), Topaz (MIT)
PowerDial (MIT), Soft error control (UCLA), SAGE & Paraprox (Michigan), Swat (UIUC), JouleGuard (Chicago), Approx parallelization (Harvard), Task-based models (MIT)
BlinkDB (Berkeley/MIT), Approx Paxos (UW), Sensor Device Drivers (MIT)
CMOS resilience awareness (Stanford), ANNs (UW, MSR, INRIA, Wisconsin, Qualcomm, IBM), Using Neural Nets for code approximation (GATech/UW/MSR), Decoupled control/data plane (Minnesota)
Stochastic Processors (UIUC), ERSA (Stanford), Flikker (MSR/UBC), QUORA (Purdue), Approximate Storage (MSR, UW)
Probabilistic CMOS (Rice), Approximate components (Purdue), Approximate functional units (Wisc/UIUC)

26 Lots to learn from other communities
DSP/Embedded systems: Signals are all about approximation
Machine learning: Deals with quality issues inherently
Numerical analysis: Deterministic approximation is at its heart

27 Approximate vs Probabilistic computing
Approximate: relaxing accuracy
Probabilistic: computing over probabilities/distributions
Orthogonal but synergistic!
Reasoning about uncertainty in approximate programs
Approximate evaluation of probabilistic models

28 [Diagram labels: Approximate Computing; Probabilistic Programming; Verifies?; Verifies against model; Probabilistic Program Analysis; Probabilistic Model Checking]


29 Super-relevant to exciting substrates
[Figure: gattaca; DNA synthesis, DNA sequencing]
Hyper-Dense: 1 ZB/cm^3 (~1E8 denser than Flash)
Hyper-Durable: We find readable ~100k-year-old DNA
Eternally relevant: As long as there is DNA-based intelligent life, there will be reasons to read/write DNA

30 QUESTIONS?~

31 How will approximate computing fail?
Applications can't take advantage of approximation opportunities
Programmers aren't able to write/debug/test approximate code
Quality assurance problems
Marketing reasons: buy my flaky system!

32 EnerJ/DECAF: safe approximate programming [PLDI 2011, OOPSLA 15]
Quality Assurance: monitoring, testing, verification & debugging [PLDI 2014, ASPLOS 15]
ACCEPT: an approximate compiler
Approximate Wireless: recover waste from comm errors [arxiv]
Approximate ISA and uarch [ASPLOS 2012]
Neural Acceleration [MICRO 2012, ISCA 2014, HPCA 15]
Approximate Storage: exploiting analog properties of PCM [MICRO 2013]
[Diagram labels: Language, Compiler, OS & Networking, Architecture, Circuits]


Approximate Computing Is Dead; Long Live Approximate Computing Adrian Sampson Cornell Hardware Programming Quality Domains Hardware Programming No more approximate functional units. Quality Domains Narrower