Inference

1 Inference. Graham Schelle, PhD, Principal Engineer, Xilinx Research Labs

2 Xilinx Headlines: Twitch Chooses Xilinx to Enable its Broadcast-quality Livestream of esports

3 Agenda: Xilinx Adaptive Architectures; Inference Architectures; Open Source

5 Xilinx Adaptive Architectures: Traditionally, FPGAs were used for massively data-parallel applications. In 2011, Zynq was introduced (ZU+ in 2015), adding ARM CPUs for embedded applications.

6 Xilinx Adaptive Architectures: Alveo & Versal. In 2018, Alveo was introduced: accelerator cards for data center workloads. Coming in 2019, the Versal platform: an adaptive compute acceleration platform (ACAP). Diagram labels: Cortex-A72s, AI Engines, Network on Chip.

7 Inference Architectures

9 Inference Architectures: Evolving Frameworks (Andrej Karpathy on Twitter)
Increasing, Evolving Workloads: new acceleration needs and algorithms; ML infused in many applications; adaptable HW a key benefit.
Move to Lower Precision: ML inference moving to INT8 and lower; better Perf/W with similar accuracy; Xilinx devices natively support variable precision.
Compressed Networks: higher performance with reduced compute / memory needs; pruning and load balancing to match network requirements.
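
The "Move to Lower Precision" point can be illustrated with a minimal sketch of symmetric, per-tensor INT8 quantization. This is a generic textbook scheme, not the quantizer used by any Xilinx tool; the function names and the weight values are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]


# Hypothetical weights, for illustration only.
weights = [0.82, -1.27, 0.05, 0.4, -0.33]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, scale, max_err)
```

Each weight is stored in 8 bits plus one shared scale, and the reconstruction error stays within half a step (`scale / 2`), which is one intuition for why accuracy can remain similar at lower precision.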

11 Inference Architectures: Evolving Workloads
1. Inference is hard
2. Huge variation in compute and memory requirements
3. Models typically don't fit into cache
Trends repeated on this slide: Increasing, Evolving Workloads; Move to Lower Precision; Compressed Networks.

12 Inference Architectures: Precision vs Power
[Figure: LSTM test error [%] vs estimated power consumption [W]. Points are labeled by bits (W/A): 3/3, 2/3, 3/4, 2/4, 2/8, 4/4, 3/8, 4/8, 8/8. The legend distinguishes FPGA, ASIC and Pareto-optimal points.]
Measurement conditions: target device ZU7EV; ambient temperature 25 °C; toggle rate 12.5%; static probability 0.5; power reported for the PL accelerated block only.
Sources: Bill Dally (Stanford), Cadence Embedded Neural Network Summit, February 1; Michaela Blott, Hot Chips 2018 Tutorial, "Overview of Deep Learning and Computer Architectures for Accelerating DNNs"; Rybalkin, V., Pappalardo, A., Ghaffar, M.M., Gambardella, G., Wehn, N. and Blott, M., "FINN-L: Library Extensions and Design Tradeoff Analysis for Variable Precision LSTM Networks on FPGAs."

13 Xilinx Cloud Inference: ML Suite. Overlays with xDNN, built in programmable logic; high utilization; throughput or latency variants; CPU offload for new layer exploration; xDNN with xfDNN compiler; on-prem and cloud boards.

14 Xilinx Edge Inference: DeePhi
(2013) (2016)
Learning both Weights and Connections for Efficient Neural Networks, NeurIPS 2015
EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA 2016
ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA, FPGA 2017
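
The compression work cited on this slide builds on removing low-magnitude connections. The sketch below shows only that magnitude-thresholding step on a flat weight list, assuming a fixed target sparsity; it omits the retraining that the cited papers pair with pruning, and the function names and weight values are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def to_sparse(weights):
    """Keep only (index, value) pairs for the surviving connections."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Hypothetical weights, for illustration only.
weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.03, 0.15, -0.05]
pruned = magnitude_prune(weights, 0.5)
sparse = to_sparse(pruned)
print(pruned)
print(sparse)
```

Storing only the surviving (index, value) pairs is what reduces both the multiply count and the memory footprint, at the cost of irregular access patterns that the hardware then has to balance.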

16 Cloud & Edge Integration

17 Xilinx and Open Source

18 Xilinx and Open Source: PYNQ; Quantized Neural Networks; Xilinx Runtime for PCIe Attached FPGAs. More on

22 Python is increasingly the Language of Choice
Top Programming Languages, IEEE Spectrum (July 17, July 18, to date)
Python is listed as an embedded language for the first time.
Python is the fastest growing language, driven by data science, AI, ML and academia.
Copyright 2018 Xilinx

27 PYNQ: Python Productivity for Zynq
Stack diagram: Jupyter web server; IPython kernel; Ubuntu-based Linux; ARM A9 / A53; overlays/designs; ZU+ fabric.
Jupyter notebooks, browser-based interface.
PYNQ enables JupyterLab on Zynq and ZU+.
FPGA designs delivered as Python packages.
Delivered as SD card image.
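
As a rough illustration of "FPGA designs delivered as Python packages", the sketch below loads an overlay from a notebook cell and lists the IP blocks it exposes. It assumes a board running the PYNQ SD card image, where the `pynq` package and its `Overlay` class are available; the helper names `load_overlay` and `list_ip_blocks` are hypothetical, and `base.bit` is only an example bitstream name.

```python
def load_overlay(bitstream_path):
    """Program the fabric with a bitstream and return the PYNQ Overlay object."""
    # Imported lazily because pynq is only usable on a PYNQ-enabled board.
    from pynq import Overlay
    return Overlay(bitstream_path)


def list_ip_blocks(overlay):
    """Return the sorted names of the IP blocks described by the overlay."""
    return sorted(overlay.ip_dict.keys())


# Usage on a board (not executed here because it needs the hardware):
#   overlay = load_overlay("base.bit")
#   print(list_ip_blocks(overlay))
```

The point of the design is that the hardware appears as an ordinary Python object: the notebook user works with attributes and dictionaries instead of an FPGA toolchain.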

28 PYNQ Community: ML, Non-ML & Academic Partners

32 Xilinx open source engagements related to today's TVM meeting: University of Washington; UC San Diego; Xilinx Research; MicroPython; UC Berkeley

33 Finally, Xilinx & building new open source communities: Cloud Free Trials; pynq.io/community; DAC 2019 Design Contest; OpenHW Design Contest

34 Summary
Xilinx: great for exploring and deploying inference.
Xilinx Open Source: we're actively engaging with TVM and other communities.
Visit: Boulder, Colorado

35 Adaptable. Intelligent.

38 Edge Inference to Cloud Acceleration: Inference - Edge, Automotive at the Edge
[Diagram: vehicle sensors around an ADAS/AD Central Module: Surround-View Cameras (front, back, left, right); Short-Range Radars; Forward-Looking Camera; Drive Monitor Camera; Long-Range Lidar.]

39 Edge to Cloud Inference: Xilinx Platforms
Edge devices (ZCU104, PYNQ-Z1, Ultra96, ZCU102): custom I/O, ARM CPUs.
Cloud platforms: power efficient, PCIe, networking.

42 Edge to Cloud Inference: IIoT Latency/Data Example
Example IIoT control rates:
Distance NYC to LA: 2,800 miles
Speed of light: 186,000 miles/s
Round trip: 2 * 2,800 / 186,000 s ≈ 30 ms
Required control rate = 10 ms, so even at the speed of light the cross-country round trip is about three times too slow.
E.g. Power: 8 TB/month
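
The round-trip arithmetic on this slide can be checked with a few lines of Python. The constants are the slide's own figures; the function name and the derived maximum-distance calculation are additions for illustration, and the model ignores switching, queuing and processing delays, so real latencies would be higher.

```python
MILES_NYC_TO_LA = 2_800
SPEED_OF_LIGHT_MILES_PER_S = 186_000
REQUIRED_CONTROL_PERIOD_MS = 10


def round_trip_ms(distance_miles, speed_miles_per_s=SPEED_OF_LIGHT_MILES_PER_S):
    """Best-case round-trip propagation delay in milliseconds."""
    return 2 * distance_miles / speed_miles_per_s * 1000


rtt_ms = round_trip_ms(MILES_NYC_TO_LA)
meets_deadline = rtt_ms <= REQUIRED_CONTROL_PERIOD_MS

# Farthest one-way distance whose round trip still fits in the control period.
max_one_way_miles = (
    REQUIRED_CONTROL_PERIOD_MS / 1000 * SPEED_OF_LIGHT_MILES_PER_S / 2
)

print(f"round trip: {rtt_ms:.1f} ms, meets 10 ms deadline: {meets_deadline}")
print(f"max one-way distance for 10 ms: {max_one_way_miles:.0f} miles")
```

Under these best-case assumptions the control loop has to close within roughly 930 miles of the plant, which is the argument for running inference at the edge.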
