Inference Engine compiler and SDK FWDNXT 2018


Maximillian Quinn
5 years ago

1 Inference Engine compiler and SDK

2 Deep Learning processor: best performance per power; best utilization; efficient use of memory bandwidth; low latency; scalability from IoT to cloud.

3 Deep Learning processor: direct from a framework-trained model to hardware; no FPGA programming required; ultrashort time to a working accelerated application.

4 Inference Engine: modular Deep Learning system. PCI cards: 1 to 48 FPGAs. Plug and play. Modules can be used standalone (IoT, edge).


5 Deep Learning workflow. Steps: 1. Select a neural network model (we can help!). 2. Train in your favorite Deep Learning framework. 3. Obtain the trained model and save it as ONNX, the neural network interchange format (32 bit).
Example network shown on the slide: Input Image (3*224*224), Conv (11x11 s4), Maxpool (3x3 s2), Conv (5x5 s1), Maxpool (3x3 s2), Conv (3x3 s1), Conv (3x3 s1), Conv (3x3 s1), FCN (4096), FCN (4096), FC (1000).
Diagram labels: training frameworks, ONNX interchange format, neural network description.
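The layer list on this slide gives kernel sizes and strides, which is enough to trace how the 224x224 input shrinks through the convolution and pooling stack. The sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1. The padding values are assumptions (the slide does not state them), chosen to match the common AlexNet configuration; `out_size`, `LAYERS`, and `trace_sizes` are illustrative names, not part of the FWDNXT SDK.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding). Kernel and stride come from the slide;
# the padding column is an assumption, since the slide does not give it.
LAYERS = [
    ("Conv (11x11 s4)", 11, 4, 2),
    ("Maxpool (3x3 s2)", 3, 2, 0),
    ("Conv (5x5 s1)", 5, 1, 2),
    ("Maxpool (3x3 s2)", 3, 2, 0),
    ("Conv (3x3 s1)", 3, 1, 1),
    ("Conv (3x3 s1)", 3, 1, 1),
    ("Conv (3x3 s1)", 3, 1, 1),
]


def trace_sizes(input_size=224, layers=LAYERS):
    """Return (layer name, output height/width) for each layer in order."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes():
        print(f"{name:18s} -> {size}x{size}")
```

With different padding the intermediate sizes change, so the numbers printed here should be read as one consistent configuration rather than as the exact network on the slide.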

6 Snowflake Deep Learning workflow. Steps: 1. Input the trained model to our compiler. 2. The compiler converts it into Inference Engine machine code, so you do not have to code anything! 3. Change a few lines of your application to target the Inference Engine! 4. Done!
Diagram labels: compiler, Inference Engine, deployment.

7 Supported neural networks, from any framework: AlexNet, ResNet, GoogleNet, LinkNet, encoder-decoder, RNN, LSTM.
[Figure: layer diagrams of three example networks, each running from the input image through Conv and Maxpool layers (plus Bottleneck or Inception blocks) to a final FC (1000) layer, with training frameworks feeding an ONNX model. The per-network layer order is not recoverable from the extraction.]

8 Deep Learning infrastructure deployment: load distribution at the application level. The application tells the compiler how to optimize: 1. the compiler distributes networks across multiple Inference Engines; 2. the compiler distributes multiple models to multiple Inference Engines.
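To make the two distribution modes concrete, here is a minimal pure-Python sketch. It is not the FWDNXT compiler's actual policy, which the slide does not describe. It assumes a simple round-robin assignment of models to engines for the second mode, and for the first mode assumes the same network runs on every engine with the inputs split evenly. The function names and model names are hypothetical.

```python
def assign_round_robin(models, num_engines):
    """Map each model name to an engine index, cycling through the engines."""
    if num_engines < 1:
        raise ValueError("need at least one engine")
    return {name: i % num_engines for i, name in enumerate(models)}


def split_batch(num_inputs, num_engines):
    """Split num_inputs items into near-equal contiguous ranges, one per engine."""
    if num_engines < 1:
        raise ValueError("need at least one engine")
    base, extra = divmod(num_inputs, num_engines)
    ranges = []
    start = 0
    for engine in range(num_engines):
        # The first `extra` engines take one additional input each.
        count = base + (1 if engine < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    print(assign_round_robin(["alexnet", "resnet", "googlenet"], 2))
    print([list(r) for r in split_batch(10, 4)])
```

A real scheduler would also weigh model size and per-engine memory; the sketch only shows the bookkeeping an application might do before handing work to each engine.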

9 Sample deployment code: import, instantiate, run.

    import inference_engine
    from PIL import Image
    import numpy as np
    from argparse import ArgumentParser

    # Argument checking. The slide elides the add_argument calls; the three
    # below are reconstructed from how args is used and are assumptions.
    parser = ArgumentParser(description="FWDNXT Demonstration")
    parser.add_argument("image", help="path to the input image")
    parser.add_argument("modelpath", help="path to the trained model")
    parser.add_argument("--res", type=int, nargs=3, default=[3, 224, 224],
                        help="input size as channels, height, width")
    args = parser.parse_args()

    # Load the image and resize it to the size expected by the network
    img = Image.open(args.image)
    img = img.resize((args.res[2], args.res[1]), resample=Image.BILINEAR)

    # Convert to numpy float. The divisor is cut off on the slide; 255 is
    # assumed here because the statistics below expect values in [0, 1].
    img = np.array(img).astype(np.float32) / 255

    # Transpose to plane-major, as required by our API
    img = np.ascontiguousarray(img.transpose(2, 0, 1))

    # Normalize images
    stat_mean = [0.485, 0.456, 0.406]
    stat_std = [0.229, 0.224, 0.225]
    for i in range(3):
        img[i] = (img[i] - stat_mean[i]) / stat_std[i]

    # Create and initialize the Snowflake object
    fie1 = inference_engine.instance()
    nresults = fie1.init("{:d}x{:d}x{:d}".format(args.res[1], args.res[2], args.res[0]),
                         args.modelpath, '')

    # Create the storage for the result and run one inference
    result = np.ndarray(nresults, dtype=np.float32)
    fie1.run(img, result)

    # Sort descending so the first five entries are the highest scores
    idxs = result.argsort()[::-1]
    print(' Inference Engine results ')
    for i in range(5):
        print(idxs[i], result[idxs[i]])
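The last step of the sample ranks the output scores and prints five of them. An ascending sort puts the smallest scores first, so a top-5 listing has to take entries from the high end. The sketch below shows the same selection in pure Python with `heapq.nlargest`; the `top_k` helper and the score values are made up for illustration and are not part of the SDK.

```python
import heapq


def top_k(scores, k=5):
    """Return (index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical raw class scores, standing in for the engine's result array.
    scores = [0.1, 2.5, -0.3, 4.0, 1.2, 3.3, 0.0]
    for index, score in top_k(scores, k=3):
        print(index, score)
```

For a 1000-class output either approach is cheap; `heapq.nlargest` simply avoids sorting the full array when only a few entries are needed.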

10 SDK for embedded systems. Embedded code: the compiler prepares binaries on a separate computer. C library interface: upload the FPGA bitfile; transfer data to/from FPGA memory; control Snowflake. No need for Xilinx FPGA tools on the device; no FPGA programming required.

11 Available Inference Engine systems

FPGA board         Micron AC510   Micron AC511   Micron SB852
Clock freq.        187 MHz        250 MHz        250 MHz
Peak throughput    191 Gops/s     512 Gops/s     512 Gops/s
Memory             4 GB HMC       2 GB HMC       512 GB DDR4, 2 GB HMC
Memory bandwidth   60 GB/s        120 GB/s       120 GB/s
System power       24 W           48 W           150 W (max)

The Inference Engine model names (extracted as "k511" and "1k852") and the accelerator core counts are incomplete in the source.
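The deck's headline claim is performance per power, and the figures on this slide allow a rough check. The sketch below divides peak throughput by system power for each board, assuming the three values in each row belong to the AC510, AC511, and SB852 in that order. These are peak numbers, and the SB852 power figure is a stated maximum, so its ratio is a lower bound on peak efficiency rather than a measurement.

```python
# Peak throughput (Gops/s) and system power (W) as listed on the slide.
# The column-to-board mapping is an assumption based on the order of the values.
SYSTEMS = {
    "Micron AC510": {"gops": 191, "watts": 24},
    "Micron AC511": {"gops": 512, "watts": 48},
    "Micron SB852": {"gops": 512, "watts": 150},  # 150 W is listed as a maximum
}


def gops_per_watt(spec):
    """Peak throughput divided by system power."""
    return spec["gops"] / spec["watts"]


if __name__ == "__main__":
    for name, spec in SYSTEMS.items():
        print(f"{name}: {gops_per_watt(spec):.2f} Gops/s per W")
```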

12 FWDNXT Inference Engine and SDK summary. Compiler input: a trained neural network from any framework. Compiler: from network to Inference Engine machine code. Hardware: Micron FPGAs accelerate the neural network; no HDL needed! Example application provided for fast implementation.

13 Thank you
