HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

1 HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov

2 Deep learning ecosystem today: Software, Hardware

3 HPE's portfolio for deep learning
Government, academia and industries: Financial services; Government and academia; Life Sciences, Health; Autonomous vehicles / Mfg.
Services: Advisory, professional and operational services; HPE Flexible Capacity; HPE Datacenter Care for Hyperscale
Compute ideal for training models in data center:
- HPE SGI 8600: Petaflop scale for deep learning and HPC
- HPE Apollo 6500: The enterprise bridge to accelerated computing
- HPE Apollo sx40: Maximize GPU capacity and performance with lower TCO
Compute for both training models and inference at edge:
- HPE Apollo 2000: The bridge to enterprise scale-out architecture
Edge analytics and inference engine:
- HPE Edgeline EL4000: Unprecedented deep edge compute and high capacity storage; open standards
AI Software Framework: Easy Setup and Flexible OS, using Bright Computing's distribution of deep learning software development components and workload management tool integration
HPC Storage: HPE Apollo 4520; HPC Data Management Framework Software (large-scale storage virtualization & tiered data management platform)
Choice of Fabrics: Arista Networking; Intel Omni-Path Architecture; Mellanox InfiniBand; HPE FlexFabric Network

4 How to pick the right hardware/software stack?

5 One size does NOT fit all
- Application
- Data type
- Model (topology of artificial neural network): how many layers; how many neurons per layer; connections between neurons (types of layers)
- Data size

6 Popular models
Name                Type  Model size (# params)  Model size (MB)  GFLOPs (forward pass)
AlexNet             CNN   60,965,…               … MB             0.7
GoogleNet           CNN   6,998,…                … MB             1.6
VGG-16              CNN   138,357,…              … MB             15.5
VGG-19              CNN   143,667,…              … MB             19.6
ResNet50            CNN   25,610,…               … MB             3.9
ResNet101           CNN   44,654,…               … MB             7.6
ResNet152           CNN   60,344,…               … MB             11.3
Eng Acoustic Model  RNN   34,678,…               … MB
TextCNN             CNN   151,…                  … MB
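The two "model size" columns are related by simple arithmetic. The sketch below assumes weights are stored as 32-bit floats and that 1 MB means 2**20 bytes; the slide states neither, so treat both as assumptions. The parameter count used in the example is hypothetical and is not taken from the table.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights stored in single precision


def model_size_mb(num_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Approximate weight storage in MB, taking 1 MB = 2**20 bytes (assumption)."""
    return num_params * bytes_per_param / 2**20


# Hypothetical parameter count for illustration only.
example_params = 25_000_000
print(f"{example_params:,} params ~ {model_size_mb(example_params):.1f} MB")
```

The same function can be called with `bytes_per_param=2` to estimate a half-precision copy of the weights.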

8 Distributed training with data parallelism
Each training iteration has three stages, each limited by different factors:
- Fetch data: mini-batch size; IO bandwidth & latency; pre-processing on the fly
- Compute gradients: replica batch size; forward/backward computational complexity; computational power of a node
- Aggregate model updates: model size; number of workers/nodes; interconnect bandwidth & latency
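As a rough illustration of how these factors interact, here is a minimal back-of-the-envelope model of one data-parallel iteration. It is a sketch, not the Cookbook's performance model: it assumes the three stages do not overlap, and it assumes gradients are aggregated with a ring all-reduce, which the slide does not specify. All class names, field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    replica_batch: int       # samples processed per worker per iteration
    sample_bytes: float      # bytes read from storage per sample
    flops_per_sample: float  # forward + backward FLOPs per sample
    model_bytes: float       # bytes of gradients exchanged per iteration


@dataclass
class Cluster:
    workers: int
    io_bandwidth: float   # bytes/s available to each worker
    node_flops: float     # sustained FLOP/s of each worker
    net_bandwidth: float  # bytes/s per interconnect link
    net_latency: float    # seconds per message


def fetch_time(w: Workload, c: Cluster) -> float:
    return w.replica_batch * w.sample_bytes / c.io_bandwidth


def compute_time(w: Workload, c: Cluster) -> float:
    return w.replica_batch * w.flops_per_sample / c.node_flops


def aggregate_time(w: Workload, c: Cluster) -> float:
    """Ring all-reduce estimate (assumption): 2*(N-1) steps of model_bytes/N each."""
    n = c.workers
    if n == 1:
        return 0.0
    steps = 2 * (n - 1)
    return steps * (c.net_latency + w.model_bytes / n / c.net_bandwidth)


def iteration_time(w: Workload, c: Cluster) -> float:
    # Assumption: stages run back to back with no overlap.
    return fetch_time(w, c) + compute_time(w, c) + aggregate_time(w, c)


def throughput(w: Workload, c: Cluster) -> float:
    """Samples per second across all workers."""
    return c.workers * w.replica_batch / iteration_time(w, c)


# Hypothetical numbers for illustration only.
workload = Workload(replica_batch=32, sample_bytes=1e5,
                    flops_per_sample=1e10, model_bytes=1e8)
for n in (1, 2, 4, 8):
    cluster = Cluster(workers=n, io_bandwidth=1e9, node_flops=1e13,
                      net_bandwidth=1e10, net_latency=1e-5)
    print(n, round(throughput(workload, cluster), 1))
```

Real systems usually overlap data loading and communication with computation, so this model is pessimistic; its purpose is only to show which parameter feeds which stage.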

9 HPE Deep Learning Cookbook Overview
Comprehensive set of tools to guide the choice of the best hardware and software environment for different deep learning workloads.
- Eliminate the guesswork with the Deep Learning Performance Guide: tap into a massive pool of performance data to reason about the optimal hardware configuration for your workload
- Validate the configuration of your hardware and software environment with the Deep Learning Benchmarking Suite
- Get started fast with one of our Reference Designs as a default technology recipe

10 HPE Deep Learning Cookbook Main components
- HPE Deep Learning Benchmarking Suite: Automated benchmarking tool to collect performance of different deep learning workloads on various hardware and software configurations. Available on GitHub.
- HPE Deep Learning Performance Guide: A web-based tool to guide the choice of optimal hardware and software configuration via analysis of collected performance data and application of performance models. To be released in 2018. Will be hosted by HPE.
- Reference Designs: Reference hardware/software stacks for particular classes of deep learning workloads. Image Classification Reference Designs released.

11 HPE Deep Learning Benchmarking Suite

12 HPE Deep Learning Benchmarking Suite
A tool to benchmark various DL frameworks and models. To be open sourced in November.
- Frameworks and 1 inference runtime: TensorFlow, BVLC/NVIDIA/Intel Caffe, Caffe2, MXNet, PyTorch, TensorRT
- 18 models (AlexNet, GoogleNet, ResNets, VGGs, ...)
- Host / Docker benchmarks
- Docker files for all supported frameworks / SW combinations
- Resource monitoring: GPU/CPU/memory utilization
- Report builders: weak/strong scaling; benchmark statistics and charts
More information: GitHub, Documentation
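To make the weak/strong scaling reports concrete, the following sketch computes scaling efficiency from measured numbers. It uses the conventional definitions (strong scaling: total work fixed, so time per iteration should drop with more workers; weak scaling: per-worker batch fixed, so throughput should grow with more workers). The suite's own report builders may compute these differently; the function names and measurements here are hypothetical.

```python
def strong_scaling_efficiency(time_1: float, time_n: float, workers: int) -> float:
    """Fixed total batch: ideal time on N workers is time_1 / N."""
    return time_1 / (workers * time_n)


def weak_scaling_efficiency(throughput_1: float, throughput_n: float, workers: int) -> float:
    """Fixed per-worker batch: ideal throughput on N workers is N * throughput_1."""
    return throughput_n / (workers * throughput_1)


# Hypothetical measurements: (workers, seconds per iteration, samples per second).
strong_runs = [(1, 100.0), (2, 52.0), (4, 30.0)]
weak_runs = [(1, 1000.0), (2, 1900.0), (4, 3600.0)]

for n, t in strong_runs:
    print(f"strong N={n}: {strong_scaling_efficiency(strong_runs[0][1], t, n):.2f}")
for n, thr in weak_runs:
    print(f"weak   N={n}: {weak_scaling_efficiency(weak_runs[0][1], thr, n):.2f}")
```

An efficiency of 1.0 means perfect scaling; values below 1.0 indicate overhead from communication or input pipelines.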

13 Benchmarking Suite Architecture
[Diagram] Components, left to right:
- Specification: which benchmarks to run; default parameters
- Experimenter: configures benchmarks, runs one at a time
- Mediators between experimenter and frameworks, one per framework: TensorFlow, Caffe, Caffe2, TensorRT, MXNet, PyTorch
- Benchmark code that runs inference and training (TF CNN, Caffe, Caffe2, TensorRT, MXNet, PyTorch)
- Standard or custom frameworks (Docker and/or host)

14 How to expand
[Diagram: the architecture of slide 13 with added boxes (labels as extracted: "My CNTK TF", "My CNTK TF workload", "CNTK")]

15 Quick Start

16 HPE Deep Learning Performance Guide

18 Demo

19 Performance Models (TensorFlow)
Performance and scalability models for deep learning workloads in different hardware and software environments.
- Computation models are based on machine learning (Support Vector Regression)
- Communication models are based on analytical models
[Chart: actual vs. predicted values against batch size for GoogleNet, ResNet101, ResNet152, ResNet50, VGG16, VGG19]

20 Reference Designs

21 Reference Designs
A reference hardware/software stack for a particular class of workloads:
- Image classification
- Natural Language Processing
- Speech Recognition
- Video Analytics
Image Classification Reference Design:

22 Thank you. Natalia Vassilieva, Sergey Serebryakov. HPE Deep Learning Cookbook
