HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

Size: px
Start display at page:

Download "HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov"

Transcription

1 HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov

2 Deep learning ecosystem today Software Hardware 2

3 HPE s portfolio for deep learning Government, academia and industries Financial services Government and academia Life Sciences, Health Autonomous vehicles / Mfg. Advisory, professional and operational services, HPE Flexible Capacity, HPE Datacenter Care for Hyperscale HPE SGI 8600 Petaflop scale for deep learning and HPC Compute ideal for training models in data center HPE Apollo 6500 The enterprise bridge to accelerated computing HPE Apollo sx40 Maximize GPU capacity and performance with lower TCO Compute for both training models and inference at edge HPE Apollo 2000 The bridge to enterprise scale-out architecture Edge analytics and inference engine HPE Edgeline EL4000 Unprecedented deep edge compute and high capacity storage; open standards AI Software Framework HPC Storage Choice of Fabrics Easy Setup and Flexible OS Using Bright Computing s distribution of deep learning software development components and workload management tool integration HPE Apollo 4520 HPC Data Management Framework Software Large-scale, storage virtualization & tiered data management platform Arista Networking Intel Omni-Path Architecture Mellanox InfiniBand HPE FlexFabric Network 3

4 How to pick the right hardware/software stack? 4

5 One size does NOT fit all Application Data type Model (topology of artificial neural network): How many layers How many neurons per layer Connections between neurons (types of layers) Data size 5

6 Popular models Name Type Model size (# params) Model size (MB) GFLOPs (forward pass) AlexNet CNN 60,965, MB 0.7 GoogleNet CNN 6,998, MB 1.6 VGG-16 CNN 138,357, MB 15.5 VGG-19 CNN 143,667, MB 19.6 ResNet50 CNN 25,610, MB 3.9 ResNet101 CNN 44,654, MB 7.6 ResNet152 CNN 60,344, MB 11.3 Eng Acoustic Model RNN 34,678, MB TextCNN CNN 151, MB

7 Popular models Name Type Model size (# params) Model size (MB) GFLOPs (forward pass) AlexNet CNN 60,965, MB 0.7 GoogleNet CNN 6,998, MB 1.6 VGG-16 CNN 138,357, MB 15.5 VGG-19 CNN 143,667, MB 19.6 ResNet50 CNN 25,610, MB 3.9 ResNet101 CNN 44,654, MB 7.6 ResNet152 CNN 60,344, MB 11.3 Eng Acoustic Model RNN 34,678, MB TextCNN CNN 151, MB

8 Distributed training with data parallelism Mini-batch size IO bandwidth & latency Pre-processing on a fly Fetch data Compute gradients Replica batch size Forward/backward computational complexity Computational power of a node Aggregate model updates Model size Number of workers/nodes Interconnect bandwidth & latency 8

9 HPE Deep Learning Cookbook Overview Comprehensive set of tools to guide the choice of the best hardware and software environment for different deep learning workloads. Eliminate the guesswork with Deep Learning Performance Guide: tap into a massive pool of performance data to reason on optimal hardware configuration for your workload Validate the configuration of your hardware and software environment with Deep Learning Benchmarking Suite Get started fast with one of our Reference Designs as a default technology recipe 9

10 HPE Deep Learning Cookbook Main components HPE Deep Learning Benchmarking Suite HPE Deep Learning Performance Guide Reference Designs Automated benchmarking tool to collect performance of different deep learning workloads on various hardware and software configurations. A web-based tool to guide a choice of optimal hardware and software configuration via analysis of collected performance data and applying performance models. Reference hardware/software stacks for particular classes of deep learning workloads. Available on GitHub To be released in 2018 Will be hosted by HPE Image Classification Reference Designs released 10

11 HPE Deep Learning Benchmarking Suite 11

12 HPE Deep Learning Benchmarking Suite A tool to benchmark various DL frameworks and models. To be open sourced in November frameworks, 1 inference runtime TensorFlow, BVLC/NVIDIA/Intel Caffe, Caffe2, MXNet, PyTorch, TensorRT 18 models (AlexNet, GoogleNet, ResNets, VGGs ) Host / docker benchmarks Docker files for all supported frameworks / SW combinations Resource monitoring: GPU/CPU/memory utilization Report builders: Weak/strong scaling Benchmark statistics and charts More information: GitHub Documentation 12

13 Benchmarking Suite Architecture Mediators between experimenter and frameworks Runs inference and training Which benchmarks to run Default parameters TensorFlow Caffe TF CNN TensorFlow Caffe specification Experimenter Caffe2 Caffe2 Caffe2 TensorRT TensorRT TensorRT Configures benchmarks, runs one at a time MXNet MXNet MXNet Caffe2 PyTorch PyTorch GitHub: Documentation: Standard or custom frameworks (Docker and/or host) 13

14 How to expand My CNTK TF My CNTK TF workload CNTK Default parameters TensorFlow Caffe TF CNN TensorFlow Caffe specification Experimenter Caffe2 Caffe2 Caffe2 TensorRT TensorRT TensorRT MXNet MXNet MXNet Caffe2 PyTorch PyTorch GitHub: Documentation: 14

15 Quick Start 15

16 HPE Deep Learning Performance Guide 16

17 17

18 Demo 18

19 Performance Models (TensorFlow) Performance and scalability models for deep learning workloads in different hardware and software environments. Computation models are based on machine learning (Support Vector Regression) Communication models are based on analytical models 3000 Batch Size GoogleNet ResNet101 ResNet152 ResNet50 VGG16 VGG19 Actual Predicted 19

20 Reference Designs 20

21 Reference Designs A reference hardware/software stack for a particular class of workloads Image classification Natural Language Processing Speech Recognition Video Analytics Image Classification Reference Design: 21

22 Thank you Natalia Vassilieva Sergey Serebryakov HPE Deep Learning Cookbook 22

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance

More information

Predicting Service Outage Using Machine Learning Techniques. HPE Innovation Center

Predicting Service Outage Using Machine Learning Techniques. HPE Innovation Center Predicting Service Outage Using Machine Learning Techniques HPE Innovation Center HPE Innovation Center - Our AI Expertise Sense Learn Comprehend Act Computer Vision Machine Learning Natural Language Processing

More information

Unlock business value with HPC & Artificial Intelligence. FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager

Unlock business value with HPC & Artificial Intelligence. FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager Unlock business value with HPC & Artificial Intelligence FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager Investment in HPC delivers compelling financial returns Financial Services Oil and

More information

An introduction to Machine Learning silicon

An introduction to Machine Learning silicon An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Cisco UCS C480 ML M5 Rack Server Performance Characterization

Cisco UCS C480 ML M5 Rack Server Performance Characterization White Paper Cisco UCS C480 ML M5 Rack Server Performance Characterization The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco

More information

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

NVIDIA FOR DEEP LEARNING. Bill Veenhuis NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA

More information

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most

More information

In partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER

In partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER In partnership with VelocityAI REFERENCE JULY // 2018 Contents Introduction 01 Challenges with Existing AI/ML/DL Solutions 01 Accelerate AI/ML/DL Workloads with Vexata VelocityAI 02 VelocityAI Reference

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

Defense Data Generation in Distributed Deep Learning System Se-Yoon Oh / ADD-IDAR

Defense Data Generation in Distributed Deep Learning System Se-Yoon Oh / ADD-IDAR Defense Data Generation in Distributed Deep Learning System Se-Yoon Oh / 2017. 10. 31 syoh@add.re.kr Page 1/36 Overview 1. Introduction 2. Data Generation Synthesis 3. Distributed Deep Learning 4. Conclusions

More information

IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads

IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads The engine to power your AI data pipeline Introduction: Artificial intelligence (AI) including deep learning (DL) and machine

More information

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

MoonRiver: Deep Neural Network in C++

MoonRiver: Deep Neural Network in C++ MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement

More information

Scalable Distributed Training with Parameter Hub: a whirlwind tour

Scalable Distributed Training with Parameter Hub: a whirlwind tour Scalable Distributed Training with Parameter Hub: a whirlwind tour TVM Stack Optimization High-Level Differentiable IR Tensor Expression IR AutoTVM LLVM, CUDA, Metal VTA AutoVTA Edge FPGA Cloud FPGA ASIC

More information

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS S8497 - INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS Chris Lamb CUDA and NGC Engineering, NVIDIA John Barco NGC Product Management, NVIDIA NVIDIA GPU Cloud (NGC) overview AGENDA Using NGC

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

Xilinx ML Suite Overview

Xilinx ML Suite Overview Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame

More information

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

High-Performance Training for Deep Learning and Computer Vision HPC

High-Performance Training for Deep Learning and Computer Vision HPC High-Performance Training for Deep Learning and Computer Vision HPC Panel at CVPR-ECV 18 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

AIRI SCALE-OUT AI-READY INFRASTRUCTURE ARCHITECTED BY PURE STORAGE AND NVIDIA WITH ARISTA 7060X SWITCH REFERENCE ARCHITECTURE

AIRI SCALE-OUT AI-READY INFRASTRUCTURE ARCHITECTED BY PURE STORAGE AND NVIDIA WITH ARISTA 7060X SWITCH REFERENCE ARCHITECTURE REFERENCE ARCHITECTURE AIRI SCALE-OUT AI-READY INFRASTRUCTURE ARCHITECTED BY PURE STORAGE AND NVIDIA WITH ARISTA 7060X SWITCH TABLE OF CONTENTS INTRODUCTION... 3 Accelerating Computation: NVIDIA DGX-1...

More information

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems Khoa Huynh Senior Technical Staff Member (STSM), IBM Jonathan Samn Software Engineer, IBM Evolving from compute systems to

More information

Deep Learning Performance and Cost Evaluation

Deep Learning Performance and Cost Evaluation Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Rene Meyer, Ph.D. AMAX Corporation Publish date: October 25, 2018 Abstract Introduction

More information

CafeGPI. Single-Sided Communication for Scalable Deep Learning

CafeGPI. Single-Sided Communication for Scalable Deep Learning CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks

More information

NVIDIA DEEP LEARNING INSTITUTE

NVIDIA DEEP LEARNING INSTITUTE NVIDIA DEEP LEARNING INSTITUTE TRAINING CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial

More information

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA

A NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X

More information

IBM Deep Learning Solutions

IBM Deep Learning Solutions IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle

More information

Deep Learning Performance and Cost Evaluation

Deep Learning Performance and Cost Evaluation Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Don Wang, Rene Meyer, Ph.D. info@ AMAX Corporation Publish date: October 25,

More information

Deep Learning mit PowerAI - Ein Überblick

Deep Learning mit PowerAI - Ein Überblick Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s

More information

Deep Learning with Intel DAAL

Deep Learning with Intel DAAL Deep Learning with Intel DAAL on Knights Landing Processor David Ojika dave.n.ojika@cern.ch March 22, 2017 Outline Introduction and Motivation Intel Knights Landing Processor Intel Data Analytics and Acceleration

More information

NVIDIA PLATFORM FOR AI

NVIDIA PLATFORM FOR AI NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Paving the Path to Exascale November 2017 Mellanox Accelerates Leading HPC and AI Systems Summit CORAL System Sierra CORAL System Fastest Supercomputer in Japan Fastest Supercomputer

More information

Using CNN Across Intel Architecture

Using CNN Across Intel Architecture white paper Artificial Intelligence Object Classification Intel AI Builders Object Classification Using CNN Across Intel Architecture Table of Contents Abstract...1 1. Introduction...1 2. Setting up a

More information

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Deep Learning with CPU, GPU, and FPGA Technologies Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

More information

Fuzzy Set Theory in Computer Vision: Example 3

Fuzzy Set Theory in Computer Vision: Example 3 Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures

More information

High Performance Computing

High Performance Computing High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason

More information

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded

More information

Introduction to Deep Learning in Signal Processing & Communications with MATLAB

Introduction to Deep Learning in Signal Processing & Communications with MATLAB Introduction to Deep Learning in Signal Processing & Communications with MATLAB Dr. Amod Anandkumar Pallavi Kar Application Engineering Group, Mathworks India 2019 The MathWorks, Inc. 1 Different Types

More information

Deep Learning Inference on Openshift with GPUs

Deep Learning Inference on Openshift with GPUs Deep Learning Inference on Openshift with GPUs OpenShift Commons, Seattle, Dec 10 2018 Tripti Singhal Product Manager, NVIDIA Deep Learning Software Tushar Katarki Product Manager, AI on OpenShift AGENDA

More information

HPC and AI Solution Overview. Garima Kochhar HPC and AI Innovation Lab

HPC and AI Solution Overview. Garima Kochhar HPC and AI Innovation Lab HPC and AI Solution Overview Garima Kochhar HPC and AI Innovation Lab 1 Dell EMC HPC and DL team charter Design, develop and integrate HPC and DL Heading systems Lorem ipsum dolor sit amet, consectetur

More information

1 Overview Definitions (read this section carefully) 2

1 Overview Definitions (read this section carefully) 2 MLPerf User Guide Version 0.5 May 2nd, 2018 1 Overview 2 1.1 Definitions (read this section carefully) 2 2 General rules 3 2.1 Strive to be fair 3 2.2 System and framework must be consistent 4 2.3 System

More information

Data center: The center of possibility

Data center: The center of possibility Data center: The center of possibility Diane bryant Executive vice president & general manager Data center group, intel corporation Data center: The center of possibility The future is Thousands of Clouds

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Snapdragon NPE Overview

Snapdragon NPE Overview March 2018 Linaro Connect Hong Kong Snapdragon NPE Overview Mark Charlebois Director, Engineering Qualcomm Technologies, Inc. Caffe2 Snapdragon Neural Processing Engine Efficient execution on Snapdragon

More information

Enabling the future of Artificial intelligence

Enabling the future of Artificial intelligence Enabling the future of Artificial intelligence Contents AI Overview Intel Nervana AI products Hardware Software Intel Nervana Deep Learning Platform Learn more - Intel Nervana AI Academy Artificial Intelligence,

More information

HP Update. Bill Mannel VP/GM HPC & Big Data Business Unit Apollo Servers

HP Update. Bill Mannel VP/GM HPC & Big Data Business Unit Apollo Servers Update Bill Mannel VP/GM C & Big Data Business Unit Apollo Servers The most exciting shifts of our time are underway Cloud Security Mobility Time to revenue is critical Big Data Decisions must be rapid

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training

More information

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability Janis Keuper Itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern,

More information

Shrinath Shanbhag Senior Software Engineer Microsoft Corporation

Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade

More information

Embedded GPGPU and Deep Learning for Industrial Market

Embedded GPGPU and Deep Learning for Industrial Market Embedded GPGPU and Deep Learning for Industrial Market Author: Dan Mor GPGPU and HPEC Product Line Manager September 2018 Table of Contents 1. INTRODUCTION... 3 2. DIFFICULTIES IN CURRENT EMBEDDED INDUSTRIAL

More information

Machine learning for the Internet of Things

Machine learning for the Internet of Things Machine learning for the Internet of Things Chris Shore Director of Embedded Solutions Arm 2018 Arm Limited April 2018 More Intelligence at the Edge Arm Cortex-M Expanding opportunity for the embedded

More information

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK 17 May 2016, Melbourne 24 May 2016, Sydney Werner Scholz, CTO and Head of R&D, XENON Systems Mike Wang, Solutions Architect,

More information

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018 Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks

More information

GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB

GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB GPU Coder: Automatic CUDA and TensorRT code generation from MATLAB Ram Kokku 2018 The MathWorks, Inc. 1 GPUs and CUDA programming faster Performance CUDA OpenCL C/C++ GPU Coder MATLAB Python Ease of programming

More information

Using MRAM to Create Intelligent SSDs

Using MRAM to Create Intelligent SSDs Using MRAM to Create Intelligent SSDs Jérôme Gaysse Senior Technology&Market Analyst jerome.gaysse@silinnov-consulting.com Santa Clara, CA 1 Study context Analysis of system & application Performance modeling

More information

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC TERATECH Juin 2017 Gunter Roth, François Courteille DRAMATIC

More information

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management @SC Asia 2018 Rangan Sukumar, PhD Office of the CTO, Cray Inc. Safe Harbor Statement This presentation

More information

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K.

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Panda Department of Computer Science and Engineering The Ohio

More information

Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures

Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures Talk at Bench 2018 by Xiaoyi Lu The Ohio State University E-mail: luxi@cse.ohio-state.edu http://www.cse.ohio-state.edu/~luxi

More information

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information

HOW TO BUILD A MODERN AI

HOW TO BUILD A MODERN AI HOW TO BUILD A MODERN AI FOR THE UNKNOWN IN MODERN DATA 1 2016 PURE STORAGE INC. 2 Official Languages Act (1969/1988) 3 Translation Bureau 4 5 DAWN OF 4 TH INDUSTRIAL REVOLUTION BIG DATA, AI DRIVING CHANGE

More information

Voice, Image, Video : AI in action with AWS. 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Voice, Image, Video : AI in action with AWS. 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Voice, Image, Video : AI in action with AWS A long heritage of machine learning at Amazon Personalized recommendations Fulfillment automation and inventory management Drones Voice driven interactions Inventing

More information

IBM Power AC922 Server

IBM Power AC922 Server IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Deep Learning on AWS with TensorFlow and Apache MXNet

Deep Learning on AWS with TensorFlow and Apache MXNet Deep Learning on AWS with TensorFlow and Apache MXNet Julien Simon Global Evangelist, AI & Machine Learning @julsimon Renaud ALLIOUX CTO, Earthcube The Amazon ML Stack: Broadest & Deepest Set of Capabilities

More information

Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP

Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Devices / edge network Cloud/data center Removing data Bottlenecks with Fpga acceleration

More information

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company

More information

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS

SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA S Axel Koehler, Principal Solution Architect HPCN%Workshop%Goettingen,%14.%Mai%2018 NVIDIA - AI COMPUTING COMPANY Computer Graphics Computing Artificial Intelligence

More information

April 2 nd, Bob Burroughs Director, HPC Solution Sales

April 2 nd, Bob Burroughs Director, HPC Solution Sales April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software

More information

GPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13

GPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13 GPU FOR DEEP LEARNING chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 Why Deep Learning Boost Today? Nvidia SDK for Deep Learning? Agenda CUDA 8.0 cudnn TensorRT (GIE) NCCL DIGITS 2 Why Deep Learning

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Machine Learning. Bridging the OT IT Gap for Machine Learning with Ignition and AWS Greengrass

Machine Learning. Bridging the OT IT Gap for Machine Learning with Ignition and AWS Greengrass Machine Learning with Ignition and AWS Greengrass Bridging B the OT IT Gap for Machine Learning Simply Connect Ignition & AWS Greengrass for Machine Learning! Bridging the OT IT gaps is now easier using

More information

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo

DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER Markus Weber and Haiduong Vo NVIDIA DGX SYSTEMS Agenda NVIDIA DGX-1 NVIDIA DGX STATION 2 ONE YEAR LATER NVIDIA DGX-1 Barriers Toppled, the Unsolvable

More information

Deep Learning Frameworks with Spark and GPUs

Deep Learning Frameworks with Spark and GPUs Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,

More information

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Jongsoo Park Facebook AI System SW/HW Co-design Team Sep-21 2018 Team Introduction

More information

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Hengyu Zhao, Colin Weinshenker*, Mohamed Ibrahim*, Adwait Jog*, Jishen Zhao University of California, Santa Cruz, *The College of William

More information

DGX UPDATE. Customer Presentation Deck May 8, 2017

DGX UPDATE. Customer Presentation Deck May 8, 2017 DGX UPDATE Customer Presentation Deck May 8, 2017 NVIDIA DGX-1: The World s Fastest AI Supercomputer FASTEST PATH TO DEEP LEARNING EFFORTLESS PRODUCTIVITY REVOLUTIONARY AI PERFORMANCE Fully-integrated

More information

Onto Petaflops with Kubernetes

Onto Petaflops with Kubernetes Onto Petaflops with Kubernetes Vishnu Kannan Google Inc. vishh@google.com Key Takeaways Kubernetes can manage hardware accelerators at Scale Kubernetes provides a playground for ML ML journey with Kubernetes

More information

Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances

Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances Adrien Gaidon - Machine Learning Lead, Toyota Research Institute Mike Garrison - Senior Systems Engineer,

More information

Cloud Computing with FPGA-based NVMe SSDs

Cloud Computing with FPGA-based NVMe SSDs Cloud Computing with FPGA-based NVMe SSDs Bharadwaj Pudipeddi, CTO NVXL Santa Clara, CA 1 Choice of NVMe Controllers ASIC NVMe: Fully off-loaded, consistent performance, M.2 or U.2 form factor ASIC OpenChannel:

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Demystifying Deep Learning

Demystifying Deep Learning Demystifying Deep Learning Let the computers do the hard work Jérémy Huard 2015 The MathWorks, Inc. 1 2 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source

More information

Accelerating Data Center Workloads with FPGAs

Accelerating Data Center Workloads with FPGAs Accelerating Data Center Workloads with FPGAs Enno Lübbers NorCAS 2017, Linköping, Sweden Intel technologies features and benefits depend on system configuration and may require enabled hardware, software

More information

Fast forward. To your <next>

Fast forward. To your <next> Fast forward To your Navin Shenoy EXECUTIVE VICE PRESIDENT GENERAL MANAGER, DATA CENTER GROUP CLOUD ECONOMICS INTELLIGENT DATA PRACTICES NETWORK TRANSFORMATION Intel Xeon Scalable Platform The

More information

Profiling DNN Workloads on a Volta-based DGX-1 System

Profiling DNN Workloads on a Volta-based DGX-1 System Profiling DNN Workloads on a Volta-based DGX-1 System Saiful A. Mojumder 1, Marcia S Louis 1, Yifan Sun 2, Amir Kavyan Ziabari 3, José L. Abellán 4, John Kim 5, David Kaeli 2, Ajay Joshi 1 1 ECE Department,

More information

Towards Scalable Machine Learning

Towards Scalable Machine Learning Towards Scalable Machine Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Fraunhofer Center Machnine Larning Outline I Introduction

More information

InfiniBand-based HPC Clusters

InfiniBand-based HPC Clusters Boosting Scalability of InfiniBand-based HPC Clusters Asaf Wachtel, Senior Product Manager 2010 Voltaire Inc. InfiniBand-based HPC Clusters Scalability Challenges Cluster TCO Scalability Hardware costs

More information

Deploying Deep Learning Networks to Embedded GPUs and CPUs

Deploying Deep Learning Networks to Embedded GPUs and CPUs Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train

More information

Wu Zhiwen.

Wu Zhiwen. Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms

More information

Jacek Czaja, Machine Learning Engineer, AI Product Group

Jacek Czaja, Machine Learning Engineer, AI Product Group Jacek Czaja, Machine Learning Engineer, AI Product Group Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016

SUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016 SUPERCHARGE DEEP LEARNING WITH DGX-1 Markus Weber SC16 - November 2016 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering

More information

Demystifying Deep Learning

Demystifying Deep Learning Demystifying Deep Learning Mandar Gujrathi Mandar.Gujrathi@mathworks.com.au 2015 The MathWorks, Inc. 1 2 Deep Learning Applications Voice assistants (speech to text) Teaching character to beat video game

More information