Jacek Czaja, Machine Learning Engineer, AI Product Group

Size: px
Start display at page:

Download "Jacek Czaja, Machine Learning Engineer, AI Product Group"

Transcription

1 Jacek Czaja, Machine Learning Engineer, AI Product Group

2 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessordependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #

3

4 Artificial Intelligence, Machine Learning & Deep Learning 4

5 Deep Learning use cases Transport: Automated Driving Health: Pneumonia detection Agriculture: Robotics Space: Lunar Craters detection Positive: [1] Negative: Consumer: Speech/text search Finance: Customer support Energy: Oil & gas search Finance: financial forecasting

6 Why Now? Bigger Data Better Hardware Smarter Algorithms Image: 1000 KB / picture Audio: 5000 KB / song Video: 5,000,000 KB / movie Transistor density doubles every 18 months Cost / GB in 1995: $ Cost / GB in 2017: $0.02 Advances in algorithm innovation, including neural networks, leading to better accuracy in training models 6

7 Sharing Companies share algorithms and topologies Their gold is: Data Trained models Talent 7

8 Loss Visual Understanding Intel Labs China Innovate in cutting-edge visual cognition & machine learning technologies for smart computing to enable novel usages and user experience Face 2D/3D Analysis Face & Emotion Recognition Engine Efficient Deep Learning DNN Design based Visual & Compression Recognition Visual Parsing & Multimodal Analysis 128 F C 128 Face Analysis Technology Multimodal Emotion Recognition Efficient CNN Algorithm Design DNN Model Compression Automatic Image/Video Captioning Visual Question & Answering 6 8

9 Machine Learning Types Supervised Teach desired behavior with labeled data and infer new data Unsupervised Make inferences with unlabeled data and discover patterns Semi-supervised A combination of supervised and unsupervised learning Labeled Data Labeled and Unlabeled Data Unlabeled Data Clustered Data Classified Data Reinforcement Act in a environment to maximize reward Build autonomous agents that learn Classified Data 9

10 Machine Learning Types Supervised Teach desired behavior with labeled data and infer new data Unsupervised Make inferences with unlabeled data and discover patterns Semi-supervised A combination of supervised and unsupervised learning Labeled Data Labeled and Unlabeled Data Unlabeled Data Clustered Data Classified Data Reinforcement Act in a environment to maximize reward Build autonomous agents that learn Classified Data 10

11 data Training Forward Propagation Back Propagation output expected penalty (error or cost) person cat dog bike person cat dog bike 11

12 data Inference Forward Propagation output person cat dog bike 12

13

14 AIPG Nervana Deep Learning Portfolio RESEARCH AND APPLICATION SUPPORT Intel Brain Data Scientist Team BDM & Direct Optimization Team Research new AI usages and models Develop POC with customers to apply AI methods Enable customers to deploy products DEEP LEARNING PLATFORM Nervana Deep Learning Studio Titanium: HW mgmt. Nervana Cloud Intel branded Data scientist and developer DL productivity tools DL Cloud Service for POC, developers and academics DL appliance for DLaaS ENABLING PRODUCT SOFTWARE MKL-DNN, other math libraries Frameworks Nervana Graph HW Transformers, Non-x86 libraries Frameworks for developers Back end APIs to Nervana Graph Accelerate framework optimization on IA; open source For framework developers & Intel Multi-node optimizations Extend to non-dc inference products and use cases SYSTEMS Node & rack reference designs Channel sales Deep Learning Systems Enable direct and end customers with Deep Learn System portfolio Intel branded under investigation PRODUCTS Datacenter Edge, client, gateway Comprehensive product portfolio General purpose x86 Dedicated DL NPU accelerators 14

15 AI gateway/edge All purpose Flexible acceleration ADAS LOW power vision Intel Processors agile AI Platforms Range of performance and power for widest variety of AI, gateway & edge workloads including deep learning inference Intel FPGA Enhanced DL Inference Acceleration for deep learning inference in real-time with higher efficiency, and wide range of workloads & configurations Mobileye s EyeQ-5 autonomous driving Real-time fused camera/radar inference, path planning, roadreconstruction in vehicle Movidius Myriad-X computer vision Low power computer vision engine using deep learning inference in gateway and devices *Knights Mill (KNM); select = single-precision highly-parallel workloads generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth e.g. energy (reverse time migration), deep learning training, etc. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. 15

16 AI Datacenter All purpose Flexible acceleration Deep Learning Intel Xeon Scalable Processors Known Compute for AI Scalable performance for widest variety of AI & other datacenter workloads including breakthrough deep learning training & inference Intel FPGA Enhanced DL Inference Scalable acceleration for deep learning inference in real-time with higher efficiency, and wide range of workloads & configurations Intel Nervana Neural Network Processor Deep learning by design Scalable acceleration with best performance for intensive deep learning training & inference, period 16

17 Intel Xeon scalable processor Scalable performance for widest variety of AI & other datacenter workloads including deep learning Most agile AI platform Built-in ROI Begin your AI journey today using existing, familiar infrastructure Potent performance Up to 2.2X deep learning training & inference perf vs. prior gen 1 ; 113X with SW optimizations 2 Production-ready Robust support for full range of AI deployments Classic ML Deep Learning Reasoning Emerging AI Analytics More 1,2 Configuration details on slide: 18, 20, 24 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of November 2016 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #

18 Performance Drivers for AI Workloads Compute Bandwidth SW Optimizations 18

19 GEMM performance (Measured in GFLOPS) represented relative to a baseline 1.0 Higher is Better Up to 3.4x Integer Matrix Multiply Performance on Intel Xeon Platinum 8180 Processor Matrix Multiply Performance on Intel Xeon Platinum 8180 Processor compared to Intel Xeon Processor E v Single Precision Floating Point General Matrix Multiply SGEMM (FP32) 1S Intel Xeon Processor E v4 3.4 Integer General Matrix Multiply IGEMM (INT8) 1S Intel Xeon Platinum 8180 Processor 8bit IGEMM will be available in Intel Math Kernel Library (Intel MKL) 2018 Gold to be released by end of Q Enhanced matrix multiply performance on Intel Xeon Scalable Processors Configuration Details on Slide: 24 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of June 2017 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microproc essor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 19

20 AI Performance Gen over Gen INFERENCE THROUGHPUT TRAINING THROUGHPUT Up to 2.4x Intel Xeon Platinum 8180 Processor higher Neon ResNet 18 inference throughput compared to Intel Xeon Processor E v4 Up to 2.2x Intel Xeon Platinum 8180 Processor higher Neon ResNet 18 training throughput compared to Intel Xeon Processor E v4 Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher. Advance previous generation AI workload performance with Intel Xeon Scalable Processors Inference throughput batch size: 1 Training throughput batch size: 256 Configuration Details on Slide: 18, 20 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit Source: Intel measured as of June 2017 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 20

21 AI Performance Software + Hardware INFERENCE THROUGHPUT TRAINING THROUGHPUT Up to 138x Up to 113x Optimized Frameworks Intel Xeon Platinum 8180 Processor higher Intel optimized Caffe GoogleNet v1 with Intel MKL inference throughput compared to Intel Xeon Processor E v3 with BVLC-Caffe Intel Xeon Platinum 8180 Processor higher Intel Optimized Caffe AlexNet with Intel MKL training throughput compared to Intel Xeon Processor E v3 with BVLC-Caffe Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher. Optimized Intel MKL Libraries Deliver significant AI performance with hardware and software optimizations on Intel Xeon Scalable Processors INFERENCE using FP32 Batch Size Caffe GoogleNet v1 256 AlexNet 256 Configuration Details on Slide: 18, 25 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of June 2017 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 21

22 Inference Throughput shown in Images/Second Up to 2.4x Higher Inference Throughput on Intel Xeon Platinum 8180 Processor 2S Intel Xeon Processor E5-2699v4, 22C, 2.2GHz 2S Intel Xeon Platinum 8180 Processor, 28C, 2.5GHz AlexNet BS = GoogLeNet v1 BS = ResNet-50 BS = 1024 VGG-19 BS = AlexNet ConvNet BS = GoogLeNet ConvNet BS = VGG ConvNet BS = AlexNet BS = VGG-19 BS = 256 Inception V3 BS = 1024 ResNet-50 BS = AlexNet ConvNet BS = GoogLeNet v1 ConvNet BS = 1024 Caffe TensorFlow MXNet Neon Inference throughput measured with FP32 instructions. Inference with INT8 will be higher. Additional optimizations may further improve performance ResNet 18 BS = 1024 Intel Xeon Platinum Processor delivers Inference throughput performance across different frameworks Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of June 2017 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 22

23 Time to Train (hours) Intel Xeon Scalable Processor Multi-node Performance (1 node) ResNet-50 Time to train (Hours) - Weak scaling minibatch 64 (2 nodes) (4 nodes) (8 nodes) (16 nodes) (32 nodes) (64 nodes) 3.9 SKX SKX-8180* (128 nodes) (256 nodes) (352 nodes) (470 nodes) (704 nodes) Global minibatch - scaled across nodes MB-32 per node MB-24 per node MB-16 per node Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations an d functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of August

24 November

25 Crest family 2017 Deep learning By design Unprecedented compute density Large reduction in time-to-train 32 GB of in package memory via HBM2 technology 8 Tera-bits/s of memory access speed Scalable acceleration with best performance for intensive deep learning training & inference, period Custom hardware Blazing data access High-speed scalability 12 bi-directional high-bandwidth links Seamless data transfer via interconnects

26 ICL ICL ICL ICL ICL ICL ICL ICL ICL ICL ICL ICL Intel Nervana Lake Crest NPU Architecture Interposer ICC HBM2 HBM PHY Mem Ctrlr Processing Cluster Processing Cluster Processing Cluster Mem Ctrlr HBM PHY HBM2 Processing Cluster Processing Cluster Processing Cluster SPI, IC2, GPIO MGMT CPU Processing Cluster Processing Cluster Processing Cluster HBM2 HBM PHY Mem Ctrlr Processing Cluster Processing Cluster Processing Cluster Mem Ctrlr HBM PHY HBM2 PCIe Controller & DMA PCI Express x16 ICC Floorplan not to scale 26

27 FlexPoint Numerical Format Designed Float16 Flex DEC=8 DEC=7 DEC=8 DEC=6 DEC=7 DEC=8 DEC=7 DEC=6 DEC=8 MANTISSA EXPONENT 11 bit mantissa precision (-1024 to 1023) Individual 5-bit exponents DEC=8 MANTISSA EXPONENT 16 bit mantissa 45% more precision than Float16 (-32,768 to 32,767) Tensor-wide shared 5-bit exponent Flex16 accuracy on par with Float32 but with much smaller cores 27

28

29 Diversity in Deep Networks VVariety in Network Topology Recurrent NNs common for NLP/ASR, DAG for GoogLeNet, Networks with memory Recurrent NN BBut there are a few well defined building blocks Convolutions common for image recognition tasks GEMMs for recurrent network layers could be sparse ReLU, tanh, softmax CNN - AlexNet GoogLeNet 29

30 Naïve Convolution 30

31 Cache Friendly Convolution arxiv.org/pdf/ v1.pdf 31

32 Performance Optimization on Modern Platforms Hierarchical Parallelism Coarse-Grained / multi-node Domain decomposition Fine-Grained Parallelism / within node Sub-domain: 1) Multi-level domain decomposition (ex. across layers) 2) Data decomposition (layer parallelism) Scaling Improve load balancing Reduce synchronization events, all-to-all comms Utilize all the cores OpenMP, MPI, TBB Reduce synchronization events, serial code Improve load balancing Vectorize/SIMD Unit strided access per SIMD lane High vector efficiency Data alignment Efficient memory/cache use Blocking Data reuse Prefetching Memory allocation

33 Intel MKL and Intel MKL-DNN for Deep Learning Intel Math Kernel Library (Intel MKL) Deep Learning Frameworks Intel MKL-DNN Xeon Xeon Phi FPGA Intel MKL DNN primitives + wide variety of other math functions C DNN APIs (C++ future) Binary distribution Free community license. Premium support available as part of Parallel Studio XE Broad usage DNN primitives; not specific to individual frameworks Quarterly update releases Intel MKL-DNN DNN primitives C/C++ DNN APIs Open source DNN code* Apache 2.0 license Multiple variants of DNN primitives as required for framework integrations Rapid development ahead of Intel MKL releases * GEMM matrix multiply building blocks are binary

34

35 Intel Nervana Deep Learning Studio Compress Innovation Cycle to Accelerate Time-to-Solution What it is A comprehensive software suite to allow groups of data scientists to reduce the innovation cycle and enable them to develop custom, enterprise-grade deep learning solutions in record time. Available as part of Intel Nervana Cloud and Intel Nervana Deep Learning System. Images Video Text Speech Tabular Time series Why it's important It is both time consuming and expensive to develop a deep learning solution due to expensive data scientists spending too much time wrangling data and manually executing hundreds of experiments to find the right network topology and combination of parameters to achieve a converged model that fits their use case. Intel Nervana Deep Learning Studio Deep Learning Frameworks Neon (more coming soon) Intel Nervana Hardware Learn More: intelnervana.com Users Primary: Data scientists Secondary: Software developers who take trained deep learning models and integrate into their applications. 35

36 High-Level Workflow Data Scientist ncloud Command Line Interface Multiple Interface options Interactive Notebooks User Interface Label Dataset Import Dataset Build Model Train Deploy Cloud/Server Model Library Trained Model Edge 36

37

38 Intel Nervana ai academy Intel Developer Zone for Artificial Intelligence Deep Learning Frameworks, libraries and additional tools Workshops, Webinars, Meet Ups & Remote Access software.intel.com/ai/academy Intelnervana.com

39 39

40 [1] CheXNet: Radiologist-Level Pneumonia Detection on Chest X- Rays with Deep Learning

41

Enabling the future of Artificial intelligence

Enabling the future of Artificial intelligence Enabling the future of Artificial intelligence Contents AI Overview Intel Nervana AI products Hardware Software Intel Nervana Deep Learning Platform Learn more - Intel Nervana AI Academy Artificial Intelligence,

More information

April 2 nd, Bob Burroughs Director, HPC Solution Sales

April 2 nd, Bob Burroughs Director, HPC Solution Sales April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software

More information

Real World Development examples of systems / iot

Real World Development examples of systems / iot Real World Development examples of systems / iot Intel Software Developer Conference Seoul 2017 Jon Kim Software Consulting Engineer Contents IOT end-to-end Scalability with Intel x86 Architect Real World

More information

Vectorization Advisor: getting started

Vectorization Advisor: getting started Vectorization Advisor: getting started Before you analyze Run GUI or Command Line Set-up environment Linux: source /advixe-vars.sh Windows: \advixe-vars.bat Run GUI or Command

More information

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017 Achieving Peak Performance on Intel Hardware Intel Software Developer Conference London, 2017 Welcome Aims for the day You understand some of the critical features of Intel processors and other hardware

More information

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:

More information

Intel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech,

Intel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech, Intel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech, Atlanta February 24, 2017 Acknowledgements Benoit Jacob

More information

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Parallel is the Path Forward Intel Xeon and Intel Xeon Phi Product Families are both going parallel Intel Xeon processor

More information

Deep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor

Deep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor WELCOME TO Operant conditioning 1938 Spiking Neuron 1952 first neuroscience department 1964 Deep learning prevalence mid 2000s The Turing Machine 1936 Transistor 1947 First computer science department

More information

Accelerate Machine Learning on macos with Intel Integrated Graphics. Hisham Chowdhury May 23, 2018

Accelerate Machine Learning on macos with Intel Integrated Graphics. Hisham Chowdhury May 23, 2018 Accelerate Machine Learning on macos with Intel Integrated Graphics Hisham Chowdhury May 23, 2018 Apple Machine Learning Stack Machine Learning Application 1 Machine Learning Application 2 Vision Natural

More information

IXPUG 16. Dmitry Durnov, Intel MPI team

IXPUG 16. Dmitry Durnov, Intel MPI team IXPUG 16 Dmitry Durnov, Intel MPI team Agenda - Intel MPI 2017 Beta U1 product availability - New features overview - Competitive results - Useful links - Q/A 2 Intel MPI 2017 Beta U1 is available! Key

More information

Data center: The center of possibility

Data center: The center of possibility Data center: The center of possibility Diane bryant Executive vice president & general manager Data center group, intel corporation Data center: The center of possibility The future is Thousands of Clouds

More information

Intel Many Integrated Core (MIC) Architecture

Intel Many Integrated Core (MIC) Architecture Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products

More information

WITH INTEL TECHNOLOGIES

WITH INTEL TECHNOLOGIES WITH INTEL TECHNOLOGIES Commitment Is to Enable The Best Democratize technologies Advance solutions Unleash innovations Intel Xeon Scalable Processor Family Delivers Ideal Enterprise Solutions NEW Intel

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

H.J. Lu, Sunil K Pandey. Intel. November, 2018

H.J. Lu, Sunil K Pandey. Intel. November, 2018 H.J. Lu, Sunil K Pandey Intel November, 2018 Issues with Run-time Library on IA Memory, string and math functions in today s glibc are optimized for today s Intel processors: AVX/AVX2/AVX512 FMA It takes

More information

What s P. Thierry

What s P. Thierry What s new@intel P. Thierry Principal Engineer, Intel Corp philippe.thierry@intel.com CPU trend Memory update Software Characterization in 30 mn 10 000 feet view CPU : Range of few TF/s and

More information

Achieving 2.5X 1 Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization

Achieving 2.5X 1 Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization white paper Internet Discovery Artificial Intelligence (AI) Achieving.X Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization As one of the world s preeminent

More information

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager Intel Math Kernel Library (Intel MKL) BLAS Victor Kostin Intel MKL Dense Solvers team manager Intel MKL BLAS/Sparse BLAS Original ( dense ) BLAS available from www.netlib.org Additionally Intel MKL provides

More information

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature. Intel Software Developer Conference London, 2017

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature. Intel Software Developer Conference London, 2017 Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature Intel Software Developer Conference London, 2017 Agenda Vectorization is becoming more and more important What is

More information

Sayantan Sur, Intel. ExaComm Workshop held in conjunction with ISC 2018

Sayantan Sur, Intel. ExaComm Workshop held in conjunction with ISC 2018 Sayantan Sur, Intel ExaComm Workshop held in conjunction with ISC 2018 Legal Disclaimer & Optimization Notice Software and workloads used in performance tests may have been optimized for performance only

More information

Bei Wang, Dmitry Prohorov and Carlos Rosales

Bei Wang, Dmitry Prohorov and Carlos Rosales Bei Wang, Dmitry Prohorov and Carlos Rosales Aspects of Application Performance What are the Aspects of Performance Intel Hardware Features Omni-Path Architecture MCDRAM 3D XPoint Many-core Xeon Phi AVX-512

More information

Fast forward. To your <next>

Fast forward. To your <next> Fast forward To your Navin Shenoy EXECUTIVE VICE PRESIDENT GENERAL MANAGER, DATA CENTER GROUP CLOUD ECONOMICS INTELLIGENT DATA PRACTICES NETWORK TRANSFORMATION Intel Xeon Scalable Platform The

More information

What s New August 2015

What s New August 2015 What s New August 2015 Significant New Features New Directory Structure OpenMP* 4.1 Extensions C11 Standard Support More C++14 Standard Support Fortran 2008 Submodules and IMPURE ELEMENTAL Further C Interoperability

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel C++ Studio XE 2013 for Windows* Installation Guide and Release Notes Document number: 323805-003US 26 June 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.1.1 Changes since Intel

More information

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP

More information

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend

More information

HPC Advisory COUNCIL

HPC Advisory COUNCIL Inside AI HPC Advisory COUNCIL Lugano 2017 Scalable Systems for Distributed Deep Learning Benchmarking, Performance Optimization and Architectures Gaurav kaul SYSTEMS ARCHITECT, INTEL DATACENTRE GROUP

More information

Getting Started with Intel SDK for OpenCL Applications

Getting Started with Intel SDK for OpenCL Applications Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel

More information

Memory & Thread Debugger

Memory & Thread Debugger Memory & Thread Debugger Here is What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action Intel Confidential 2 Analysis Tools for Diagnosis

More information

High Performance Computing The Essential Tool for a Knowledge Economy

High Performance Computing The Essential Tool for a Knowledge Economy High Performance Computing The Essential Tool for a Knowledge Economy Rajeeb Hazra Vice President & General Manager Technical Computing Group Datacenter & Connected Systems Group July 22 nd 2013 1 What

More information

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS

More information

Expressing and Analyzing Dependencies in your C++ Application

Expressing and Analyzing Dependencies in your C++ Application Expressing and Analyzing Dependencies in your C++ Application Pablo Reble, Software Engineer Developer Products Division Software and Services Group, Intel Agenda TBB and Flow Graph extensions Composable

More information

Graphics Performance Analyzer for Android

Graphics Performance Analyzer for Android Graphics Performance Analyzer for Android 1 What you will learn from this slide deck Detailed optimization workflow of Graphics Performance Analyzer Android* System Analysis Only Please see subsequent

More information

Sarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018

Sarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018 Sarah Knepper Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018 Outline Motivation Problem statement and solutions Simple example Performance comparison 2 Motivation Partial differential equations

More information

Intel Software Development Products Licensing & Programs Channel EMEA

Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Advanced Performance Distributed Performance Intel Software Development Products Foundation of

More information

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel Clang * : An Excellent C++ Compiler LLVM * : Collection of modular and reusable compiler and toolchain technologies Created by Chris Lattner

More information

Achieving Peak Performance on Intel Hardware. Jim Cownie: Intel Software Developer Conference Frankfurt, December 2017

Achieving Peak Performance on Intel Hardware. Jim Cownie: Intel Software Developer Conference Frankfurt, December 2017 Achieving Peak Performance on Intel Hardware Jim Cownie: Intel Software Developer Conference Frankfurt, December 2017 Welcome Aims for the day You understand some of the critical features of Intel processors

More information

Accelerating Data Center Workloads with FPGAs

Accelerating Data Center Workloads with FPGAs Accelerating Data Center Workloads with FPGAs Enno Lübbers NorCAS 2017, Linköping, Sweden Intel technologies features and benefits depend on system configuration and may require enabled hardware, software

More information

Jackson Marusarz Software Technical Consulting Engineer

Jackson Marusarz Software Technical Consulting Engineer Jackson Marusarz Software Technical Consulting Engineer What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action 2 Analysis Tools for Diagnosis

More information

Data-Centric Innovation Summit NAVEEN RAO CORPORATE VICE PRESIDENT & GENERAL MANAGER ARTIFICIAL INTELLIGENCE PRODUCTS GROUP

Data-Centric Innovation Summit NAVEEN RAO CORPORATE VICE PRESIDENT & GENERAL MANAGER ARTIFICIAL INTELLIGENCE PRODUCTS GROUP Data-Centric Innovation Summit NAVEEN RAO CORPORATE VICE PRESIDENT & GENERAL MANAGER ARTIFICIAL INTELLIGENCE PRODUCTS GROUP Data center logic silicon Tam ~30% cagr Ai is exploding $8-10B Emerging as a

More information

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017 Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application

More information

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF 1 Outline KNL results Our other work related to HPCG 2 ~47 GF/s per KNL ~10

More information

Kevin O Leary, Intel Technical Consulting Engineer

Kevin O Leary, Intel Technical Consulting Engineer Kevin O Leary, Intel Technical Consulting Engineer Moore s Law Is Going Strong Hardware performance continues to grow exponentially We think we can continue Moore's Law for at least another 10 years."

More information

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth Presenter: Surabhi Jain Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth May 25, 2018 ROME workshop (in conjunction with IPDPS 2018), Vancouver,

More information

Accelerate Deep Learning Inference with openvino toolkit

Accelerate Deep Learning Inference with openvino toolkit Accelerate Deep Learning Inference with openvino toolkit Priyanka Bagade, IoT Developer Evangelist, Intel Core and Visual Computing Group Optimization Notice Intel s compilers may or may not optimize to

More information

INTEL MKL Vectorized Compact routines

INTEL MKL Vectorized Compact routines INTEL MKL Vectorized Compact routines Mesut Meterelliyoz, Peter Caday, Timothy B. Costa, Kazushige Goto, Louise Huot, Sarah Knepper, Arthur Araujo Mitrano, Shane Story 2018 BLIS RETREAT 09/17/2018 OUTLINE

More information

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Achieving High Performance Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Does Instruction Set Matter? We find that ARM and x86 processors are simply engineering design points optimized

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information

More performance options

More performance options More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel

More information

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel Agenda Intel Advisor for vectorization

More information

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

MICHAL MROZEK ZBIGNIEW ZDANOWICZ MICHAL MROZEK ZBIGNIEW ZDANOWICZ Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY

More information

Intel Cluster Checker 3.0 webinar

Intel Cluster Checker 3.0 webinar Intel Cluster Checker 3.0 webinar June 3, 2015 Christopher Heller Technical Consulting Engineer Q2, 2015 1 Introduction Intel Cluster Checker 3.0 is a systems tool for Linux high performance compute clusters

More information

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria Alexei Katranov IWOCL '16, April 21, 2016, Vienna, Austria Hardware: customization, integration, heterogeneity Intel Processor Graphics CPU CPU CPU CPU Multicore CPU + integrated units for graphics, media

More information

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization

More information

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory Jongsoo Park, Parallel Computing Lab, Intel Corporation with contributions from MKL team 1 Algorithm/

More information

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015 LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015 Abstract Library for small matrix-matrix multiplications targeting

More information

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,

More information

Efficiently Introduce Threading using Intel TBB

Efficiently Introduce Threading using Intel TBB Introduction This guide will illustrate how to efficiently introduce threading using Intel Threading Building Blocks (Intel TBB), part of Intel Parallel Studio XE. It is a widely used, award-winning C++

More information

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation Agenda 1D interpolation problem statement Computation flow Application areas Data fitting in Intel MKL Data

More information

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Document number: 323803-001US 4 May 2011 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.2 Product Contents...

More information

Kirill Rogozhin. Intel

Kirill Rogozhin. Intel Kirill Rogozhin Intel From Old HPC principle to modern performance model Old HPC principles: 1. Balance principle (e.g. Kung 1986) hw and software parameters altogether 2. Compute Density, intensity, machine

More information

Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov

Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov What is the Parallel STL? C++17 C++ Next An extension of the C++ Standard Template Library algorithms with the execution policy argument Support for parallel

More information

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Intel Advisor XE. Vectorization Optimization. Optimization Notice Intel Advisor XE Vectorization Optimization 1 Performance is a Proven Game Changer It is driving disruptive change in multiple industries Protecting buildings from extreme events Sophisticated mechanics

More information

Sayantan Sur, Intel. SEA Symposium on Overlapping Computation and Communication. April 4 th, 2018

Sayantan Sur, Intel. SEA Symposium on Overlapping Computation and Communication. April 4 th, 2018 Sayantan Sur, Intel SEA Symposium on Overlapping Computation and Communication April 4 th, 2018 Legal Disclaimer & Benchmark results were obtained prior to implementation of recent software patches and

More information

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor * Some names and brands may be claimed as the property of others. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor E.J. Bylaska 1, M. Jacquelin

More information

Crosstalk between VMs. Alexander Komarov, Application Engineer Software and Services Group Developer Relations Division EMEA

Crosstalk between VMs. Alexander Komarov, Application Engineer Software and Services Group Developer Relations Division EMEA Crosstalk between VMs Alexander Komarov, Application Engineer Software and Services Group Developer Relations Division EMEA 2 September 2015 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT

More information

Data and Intelligence in Storage Carol Wilder Intel Corporation

Data and Intelligence in Storage Carol Wilder Intel Corporation Data and Intelligence in Storage Carol Wilder carol.a.wilder@intel.com Intel Corporation 1 Legal Notices/Disclaimer Intel technologies features and benefits depend on system configuration and may require

More information

Case Study. Optimizing an Illegal Image Filter System. Software. Intel Integrated Performance Primitives. High-Performance Computing

Case Study. Optimizing an Illegal Image Filter System. Software. Intel Integrated Performance Primitives. High-Performance Computing Case Study Software Optimizing an Illegal Image Filter System Intel Integrated Performance Primitives High-Performance Computing Tencent Doubles the Speed of its Illegal Image Filter System using SIMD

More information

Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen

Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen Features We Discuss Synchronization (lock) hints The nonmonotonic:dynamic schedule Both Were new in OpenMP 4.5 May have slipped

More information

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing User s Guide Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2013 Intel Corporation All Rights Reserved Document

More information

FAST FORWARD TO YOUR <NEXT> CREATION

FAST FORWARD TO YOUR <NEXT> CREATION FAST FORWARD TO YOUR CREATION THE ULTIMATE PROFESSIONAL WORKSTATIONS POWERED BY INTEL XEON PROCESSORS 7 SEPTEMBER 2017 WHAT S NEW INTRODUCING THE NEW INTEL XEON SCALABLE PROCESSOR BREAKTHROUGH PERFORMANCE

More information

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Motivation C AVX2 AVX512 New instructions utilized! Scalar performance

More information

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation Alexander Kalinkin Anton Anders Roman Anders 1 Legal Disclaimer INFORMATION IN

More information

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 325262-002US Revision: 1.3 World Wide Web: http://www.intel.com Document

More information

Intel s Architecture for NFV

Intel s Architecture for NFV Intel s Architecture for NFV Evolution from specialized technology to mainstream programming Net Futures 2015 Network applications Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин Legal Notices This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS

More information

Growth in Cores - A well rehearsed story

Growth in Cores - A well rehearsed story Intel CPUs Growth in Cores - A well rehearsed story 2 1. Multicore is just a fad! Copyright 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

More information

Optimizing Film, Media with OpenCL & Intel Quick Sync Video

Optimizing Film, Media with OpenCL & Intel Quick Sync Video Optimizing Film, Media with OpenCL & Intel Quick Sync Video Petter Larsson, Senior Software Engineer Ryan Tabrah, Product Manager The Intel Vision Enriching the lives of every person on earth through technology

More information

A U G U S T 8, S A N T A C L A R A, C A

A U G U S T 8, S A N T A C L A R A, C A A U G U S T 8, 2 0 1 8 S A N T A C L A R A, C A Data-Centric Innovation Summit LISA SPELMAN VICE PRESIDENT & GENERAL MANAGER INTEL XEON PRODUCTS AND DATA CENTER MARKETING Increased integration and optimization

More information

Werner Schueler. Enterprise Account Manager, Intel

Werner Schueler. Enterprise Account Manager, Intel Werner Schueler Enterprise Account Manager, Intel The next big wave mainframes Standardsbased servers Cloud computing Data deluge COMPUTE breakthrough Innovation surge Artificial intelligence 12X AI Compute

More information

unleashed the future Intel Xeon Scalable Processors for High Performance Computing Alexey Belogortsev Field Application Engineer

unleashed the future Intel Xeon Scalable Processors for High Performance Computing Alexey Belogortsev Field Application Engineer the future unleashed Alexey Belogortsev Field Application Engineer Intel Xeon Scalable Processors for High Performance Computing Growing Challenges in System Architecture The Walls System Bottlenecks Divergent

More information

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python*

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python* Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* Introduction Python is among the most popular programming languages Especially for prototyping But very limited use in production

More information

Research Faculty Summit Systems Fueling future disruptions

Research Faculty Summit Systems Fueling future disruptions Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel Parallel Studio XE 2013 for Linux* Installation Guide and Release Notes Document number: 323804-003US 10 March 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 1 1.1.1 Changes since Intel

More information

Using Intel VTune Amplifier XE for High Performance Computing

Using Intel VTune Amplifier XE for High Performance Computing Using Intel VTune Amplifier XE for High Performance Computing Vladimir Tsymbal Performance, Analysis and Threading Lab 1 The Majority of all HPC-Systems are Clusters Interconnect I/O I/O... I/O I/O Message

More information

Tuning Python Applications Can Dramatically Increase Performance

Tuning Python Applications Can Dramatically Increase Performance Tuning Python Applications Can Dramatically Increase Performance Vasilij Litvinov Software Engineer, Intel Legal Disclaimer & 2 INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED,

More information

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US

More information

HPC. Accelerating. HPC Advisory Council Lugano, CH March 15 th, Herbert Cornelius Intel

HPC. Accelerating. HPC Advisory Council Lugano, CH March 15 th, Herbert Cornelius Intel 15.03.2012 1 Accelerating HPC HPC Advisory Council Lugano, CH March 15 th, 2012 Herbert Cornelius Intel Legal Disclaimer 15.03.2012 2 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Fastest and most used math library for Intel -based systems 1

Fastest and most used math library for Intel -based systems 1 Fastest and most used math library for Intel -based systems 1 Speaker: Alexander Kalinkin Contributing authors: Peter Caday, Kazushige Goto, Louise Huot, Sarah Knepper, Mesut Meterelliyoz, Arthur Araujo

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Deep Learning with CPU, GPU, and FPGA Technologies Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

More information

Andreas Schneider. Markus Leberecht. Senior Cloud Solution Architect, Intel Deutschland. Distribution Sales Manager, Intel Deutschland

Andreas Schneider. Markus Leberecht. Senior Cloud Solution Architect, Intel Deutschland. Distribution Sales Manager, Intel Deutschland Markus Leberecht Senior Cloud Solution Architect, Intel Deutschland Andreas Schneider Distribution Sales Manager, Intel Deutschland Legal Disclaimers 2016 Intel Corporation. Intel, the Intel logo, Xeon

More information

Fast Hardware For AI

Fast Hardware For AI Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights

More information

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder

EFFICIENT INFERENCE WITH TENSORRT. Han Vanholder EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60

More information

A Simple Path to Parallelism with Intel Cilk Plus

A Simple Path to Parallelism with Intel Cilk Plus Introduction This introductory tutorial describes how to use Intel Cilk Plus to simplify making taking advantage of vectorization and threading parallelism in your code. It provides a brief description

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most

More information

Bitonic Sorting Intel OpenCL SDK Sample Documentation

Bitonic Sorting Intel OpenCL SDK Sample Documentation Intel OpenCL SDK Sample Documentation Document Number: 325262-002US Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information