Enabling the Future of Artificial Intelligence


1 Enabling the Future of Artificial Intelligence

2 Contents: AI overview; Intel Nervana AI products - hardware and software; Intel Nervana Deep Learning Platform; learn more - Intel Nervana AI Academy.

3 Artificial Intelligence, Machine Learning & Deep Learning

4 Why Now? Bigger data, better hardware, smarter algorithms.
Bigger data: an image is ~1,000 KB per picture, audio ~5,000 KB per song, video ~5,000,000 KB per movie.
Better hardware: transistor density doubles every 18 months; cost per GB in 1995: $ ; cost per GB in 2017: $0.02.
Smarter algorithms: advances in algorithm innovation, including neural networks, leading to better accuracy in training models.

5 Sharing: companies share algorithms and topologies; their gold is their data, trained models, and talent.

6 Machine Learning Types
Supervised: teach desired behavior with labeled data and infer on new data (labeled data -> classified data).
Unsupervised: make inferences with unlabeled data and discover patterns (unlabeled data -> clustered data).
Semi-supervised: a combination of supervised and unsupervised learning (labeled and unlabeled data -> classified data).
Reinforcement: act in an environment to maximize reward; build autonomous agents that learn.
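To make the supervised/unsupervised distinction concrete, here is a minimal scikit-learn sketch (my own illustration with made-up 2-D data, not from the deck): the classifier learns from labels, while the clusterer discovers the two groups on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two made-up blobs of 2-D points.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)              # labels, used only by the supervised model

supervised = LogisticRegression().fit(X, y)    # learns the desired behavior from labeled data
print(supervised.predict([[4.2, 3.8]]))        # infers on new data -> classified data

unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # no labels: discovers the pattern
print(unsupervised.labels_[:5])                # -> clustered data
```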

8 Training: data passes through forward propagation to produce an output (e.g. scores for person / cat / dog / bike); the output is compared with the expected result to compute a penalty (error or cost), and back propagation pushes that error back through the network to update the weights.

9 Inference: data passes through forward propagation only, producing an output (person / cat / dog / bike) from the already-trained network.
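Slides 8 and 9 together describe one training step and one inference step. The NumPy sketch below (my own illustration with made-up data and a single linear layer) shows forward propagation, the cross-entropy penalty, back propagation with an SGD update, and finally inference as a forward pass alone.

```python
import numpy as np

rng = np.random.default_rng(0)
classes = ["person", "cat", "dog", "bike"]

# Toy data: 32 samples of 64 features, with made-up labels.
X = rng.standard_normal((32, 64)).astype(np.float32)
y = rng.integers(0, len(classes), size=32)

# One linear layer + softmax is enough to show the shape of the loop.
W = rng.standard_normal((64, len(classes))).astype(np.float32) * 0.01
b = np.zeros(len(classes), dtype=np.float32)

def forward(X):
    logits = X @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)              # softmax probabilities

for step in range(100):                                  # training loop
    probs = forward(X)                                   # forward propagation
    loss = -np.log(probs[np.arange(len(y)), y]).mean()   # penalty (error or cost)
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0                    # d loss / d logits
    grad /= len(y)
    W -= 0.5 * (X.T @ grad)                              # back propagation + SGD update
    b -= 0.5 * grad.sum(axis=0)

print(f"final training loss: {loss:.3f}")
# Inference: forward propagation only - no penalty, no weight update.
print(classes[int(forward(X[:1]).argmax())])
```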

10 Deep Learning Use Cases - Healthcare: tumor detection. Agriculture: robotics. Energy: oil & gas search. Consumer: speech/text search and smart speakers. Transport: automated driving. Finance: time-series search. Proteomics: sequence analysis.

12 Intel Nervana Deep Learning Portfolio
Research and application support: Intel Brain data scientist team, BDM & direct optimization team - research new AI usages and models, develop POCs with customers to apply AI methods, and enable customers to deploy products.
Deep learning platform: Nervana Deep Learning Studio (data scientist and developer DL productivity tools), Nervana Cloud (DL cloud service for POCs, developers, and academics), a DL appliance for DLaaS, and Titanium hardware management; Intel branded.
Enabling product software: MKL-DNN and other math libraries, frameworks, Nervana Graph, HW transformers, and non-x86 libraries - accelerate framework optimization on IA (open source), provide back-end APIs to Nervana Graph for framework developers and Intel, deliver multi-node optimizations, and extend to non-datacenter inference products and use cases.
Systems: node & rack reference designs and channel sales for deep learning systems - enable direct and end customers with the deep learning system portfolio; an Intel-branded system is under investigation.
Products: datacenter and edge/client/gateway - a comprehensive product portfolio spanning general-purpose x86 and dedicated DL NPU accelerators.

13 AI in the Datacenter
All purpose - Intel Xeon Scalable Processors: the most agile AI platform, with scalable performance for the widest variety of AI and other datacenter workloads, including breakthrough deep learning training and inference.
Highly parallel - Intel Xeon Phi Processor (Knights Mill): faster DL training, with scalable performance optimized for even faster deep learning training and select highly parallel datacenter workloads*.
Flexible acceleration - Intel FPGA: enhanced DL inference, with scalable acceleration for real-time deep learning inference at higher efficiency across a wide range of workloads and configurations.
Deep learning - Crest family: deep learning by design, with scalable acceleration and the best performance for intensive deep learning training and inference, period.
*Knights Mill (KNM); select = single-precision highly parallel workloads that generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth, e.g. energy (reverse time migration), deep learning training, etc. All products, computer systems, dates, and figures specified are preliminary based on current expectations and are subject to change without notice.

14 Performance Drivers for AI Workloads: compute, bandwidth, and SW optimizations.

15 Intel Xeon Scalable Processor: the most agile AI platform - scalable performance for the widest variety of AI and other datacenter workloads, including deep learning.
Built-in ROI: begin your AI journey today using existing, familiar infrastructure.
Potent performance: up to 2.2x deep learning training and inference performance vs. the prior generation 1, and 113x with SW optimizations 2.
Production-ready: robust support for the full range of AI deployments - classic ML, deep learning, reasoning, emerging AI, analytics, and more.
1,2 Configuration details on slides 18, 20, and 24. Source: Intel, measured as of November 2016.

16 AI Performance, Gen over Gen
Inference throughput: up to 2.4x higher Neon ResNet-18 inference throughput on the Intel Xeon Platinum 8180 processor compared to the Intel Xeon processor E5-2699 v4 (inference batch size: 1).
Training throughput: up to 2.2x higher Neon ResNet-18 training throughput on the Intel Xeon Platinum 8180 processor compared to the Intel Xeon processor E5-2699 v4 (training batch size: 256).
Inference and training throughput measured with FP32 instructions; inference with INT8 will be higher. Advance previous-generation AI workload performance with Intel Xeon Scalable processors. Configuration details on slides 18 and 20. Source: Intel, measured as of June 2017.

17 GEMM Performance (measured in GFLOPS, shown relative to a 1.0 baseline; higher is better): up to 3.4x integer general matrix multiply (IGEMM, INT8) performance on a 1S Intel Xeon Platinum 8180 processor compared to a 1S Intel Xeon processor E5-2699 v4, with single-precision floating point general matrix multiply (SGEMM, FP32) also improved - enhanced matrix multiply performance on Intel Xeon Scalable processors. 8-bit IGEMM will be available in Intel Math Kernel Library (Intel MKL) 2018 Gold, to be released by end of Q. Configuration details on slide 24. Source: Intel, measured as of June 2017.
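For readers who want to reproduce this kind of measurement, here is a small sketch (my own; the matrix size and timing method are arbitrary choices) that calls single-precision GEMM through SciPy's BLAS binding - which may be Intel MKL, depending on how SciPy was built - and reports achieved GFLOPS:

```python
import time
import numpy as np
from scipy.linalg.blas import sgemm  # FP32 GEMM from the linked BLAS (possibly Intel MKL)

n = 4096                                  # made-up problem size
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)

sgemm(1.0, A, B)                          # warm-up call
t0 = time.perf_counter()
C = sgemm(1.0, A, B)                      # C = 1.0 * A @ B
dt = time.perf_counter() - t0

flops = 2.0 * n**3                        # one multiply + one add per inner-product element
print(f"SGEMM: {flops / dt / 1e9:.1f} GFLOPS")
```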

18 Inference Throughput (images/second): up to 2.4x higher inference throughput on a 2S Intel Xeon Platinum 8180 processor (28C, 2.5GHz) compared to a 2S Intel Xeon processor E5-2699 v4 (22C, 2.2GHz), measured across the Caffe, TensorFlow, MXNet, and Neon frameworks on topologies including AlexNet, GoogLeNet v1, ResNet-50, VGG-19 (BS = 256), Inception V3, ConvNet variants of AlexNet/GoogLeNet/VGG, and ResNet-18 (BS = 1024). Inference throughput measured with FP32 instructions; inference with INT8 will be higher, and additional optimizations may further improve performance. The Intel Xeon Platinum processor delivers inference throughput across different frameworks. Source: Intel, measured as of June 2017.

19 AI Performance: Software + Hardware
Inference throughput: up to 138x higher - Intel Xeon Platinum 8180 processor running Intel-optimized Caffe GoogleNet v1 with Intel MKL vs. an Intel Xeon processor E v3 running BVLC-Caffe.
Training throughput: up to 113x higher - Intel Xeon Platinum 8180 processor running Intel-optimized Caffe AlexNet with Intel MKL vs. an Intel Xeon processor E v3 running BVLC-Caffe.
Inference and training throughput measured with FP32 instructions (inference batch size 256 for both GoogleNet v1 and AlexNet); inference with INT8 will be higher. Optimized frameworks and optimized Intel MKL libraries deliver significant AI performance gains from combined hardware and software optimizations on Intel Xeon Scalable processors. Configuration details on slides 18 and 25. Source: Intel, measured as of June 2017.

20 Intel Xeon Scalable Processor Multi-node Performance: ResNet-50 time to train (hours), weak scaling on SKX-8180 nodes. The global minibatch scales with node count - 64 at 2 nodes, 128 at 4, 256 at 8, 512 at 16, 1024 at 32, 2048 at 64, 4096 at 128, and 8192 at 256 nodes at MB-32 per node - with the largest runs of 352, 470, and 704 nodes using MB-32, MB-24, and MB-16 per node respectively. Source: Intel, measured as of August 2017.

21 Performance Drivers for AI Workloads
Compute: a higher number of operations per second - the Intel Xeon Platinum 8180 processor (1-socket) delivers up to 3570 GFLOPS on SGEMM (FP32) and up to 5185 GOPS on IGEMM (INT8). Increased parallelism and vectorization: Intel Xeon Scalable processors offer Intel AVX-512 with up to two 512-bit FMA units computing in parallel per core 1, and up to 28 cores.
Bandwidth: high throughput, low latency - Intel Xeon Scalable processors offer up to six DDR4 channels per socket and a new mesh architecture; the Intel Xeon Platinum 8180 processor reaches up to 199 GB/s of STREAM Triad performance on a 2-socket system. Efficient, large caches: increased private mid-level cache (MLC), up to 1 MB per core.
1 Available on Intel Xeon Platinum processors, Intel Xeon Gold processors, and the Intel Xeon 5122 processor. Configuration details on slide 23. Source: Intel, measured as of June 2017.
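The compute figures can be sanity-checked with simple arithmetic: peak FP32 throughput is cores x FMA units x SIMD lanes x 2 ops x frequency. The sketch below is my own back-of-envelope calculation; the AVX-512 all-core frequency is an assumed value, not from the slide.

```python
cores = 28            # Intel Xeon Platinum 8180
fma_units = 2         # AVX-512 FMA units per core (slide footnote 1)
lanes = 16            # FP32 elements per 512-bit vector
flops_per_fma = 2     # one multiply + one add
ghz = 2.3             # assumed AVX-512 all-core frequency (not from the slide)

peak = cores * fma_units * lanes * flops_per_fma * ghz
print(f"theoretical peak ~= {peak:.0f} GFLOPS FP32")        # ~4122 GFLOPS
print(f"measured 3570 GFLOPS -> {3570 / peak:.0%} of peak")  # roughly 87%
```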

22 Crest Family (2017): deep learning by design - scalable acceleration with the best performance for intensive deep learning training and inference, period.
Custom hardware: unprecedented compute density and a large reduction in time-to-train. Blazing data access: 32 GB of in-package memory via HBM2 technology, with 8 terabits/s of memory access speed. High-speed scalability: 12 bi-directional high-bandwidth links for seamless data transfer via interconnects.

23 Intel Nervana Lake Crest NPU Architecture (floorplan, not to scale): 12 processing clusters sit on an interposer with four HBM2 stacks, each attached through its own HBM PHY and memory controller; 12 inter-chip links (ICL) plus an inter-chip controller (ICC) provide scale-out connectivity; a management CPU exposes SPI, I2C, and GPIO; and a PCIe controller with DMA connects over PCI Express x16.

24 FlexPoint Numerical Format
Float16: 11-bit mantissa precision (-1024 to 1023) with an individual 5-bit exponent per value.
Flex16: a 16-bit mantissa (-32,768 to 32,767) - 45% more precision than Float16 - with a single tensor-wide shared 5-bit exponent.
Flex16 accuracy is on par with Float32, but with much smaller cores.
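A rough NumPy simulation of the shared-exponent idea (my own sketch; the real Flexpoint format and its exponent-management policy are more involved): every value in a tensor stores a 16-bit integer mantissa, and the whole tensor shares one exponent.

```python
import numpy as np

def flex16_quantize(t):
    """Quantize a tensor to a shared-exponent 16-bit integer format (illustrative only)."""
    # Pick one exponent for the whole tensor so the largest magnitude fits in int16.
    exp = int(np.ceil(np.log2(np.abs(t).max() + 1e-30))) - 15
    mant = np.clip(np.round(t / 2.0**exp), -32768, 32767).astype(np.int16)
    return mant, exp

def flex16_dequantize(mant, exp):
    return mant.astype(np.float32) * 2.0**exp

t = np.random.randn(1024).astype(np.float32)
mant, exp = flex16_quantize(t)
err = np.abs(t - flex16_dequantize(mant, exp)).max()
print(f"shared exponent 2**{exp}, max abs error {err:.2e}")
```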

26 Diversity in Deep Networks: variety in network topology - recurrent NNs are common for NLP/ASR, DAGs for GoogLeNet, and some networks have memory (examples: recurrent NN, CNN - AlexNet, GoogLeNet). But there are a few well-defined building blocks: convolutions, common for image recognition tasks; GEMMs for recurrent network layers, which could be sparse; and ReLU, tanh, and softmax.

27 Intel Math Kernel Library (Intel MKL): optimized AVX2 and AVX-512 instructions on Intel Xeon processors and Intel Xeon Phi processors, with optimized implementations of common deep learning operations - GEMM (useful in RNNs and fully connected layers), convolutions, pooling, ReLU, and batch normalization. Coming soon: Winograd-based convolutions.
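From Python, a quick way to see whether these MKL kernels are actually behind your array math is to inspect NumPy's build configuration (a standard NumPy call, nothing deck-specific):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against;
# look for "mkl" entries if NumPy is linked to Intel MKL.
np.show_config()
```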

28 Naïve Convolution (diagram)
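The slide carries only the diagram title, but naïve direct convolution is easy to write down; here is a plain seven-loop NumPy version (my own illustration: NCHW layout, unit stride, no padding):

```python
import numpy as np

def conv_naive(x, w):
    """x: (N, C, H, W) input; w: (K, C, R, S) filters -> y: (N, K, H-R+1, W-S+1)."""
    N, C, H, W = x.shape
    K, _, R, S = w.shape
    y = np.zeros((N, K, H - R + 1, W - S + 1), dtype=x.dtype)
    for n in range(N):                          # images
        for k in range(K):                      # output channels
            for c in range(C):                  # input channels
                for i in range(H - R + 1):      # output rows
                    for j in range(W - S + 1):  # output columns
                        for r in range(R):      # filter rows
                            for s in range(S):  # filter columns
                                y[n, k, i, j] += x[n, c, i + r, j + s] * w[k, c, r, s]
    return y
```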

29 Cache-Friendly Convolution (diagram; arxiv.org/pdf/ v1.pdf)
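The core trick in cache-friendly direct convolution (as in the paper this slide cites) is blocking the channel dimension so the innermost loop runs unit-stride over a vector-width tile. A minimal sketch of that layout transform, under my own simplifying assumption that the block size divides the channel count evenly:

```python
import numpy as np

BLOCK = 16  # channels per block, matching the 16 FP32 lanes of an AVX-512 register

def to_nchw_blocked(x, block=BLOCK):
    """Reorder (N, C, H, W) -> (N, C//block, H, W, block) so the `block` channels
    touched by the innermost loop are contiguous in memory."""
    N, C, H, W = x.shape
    assert C % block == 0
    return np.ascontiguousarray(
        x.reshape(N, C // block, block, H, W).transpose(0, 1, 3, 4, 2))

x = np.random.randn(1, 64, 56, 56).astype(np.float32)
xb = to_nchw_blocked(x)
print(xb.shape)  # (1, 4, 56, 56, 16): unit stride across the 16-channel block
```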

30 Intel MKL and Intel MKL-DNN for Deep Learning - both provide the DNN primitives beneath deep learning frameworks, targeting Xeon, Xeon Phi, and FPGA.
Intel MKL: DNN primitives plus a wide variety of other math functions; C DNN APIs (C++ planned); binary distribution; free community license, with premium support available as part of Parallel Studio XE; broad-usage DNN primitives not specific to individual frameworks; quarterly update releases.
Intel MKL-DNN: DNN primitives only; C/C++ DNN APIs; open-source DNN code*; Apache 2.0 license; multiple variants of DNN primitives as required for framework integrations; rapid development ahead of Intel MKL releases.
* GEMM matrix-multiply building blocks are binary.

31 Deep learning software: a many-to-many problem between users, frameworks, and hardware platforms - an engineering-effort combinatorial explosion that will only worsen as hardware x software x topologies x quantization schemes expands.

32 Nervana Graph - the project. Four components: an intermediate representation (IR) for deep learning; data flow with common tensor computational primitives and scheduling/side-effecting memory management; compiler backends for the Nervana Graph IR (Nervana GPU kernels, cuDNN soon, MKL-DNN, Lake Crest, ...); and connectors to other deep learning frameworks (currently planning TensorFlow, Caffe2, MXNet, and PyTorch support).
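To illustrate what such an IR looks like, here is a generic toy graph of tensor ops (entirely my own sketch - this is not the actual Nervana Graph API): nodes carry an op kind and a shape, and a backend would walk the graph in topological order to emit kernels.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    """One node in a toy deep learning IR: an op, its inputs, and its output shape."""
    name: str
    kind: str                      # e.g. "placeholder", "matmul", "relu"
    shape: tuple
    inputs: list = field(default_factory=list)

x = Op("x", "placeholder", (32, 64))
w = Op("w", "placeholder", (64, 10))
h = Op("h", "matmul", (32, 10), [x, w])
y = Op("y", "relu", (32, 10), [h])

def topo(op, seen=None):
    """Post-order walk: the schedule a backend would emit kernels in."""
    seen = set() if seen is None else seen
    for i in op.inputs:
        if id(i) not in seen:
            yield from topo(i, seen)
    seen.add(id(op))
    yield op

for op in topo(y):
    print(f"{op.kind:<12} {op.name} -> {op.shape}")
```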

33 Performance Optimization on Modern Platforms: hierarchical parallelism.
Coarse-grained / multi-node: domain decomposition; for scaling, improve load balancing and reduce synchronization events and all-to-all communication.
Fine-grained parallelism / within a node: sub-domain decomposition via (1) multi-level domain decomposition (e.g. across layers) and (2) data decomposition (layer parallelism); utilize all the cores (OpenMP, MPI, TBB), reduce synchronization events and serial code, and improve load balancing.
Vectorize/SIMD: unit-strided access per SIMD lane, high vector efficiency, and data alignment.
Efficient memory/cache use: blocking, data reuse, prefetching, and careful memory allocation.
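As a concrete instance of the "blocking / data reuse" item, the sketch below (my own, with arbitrary sizes) computes a matrix product tile by tile so each tile pair is reused while it is hot in cache; production libraries apply the same idea far more aggressively, together with vectorization and prefetching.

```python
import numpy as np

def matmul_blocked(A, B, tile=64):
    """C = A @ B computed tile by tile; each tile pair stays resident in cache."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(matmul_blocked(A, B), A @ B, atol=1e-2)
```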

35 Intel Nervana Deep Learning Studio: compress the innovation cycle to accelerate time-to-solution.
What it is: a comprehensive software suite that lets groups of data scientists shorten the innovation cycle and develop custom, enterprise-grade deep learning solutions in record time. Available as part of Intel Nervana Cloud and the Intel Nervana Deep Learning System; it sits on top of deep learning frameworks (Neon, more coming soon) and Intel Nervana hardware, and handles images, video, text, speech, tabular, and time-series data.
Why it's important: developing a deep learning solution is time-consuming and expensive, because costly data scientists spend too much time wrangling data and manually running hundreds of experiments to find the network topology and parameter combination that yields a converged model fitting their use case.
Users: primarily data scientists; secondarily software developers who take trained deep learning models and integrate them into their applications.
Learn more: intelnervana.com

36 High-Level Workflow: the data scientist works through multiple interface options - the ncloud command line interface, interactive notebooks, or the user interface - to label a dataset, import it, build a model, and train on the cloud/server; trained models land in the model library and are deployed to the edge.

38 Intel Nervana AI Academy: the Intel Developer Zone for artificial intelligence - deep learning frameworks, libraries, and additional tools; workshops, webinars, meetups, and remote access. software.intel.com/ai/academy | intelnervana.com

39 Visual Understanding, Intel Labs China: innovate in cutting-edge visual cognition and machine learning technologies for smart computing, enabling novel usages and user experiences. Face 2D/3D analysis and a face & emotion recognition engine (face analysis technology, multimodal emotion recognition); efficient deep learning-based visual recognition and compression (efficient CNN algorithm design, DNN model compression); visual parsing and multimodal analysis (automatic image/video captioning, visual question & answering).

40 Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO THIS INFORMATION, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #


More information

Using Intel VTune Amplifier XE for High Performance Computing

Using Intel VTune Amplifier XE for High Performance Computing Using Intel VTune Amplifier XE for High Performance Computing Vladimir Tsymbal Performance, Analysis and Threading Lab 1 The Majority of all HPC-Systems are Clusters Interconnect I/O I/O... I/O I/O Message

More information

TensorFlow* on Modern Intel Architectures Jing Huang and Vivek Rane Artificial Intelligence Product Group Intel

TensorFlow* on Modern Intel Architectures Jing Huang and Vivek Rane Artificial Intelligence Product Group Intel TensorFlow* on Modern Intel Architectures Jing Huang and Vivek Rane Artificial Intelligence Product Group Intel Tensorflow* on CPU has been very slow https://www.tensorflow.org/install/install_linux UNTIL

More information

Ultimate Workstation Performance

Ultimate Workstation Performance Product brief & COMPARISON GUIDE Intel Scalable Processors Intel W Processors Ultimate Workstation Performance Intel Scalable Processors and Intel W Processors for Professional Workstations Optimized to

More information

Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist. With Dell Amsterdam, October 27, 2016

Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist. With Dell Amsterdam, October 27, 2016 Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist With Dell Amsterdam, October 27, 2016 Legal Disclaimers Intel technologies features and benefits depend on system configuration and may

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel Parallel Studio XE 2013 for Linux* Installation Guide and Release Notes Document number: 323804-003US 10 March 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 1 1.1.1 Changes since Intel

More information

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python*

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python* Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* Introduction Python is among the most popular programming languages Especially for prototyping But very limited use in production

More information

An introduction to Machine Learning silicon

An introduction to Machine Learning silicon An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional

More information

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

NVIDIA FOR DEEP LEARNING. Bill Veenhuis NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA

More information

Optimizing Film, Media with OpenCL & Intel Quick Sync Video

Optimizing Film, Media with OpenCL & Intel Quick Sync Video Optimizing Film, Media with OpenCL & Intel Quick Sync Video Petter Larsson, Senior Software Engineer Ryan Tabrah, Product Manager The Intel Vision Enriching the lives of every person on earth through technology

More information

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit Ben Walker Data Center Group Intel Corporation 2018 Storage Developer Conference. Intel Corporation. All Rights Reserved. 1 Notices

More information