Using CNN Across Intel Architecture
|
|
- Damon Bond
- 5 years ago
- Views:
Transcription
1 white paper Artificial Intelligence Object Classification Intel AI Builders Object Classification Using CNN Across Intel Architecture Table of Contents Abstract Introduction Setting up a Multinode Cluster Experiments Training Data Model Building and Network Topology Results Observations on Intel Xeon Processor Observations on Intel Xeon Phi Processor Conclusion and Future Work...9 Abstract In this work, we present the computational performance and classification accuracy for object classification using the VGG16 network on Intel Xeon processors and Intel Xeon Phi processors. The results can be used as criteria for iteration selection optimization in different experimental setups using these processors and also in multinode architecture. With an objective of evaluating accuracy for realtime logo detection from video, the results are applicable on a logo image dataset suitable for detecting the classification accuracy of the logos. 1. Introduction Deep learning (DL), which refers to a class of neural network models with deep architectures, forms an important and expressive family of machine learning (ML) models. Modern deep learning models, such as convolutional neural networks (CNNs), have achieved notable successes in a wide spectrum of machine learning tasks including speech recognition 1, visual recognition 2, and language understanding 3. The explosive success and rapid adoption of CNNs by the research community is largely attributable to high-performance computing hardware such as the Intel Xeon processor, Intel Xeon Phi processor, and graphics processing units (GPUs), as well as a wide range of easy-to-use open source frameworks including Caffe*, TensorFlow*, the cognitive toolkit (CNTK*), Torch*, and so on. 2. Setting up a Multinode Cluster The Intel Distribution for Caffe* is designed for both single node and multinode operation. There are two general approaches to parallelization (data parallelism and model parallelism), and Intel uses data parallelism. Data parallelism is when you use the same model for every thread, but feed it with different data. It means that the total batch size in a single iteration is equal to the sum of individual batch sizes of all nodes. For example, a network is trained on three nodes. All of them have a batch size of 64. The (total) batch size in a single iteration of the stochastic gradient descent algorithm is 3*64=192. Model parallelism means using the same data across all nodes, but each node is responsible for estimating different parameters. The nodes then exchange their estimates with each other to come up with the right estimate for all parameters.
2 To set up a multinode cluster, download and install the Intel Machine Learning Scaling Library (Intel MLSL) 2017 package from and source the mlslvars.sh, and then recompile the Caffe build with MLSL: = 1 in the makefile.config. When the makefile completes successfully, start the Caffe training using the message passing interface (MPI) command as follows: mpirun -n 3 -ppn 1 -machinefile ~/mpd.hosts./build/tools/caffe train \ --solver=models/bvlc_googlenet/solver_client.prototxt --engine=mkl2017 where n defines the number of nodes and ppn represents the number of processes per node. The nodes will be configured in the ~/mpd.hosts with their respective IP addresses as follows: Ansible* scripts are used to copy the binaries or files across the nodes. Clustering communication employs Intel Omni-Path Architecture (Intel OPA) 4. Validation of cluster setup is performed by using the command opainfo in all machines, and the port state must always be Active. Figure 1: Intel Omni-Path Architecture (Intel OPA) cluster information. 3. Experiments The current experiment focuses on measuring the performance of the VGG16 network on the Flickr* logo dataset, which has 32 different classes of logo. Intel Optimized Technical Preview for Multinode Caffe* is used for experiments on the single node and with Intel MLSL enabled for multinode experiments. The input images were all converted to lightning memorymapped database (LMDB) format for better efficiency. All of the experiments are set to run for 10K iterations, and the observations are noted below. We conducted our experiments in the following machine configurations. Due to lack of time we had to limit our experiments to a single execution per architecture. Intel Xeon Phi processor Model Name: Intel Xeon Phi processor Core(s) Per Socket: 68 RAM (free): 70 GB OS: CentOS* 7.3 Intel Xeon processor Model Name: Intel Xeon processor E GHz Core(s) Per Socket: 22 RAM (free): 123 GB OS: Ubuntu*
3 The multinode cluster setup is configured as follows: KNL 01 (Master) Model Name: Intel Xeon Phi processor Core(s) Per Socket: 68 RAM (free): 70 GB OS: CentOS 7.3 KNL 03 (Slave node) Model Name: Intel Xeon Phi processor Core(s) Per Socket: 68 RAM (free): 70 GB OS: CentOS 7.3 KNL 04 (Slave node) Model Name: Intel Xeon Phi processor Core(s) Per Socket: 68 RAM (free): 70 GB OS: CentOS Training Data The training and test image datasets were obtained from Datasets: FlickrLogos32 / FlickrLogos47, which is maintained by the Multimedia Computing and Computer Vision Lab, Augsburg University. There are 32 logo classes or brands in the dataset, which are downloaded from Flickr, as illustrated in the following figure: Figure 2: Flickr logo image dataset with 32 classes. The 32 classes are as follows: Adidas*, Aldi*, Apple*, Becks*, BMW*, Carlsberg*, Chimay*, Coca-Cola*, Corona*, DHL*, Erdinger*, Esso*, Fedex*, Ferrari*, Ford*, Foster's*, Google*, Guinness*, Heineken*, HP*, Milka*, Nvidia*, Paulaner*, Pepsi*, Ritter Sport*, Shell, Singha*, Starbucks*, Stella Artois*, Texaco*, Tsingtao*, and UPS*. The training set consists of 8240 images; 6000 images are no_logo images, and 70 images per class for 32 classes comprise the remaining 2240 images, thereby making the dataset highly skewed. Also, the training and test dataset is split in a ratio of 90:10 from the full 8240 samples. 3
4 3.2. Model Building and Network Topology VGG16 network topology was used for our experiments. VGG16 network topology is a 16 weights layer (13 convolutional and 3 fully connected (FC) layers) and has very small (3 x 3) convolution filters, which showed significant enhancement in network performance and detection accuracy over prior art (winning the first and second prizes in the ImageNet* challenge in 2014), and henceforth widely used as a reference topology. 4. Results 4.1 Observations on Intel Xeon Processor The Intel Xeon processors are running under the following software configurations: Caffe Version: rc3 MKL Version: _ MKL_DNN: SUPPORTED GCC Version: The following observations were noted while training for 10K iterations with a batch size of 32 and learning rate policy as POLY. Figure 3: Training loss variation with iterations (batch size 32, LR policy as POLY). Figure 4: Accuracy variation with iterations (batch size 32, LR policy as POLY). 4
5 The following observations were noted while training for 10K iterations with a batch size of 64 and learning rate policy as POLY. Figure 5: Training loss variation with iterations (batch size 64, LR policy as POLY). Figure 6: Accuracy variation with iterations (batch size 64, LR policy as POLY). The real-time training and test observations using different batch sizes for the Intel Xeon processor is depicted in the following table. The Table 2 depicts how the accuracy varies with batch size. Table 1: Real-time training results for Intel Xeon processor. Batch Size LR Policy Start Time End Time Duration Loss Accuracy at Top 1 32 POLY 18:20 23: POLY 16:20 9:57 17: STEP 16:41 6:37 13: Accuracy at Top 5 5
6 Table 2: Batch size versus accuracy details on the Intel Xeon processor. 32 Batch Size 64 Batch Size Iterations Iterations Observations on Intel Xeon Phi Processor The Intel Xeon Phi processors are running under the following software configurations: Caffe Version: rc3 MKL Version: _ MKL_DNN: SUPPORTED GCC Version: 6.2 The following observations were noted while training for 10K iterations with a batch size of 32 and learning rate policy as POLY. Figure 7: Training loss variation with iterations on Intel Xeon Phi processor (batch size 32, LR policy as POLY). 6
7 Figure 8: Accuracy variation with iterations on Intel Xeon Phi processor (batch size 32, LR policy as POLY). Figure 9: Training loss variation with iterations on Intel Xeon Phi processor (batch size 64, LR policy as POLY). Figure 10: Accuracy variation with iterations on Intel Xeon Phi processor (batch size 64, LR policy as POLY). 7
8 Figure 11: Training loss variation with iterations on Intel Xeon Phi processor (batch size 128, LR policy as POLY). Figure 12: Accuracy variation with iterations on Intel Xeon Phi processor (batch size 128, LR policy as POLY). Table 3: Batch size versus accuracy details on the Intel Xeon processor. 32 Batch Size 64 Batch Size Iterations Iterations
9 128 Batch Size Iterations Table 4: Real-time training results for the Intel Xeon Phi processor. Batch Size LR Policy Start Time End Time Duration Loss Accuracy at Top 1 32 POLY 17:53 20:36 2: POLY 10:59 16:07 6: POLY 18:00 4:19 10: Accuracy at Top 5 5. Conclusion and Future Work We observed from Table 1 that the batch size of 32 was the optimal configuration in terms of speed and accuracy. Though there is a slight increase in accuracy with batch size 64, the gain seems to be quite low, compared to the increase in training time. It was also observed that the learning rate policies have quite a significant impact on the training time and less impact on accuracy. Perhaps the recalculation of the learning rates on every iteration would have slowed down this training. There is a minor gain in the Top 5 Accuracy with the LR policy as POLY, and this might be due to the optimal calculation of the learning rate. There is a chance that the gain might vary quite significantly in a larger dataset. We observed from Table 3 that the Intel Xeon Phi processor efficiency increases as the batch size is increased, and also the decrease in loss happens faster as the batch size is increased. Table 4 infers that the higher batch size also runs faster on Intel Xeon Phi processors. The observations as per the above tables implicates that training in Intel Xeon Phi machines are faster than the same conducted in Xeon machines. Thanks to the bootable host processor that delivers massive parallelism & vectorization. However the accuracy rate produced by Intel Xeon Phi processors is much lower than those produced for Intel Xeon processors for the same number of iterations, so it must be noted that we have to run a few more iterations on Intel Xeon Phi processors as compared to Intel Xeon processors to meet the same accuracy levels. List of Abbreviations Abbreviations MLSL CNN GPU ML CNTK DL LMDB Expanded Form machine learning scalable library convolution neural network graphics processing unit machine learning cognitive toolkit deep learning lightning memory-mapped database 9
10 References 1. Deng, L., LI, J., Huang, J.-T., Yao, K., Yu, D., Seide, F., Seltzer, M. L., Zweig, G., He, X., Williams, J., Gong, Y., and Aceri, A. Recent Advances in Deep Learning for Speech Research at Microsoft. In ICASSP (2013). 2. Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS (2012). 3. Mikolov, T., Chen, K., Corrado, G., and Deahn, J. Efficient Estimation of Word Representations in Vector Space. In ICLRW (2013). 4. Cherlopalle, Deepthi and Weage, Joshua Dell HPC Omni-Path Fabric: Supported Architecture and Application Study June 2016 More details on Intel Xeon Phi processor: Intel Xeon Phi Processor Intel Distribution for Caffe*: Manage Deep Learning Networks with Intel Distribution for Caffe Multinode Guide: Guide to multi-node training with Intel Distribution of Caffe* Intel Omni Path Architecture Cluster Setup: Dell HPC Omni-Path Fabric: Supported Architecture and Application Study Intel MLSL Package: Intel MLSL 2017 Beta Optimization Notice Intel's Compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimization include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessors-dependent optimizations in this product are intended to use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guide for more information regarding specific instruction sets covered by this notice. Notice revision # Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as Spectre and Meltdown. Implementation of these updates may make these results inapplicable to your device or system. Intel, the Intel logo, Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others Intel Corporation Printed in USA 0518/BA/PDF Please Recycle 10
WITH INTEL TECHNOLOGIES
WITH INTEL TECHNOLOGIES Commitment Is to Enable The Best Democratize technologies Advance solutions Unleash innovations Intel Xeon Scalable Processor Family Delivers Ideal Enterprise Solutions NEW Intel
More informationA performance comparison of Deep Learning frameworks on KNL
A performance comparison of Deep Learning frameworks on KNL R. Zanella, G. Fiameni, M. Rorro Middleware, Data Management - SCAI - CINECA IXPUG Bologna, March 5, 2018 Table of Contents 1. Problem description
More informationIntel Architecture 2S Server Tioga Pass Performance and Power Optimization
Intel Architecture 2S Server Tioga Pass Performance and Power Optimization Terry Trausch/Platform Architect/Intel Inc. Whitney Zhao/HW Engineer/Facebook Inc. Agenda Tioga Pass Feature Overview Intel Xeon
More informationNVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit
NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit Ben Walker Data Center Group Intel Corporation 2018 Storage Developer Conference. Intel Corporation. All Rights Reserved. 1 Notices
More informationunleashed the future Intel Xeon Scalable Processors for High Performance Computing Alexey Belogortsev Field Application Engineer
the future unleashed Alexey Belogortsev Field Application Engineer Intel Xeon Scalable Processors for High Performance Computing Growing Challenges in System Architecture The Walls System Bottlenecks Divergent
More informationAccelerate Deep Learning Inference with openvino toolkit
Accelerate Deep Learning Inference with openvino toolkit Priyanka Bagade, IoT Developer Evangelist, Intel Core and Visual Computing Group Optimization Notice Intel s compilers may or may not optimize to
More informationApril 2 nd, Bob Burroughs Director, HPC Solution Sales
April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software
More informationDaniel Verkamp, Software Engineer
Daniel Verkamp, Software Engineer Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn
More informationAndreas Schneider. Markus Leberecht. Senior Cloud Solution Architect, Intel Deutschland. Distribution Sales Manager, Intel Deutschland
Markus Leberecht Senior Cloud Solution Architect, Intel Deutschland Andreas Schneider Distribution Sales Manager, Intel Deutschland Legal Disclaimers 2016 Intel Corporation. Intel, the Intel logo, Xeon
More informationAccelerate Machine Learning on macos with Intel Integrated Graphics. Hisham Chowdhury May 23, 2018
Accelerate Machine Learning on macos with Intel Integrated Graphics Hisham Chowdhury May 23, 2018 Apple Machine Learning Stack Machine Learning Application 1 Machine Learning Application 2 Vision Natural
More informationIXPUG 16. Dmitry Durnov, Intel MPI team
IXPUG 16 Dmitry Durnov, Intel MPI team Agenda - Intel MPI 2017 Beta U1 product availability - New features overview - Competitive results - Useful links - Q/A 2 Intel MPI 2017 Beta U1 is available! Key
More informationDeep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor
WELCOME TO Operant conditioning 1938 Spiking Neuron 1952 first neuroscience department 1964 Deep learning prevalence mid 2000s The Turing Machine 1936 Transistor 1947 First computer science department
More informationSayantan Sur, Intel. SEA Symposium on Overlapping Computation and Communication. April 4 th, 2018
Sayantan Sur, Intel SEA Symposium on Overlapping Computation and Communication April 4 th, 2018 Legal Disclaimer & Benchmark results were obtained prior to implementation of recent software patches and
More informationH.J. Lu, Sunil K Pandey. Intel. November, 2018
H.J. Lu, Sunil K Pandey Intel November, 2018 Issues with Run-time Library on IA Memory, string and math functions in today s glibc are optimized for today s Intel processors: AVX/AVX2/AVX512 FMA It takes
More informationSaliency-guided Selective Magnification for Company Logo Detection
Saliency-guided Selective Magnification for Company Logo Detection Christian Eggert, Anton Winschel, Dan Zecha, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg Augsburg,
More informationRavindra Babu Ganapathi
14 th ANNUAL WORKSHOP 2018 INTEL OMNI-PATH ARCHITECTURE AND NVIDIA GPU SUPPORT Ravindra Babu Ganapathi Intel Corporation [ April, 2018 ] Intel MPI Open MPI MVAPICH2 IBM Platform MPI SHMEM Intel MPI Open
More informationAchieving 2.5X 1 Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization
white paper Internet Discovery Artificial Intelligence (AI) Achieving.X Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization As one of the world s preeminent
More informationData-Centric Innovation Summit NAVEEN RAO CORPORATE VICE PRESIDENT & GENERAL MANAGER ARTIFICIAL INTELLIGENCE PRODUCTS GROUP
Data-Centric Innovation Summit NAVEEN RAO CORPORATE VICE PRESIDENT & GENERAL MANAGER ARTIFICIAL INTELLIGENCE PRODUCTS GROUP Data center logic silicon Tam ~30% cagr Ai is exploding $8-10B Emerging as a
More informationVälkommen. Intel Anders Huge
Välkommen Intel Anders Huge Transformative Technology from Intel A n d e r s H u g e I n t e l Why intel INTEL CORPORATION 5 TRANSFORMING BUSINESS MODERN BUSINESS DEMANDS Intel VISION Accelerate workplace
More informationHPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,
More informationOPENSHMEM AND OFI: BETTER TOGETHER
4th ANNUAL WORKSHOP 208 OPENSHMEM AND OFI: BETTER TOGETHER James Dinan, David Ozog, and Kayla Seager Intel Corporation [ April, 208 ] NOTICES AND DISCLAIMERS Intel technologies features and benefits depend
More informationTHE STORAGE PERFORMANCE DEVELOPMENT KIT AND NVME-OF
14th ANNUAL WORKSHOP 2018 THE STORAGE PERFORMANCE DEVELOPMENT KIT AND NVME-OF Paul Luse Intel Corporation Apr 2018 AGENDA Storage Performance Development Kit What is SPDK? The SPDK Community Why are so
More informationBei Wang, Dmitry Prohorov and Carlos Rosales
Bei Wang, Dmitry Prohorov and Carlos Rosales Aspects of Application Performance What are the Aspects of Performance Intel Hardware Features Omni-Path Architecture MCDRAM 3D XPoint Many-core Xeon Phi AVX-512
More informationTESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationIntel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python
Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:
More informationIntel SSD Data center evolution
Intel SSD Data center evolution March 2018 1 Intel Technology Innovations Fill the Memory and Storage Gap Performance and Capacity for Every Need Intel 3D NAND Technology Lower cost & higher density Intel
More informationHPC Advisory COUNCIL
Inside AI HPC Advisory COUNCIL Lugano 2017 Scalable Systems for Distributed Deep Learning Benchmarking, Performance Optimization and Architectures Gaurav kaul SYSTEMS ARCHITECT, INTEL DATACENTRE GROUP
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationSayantan Sur, Intel. ExaComm Workshop held in conjunction with ISC 2018
Sayantan Sur, Intel ExaComm Workshop held in conjunction with ISC 2018 Legal Disclaimer & Optimization Notice Software and workloads used in performance tests may have been optimized for performance only
More informationHPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF
HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF 1 Outline KNL results Our other work related to HPCG 2 ~47 GF/s per KNL ~10
More informationData center: The center of possibility
Data center: The center of possibility Diane bryant Executive vice president & general manager Data center group, intel corporation Data center: The center of possibility The future is Thousands of Clouds
More informationChangpeng Liu. Cloud Storage Software Engineer. Intel Data Center Group
Changpeng Liu Cloud Storage Software Engineer Intel Data Center Group Notices & Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software
More informationIntel Parallel Studio XE 2015
2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:
More informationIntel Performance Libraries
Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation
More informationMunara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.
Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend
More informationFast forward. To your <next>
Fast forward To your Navin Shenoy EXECUTIVE VICE PRESIDENT GENERAL MANAGER, DATA CENTER GROUP CLOUD ECONOMICS INTELLIGENT DATA PRACTICES NETWORK TRANSFORMATION Intel Xeon Scalable Platform The
More informationIBM Deep Learning Solutions
IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationINTEL HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT
INTEL HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT INTEL HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT UPDATE ON OPENSWR: A SCALABLE HIGH- PERFORMANCE SOFTWARE RASTERIZER FOR SCIVIS Jefferson Amstutz Intel
More informationIntel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor
Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS
More informationVectorization Advisor: getting started
Vectorization Advisor: getting started Before you analyze Run GUI or Command Line Set-up environment Linux: source /advixe-vars.sh Windows: \advixe-vars.bat Run GUI or Command
More informationFAST FORWARD TO YOUR <NEXT> CREATION
FAST FORWARD TO YOUR CREATION THE ULTIMATE PROFESSIONAL WORKSTATIONS POWERED BY INTEL XEON PROCESSORS 7 SEPTEMBER 2017 WHAT S NEW INTRODUCING THE NEW INTEL XEON SCALABLE PROCESSOR BREAKTHROUGH PERFORMANCE
More informationDeep Learning with Tensorflow AlexNet
Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification
More informationEngineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary
white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs
More informationIntel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech,
Intel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech, Atlanta February 24, 2017 Acknowledgements Benoit Jacob
More informationSPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation
SPDK China Summit 2018 Ziye Yang Senior Software Engineer Network Platforms Group, Intel Corporation Agenda SPDK programming framework Accelerated NVMe-oF via SPDK Conclusion 2 Agenda SPDK programming
More informationChangpeng Liu. Senior Storage Software Engineer. Intel Data Center Group
Changpeng Liu Senior Storage Software Engineer Intel Data Center Group Legal Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware,
More informationHigh Performance Computing The Essential Tool for a Knowledge Economy
High Performance Computing The Essential Tool for a Knowledge Economy Rajeeb Hazra Vice President & General Manager Technical Computing Group Datacenter & Connected Systems Group July 22 nd 2013 1 What
More informationSoftware Optimization Case Study. Yu-Ping Zhao
Software Optimization Case Study Yu-Ping Zhao Yuping.zhao@intel.com Agenda RELION Background RELION ITAC and VTUE Analyze RELION Auto-Refine Workload Optimization RELION 2D Classification Workload Optimization
More informationReal-Time Systems and Intel take industrial embedded systems to the next level
Solution brief Industrial IoT (IIoT) Embedded Software and Systems Real-Time Systems and Intel take industrial embedded systems to the next level Innovative hypervisor and partitioning software increases
More informationFast-track Hybrid IT Transformation with Intel Data Center Blocks for Cloud
Fast-track Hybrid IT Transformation with Intel Data Center Blocks for Cloud Kyle Corrigan, Cloud Product Line Manager, Intel Server Products Group Wagner Diaz, Product Marketing Engineer, Intel Data Center
More informationAgenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP
More informationINTEL MKL Vectorized Compact routines
INTEL MKL Vectorized Compact routines Mesut Meterelliyoz, Peter Caday, Timothy B. Costa, Kazushige Goto, Louise Huot, Sarah Knepper, Arthur Araujo Mitrano, Shane Story 2018 BLIS RETREAT 09/17/2018 OUTLINE
More informationJim Pappas Director of Technology Initiatives, Intel Vice-Chair, Storage Networking Industry Association (SNIA) December 07, 2018
Jim Pappas Director of Technology Initiatives, Intel Vice-Chair, Storage Networking Industry Association (SNIA) December 07, 2018 jim@intel.com 1 How did this Effort Start? Memristor MRAM Carbon Nanotube
More informationIntel Distribution for Python* и Intel Performance Libraries
Intel Distribution for Python* и Intel Performance Libraries 1 Motivation * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk
More informationIntel optane memory as platform accelerator. Vladimir Knyazkin
Intel optane memory as platform accelerator Vladimir Knyazkin 2 Legal Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service
More informationReal World Development examples of systems / iot
Real World Development examples of systems / iot Intel Software Developer Conference Seoul 2017 Jon Kim Software Consulting Engineer Contents IOT end-to-end Scalability with Intel x86 Architect Real World
More informationSarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018
Sarah Knepper Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018 Outline Motivation Problem statement and solutions Simple example Performance comparison 2 Motivation Partial differential equations
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationJomar Silva Technical Evangelist
Jomar Silva Technical Evangelist Agenda Introduction Intel Graphics Performance Analyzers: what is it, where do I get it, and how do I use it? Intel GPA with VR What devices can I use Intel GPA with and
More informationWerner Schueler. Enterprise Account Manager, Intel
Werner Schueler Enterprise Account Manager, Intel The next big wave mainframes Standardsbased servers Cloud computing Data deluge COMPUTE breakthrough Innovation surge Artificial intelligence 12X AI Compute
More informationAccelerating Data Center Workloads with FPGAs
Accelerating Data Center Workloads with FPGAs Enno Lübbers NorCAS 2017, Linköping, Sweden Intel technologies features and benefits depend on system configuration and may require enabled hardware, software
More informationOpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel
OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel Clang * : An Excellent C++ Compiler LLVM * : Collection of modular and reusable compiler and toolchain technologies Created by Chris Lattner
More informationImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture
More informationEfficient Parallel Programming on Xeon Phi for Exascale
Efficient Parallel Programming on Xeon Phi for Exascale Eric Petit, Intel IPAG, Seminar at MDLS, Saclay, 29th November 2016 Legal Disclaimers Intel technologies features and benefits depend on system configuration
More informationIn partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER
In partnership with VelocityAI REFERENCE JULY // 2018 Contents Introduction 01 Challenges with Existing AI/ML/DL Solutions 01 Accelerate AI/ML/DL Workloads with Vexata VelocityAI 02 VelocityAI Reference
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationExpressing and Analyzing Dependencies in your C++ Application
Expressing and Analyzing Dependencies in your C++ Application Pablo Reble, Software Engineer Developer Products Division Software and Services Group, Intel Agenda TBB and Flow Graph extensions Composable
More informationMohan J. Kumar Intel Fellow Intel Corporation
OCP Initiatives and Intel Implementations Mohan J. Kumar Intel Fellow Intel Corporation Agenda Open Firmware Firmware at Scale Platform Attestation Summary Open Firmware UEFI-based Open Firmware (for Intel-based
More informationEnabling the future of Artificial intelligence
Enabling the future of Artificial intelligence Contents AI Overview Intel Nervana AI products Hardware Software Intel Nervana Deep Learning Platform Learn more - Intel Nervana AI Academy Artificial Intelligence,
More informationBecca Paren Cluster Systems Engineer Software and Services Group. May 2017
Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application
More informationPerformance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor
* Some names and brands may be claimed as the property of others. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor E.J. Bylaska 1, M. Jacquelin
More informationIntel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager
Intel Math Kernel Library (Intel MKL) BLAS Victor Kostin Intel MKL Dense Solvers team manager Intel MKL BLAS/Sparse BLAS Original ( dense ) BLAS available from www.netlib.org Additionally Intel MKL provides
More informationHPC and AI Solution Overview. Garima Kochhar HPC and AI Innovation Lab
HPC and AI Solution Overview Garima Kochhar HPC and AI Innovation Lab 1 Dell EMC HPC and DL team charter Design, develop and integrate HPC and DL Heading systems Lorem ipsum dolor sit amet, consectetur
More informationJacek Czaja, Machine Learning Engineer, AI Product Group
Jacek Czaja, Machine Learning Engineer, AI Product Group Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
More informationCase Study. Optimizing an Illegal Image Filter System. Software. Intel Integrated Performance Primitives. High-Performance Computing
Case Study Software Optimizing an Illegal Image Filter System Intel Integrated Performance Primitives High-Performance Computing Tencent Doubles the Speed of its Illegal Image Filter System using SIMD
More informationVikram A. Saletore, Ph.D. Principal Engineer, AI Products Group, Intel. Lucas A. Wilson, Ph.D. and Alex Filby HPC and AI Engineering, Dell EMC
Vikram A. Saletore, Ph.D. Principal Engineer, AI Products Group, Intel Valeriu Codreanu, Ph.D. and Damian Podareanu. MSc, Research & Data Scientist, SURFsara B.V. Lucas A. Wilson, Ph.D. and Alex Filby
More informationIFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor
IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization
More informationOpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing
OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US
More informationSergey Maidanov. Software Engineering Manager for Intel Distribution for Python*
Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* Introduction Python is among the most popular programming languages Especially for prototyping But very limited use in production
More informationEFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationMore performance options
More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel
More informationGraphics Performance Analyzer for Android
Graphics Performance Analyzer for Android 1 What you will learn from this slide deck Detailed optimization workflow of Graphics Performance Analyzer Android* System Analysis Only Please see subsequent
More informationDr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist. With Dell Amsterdam, October 27, 2016
Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist With Dell Amsterdam, October 27, 2016 Legal Disclaimers Intel technologies features and benefits depend on system configuration and may
More informationDEVITO AUTOMATED HIGH-PERFORMANCE FINITE DIFFERENCES FOR GEOPHYSICAL EXPLORATION
DEVITO AUTOMATED HIGH-PERFORMANCE FINITE DIFFERENCES FOR GEOPHYSICAL EXPLORATION F. Luporini 1, C. Yount 4, M. Louboutin 3, N. Kukreja 1, P. Witte 2, M. Lange 5, P. Kelly 1, F. Herrmann 3, G.Gorman 1 1Imperial
More informationJim Harris. Principal Software Engineer. Intel Data Center Group
Jim Harris Principal Software Engineer Intel Data Center Group Notices & Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or
More informationAccelerating NVMe I/Os in Virtual Machine via SPDK vhost* Solution Ziye Yang, Changpeng Liu Senior software Engineer Intel
Accelerating NVMe I/Os in Virtual Machine via SPDK vhost* Solution Ziye Yang, Changpeng Liu Senior software Engineer Intel @optimistyzy Notices & Disclaimers Intel technologies features and benefits depend
More informationJim Harris. Principal Software Engineer. Data Center Group
Jim Harris Principal Software Engineer Data Center Group Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service
More informationIBM Power AC922 Server
IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated
More informationAsynchronous Parallel Stochastic Gradient Descent. A Numeric Core for Scalable Distributed Machine Learning Algorithms
Asynchronous Parallel Stochastic Gradient Descent A Numeric Core for Scalable Distributed Machine Learning Algorithms J. Keuper and F.-J. Pfreundt Competence Center High Performance Computing Fraunhofer
More informationDistributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability Janis Keuper Itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern,
More informationScaling Out Python* To HPC and Big Data
Scaling Out Python* To HPC and Big Data Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* What Problems We Solve: Scalable Performance Make Python usable beyond prototyping
More informationIntel Cluster Checker 3.0 webinar
Intel Cluster Checker 3.0 webinar June 3, 2015 Christopher Heller Technical Consulting Engineer Q2, 2015 1 Introduction Intel Cluster Checker 3.0 is a systems tool for Linux high performance compute clusters
More informationConvolutional Neural Network Layer Reordering for Acceleration
R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationVirtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs
Solution brief Software-Defined Data Center (SDDC) Hyperconverged Platforms Virtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs Virtuozzo benchmark
More informationIntel Speed Select Technology Base Frequency - Enhancing Performance
Intel Speed Select Technology Base Frequency - Enhancing Performance Application Note April 2019 Document Number: 338928-001 You may not use or facilitate the use of this document in connection with any
More informationMachine Learning on VMware vsphere with NVIDIA GPUs
Machine Learning on VMware vsphere with NVIDIA GPUs Uday Kurkure, Hari Sivaraman, Lan Vu GPU Technology Conference 2017 2016 VMware Inc. All rights reserved. Gartner Hype Cycle for Emerging Technology
More informationWhat s P. Thierry
What s new@intel P. Thierry Principal Engineer, Intel Corp philippe.thierry@intel.com CPU trend Memory update Software Characterization in 30 mn 10 000 feet view CPU : Range of few TF/s and
More informationMemory & Thread Debugger
Memory & Thread Debugger Here is What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action Intel Confidential 2 Analysis Tools for Diagnosis
More information