Investigation of Intel MIC for implementation of Fast Fourier Transform

Soren Goyal
Department of Physics, IIT Kanpur

The objective of the project was to run Fast Fourier Transform code on a newly developed computing architecture, the Intel MIC (Many Integrated Core). If higher performance is obtained on this architecture, the simulation software Tarang, which relies heavily on computing Fourier transforms, will also be able to run faster.

I. INTRODUCTION

Accurate numerical schemes are required for simulating turbulent flows. Such simulations are important in weather prediction and climate modeling, and they aid the general understanding of fluid flows. Prof. Mahendra K. Verma has developed Tarang (Verma, 2011) for these applications. At present, Tarang has solvers for incompressible flows involving pure fluid, Rayleigh-Bénard convection, passive and active scalars, magnetohydrodynamics, liquid metals, etc.

One of the key algorithms used in Tarang is the Fast Fourier Transform (FFT). Generally, simulations spend 75% of their time computing Fourier transforms. Therefore, optimizing the FFT implementation will significantly improve Tarang's capacity to handle bigger simulations. At present Tarang is written to run on Intel multi-core platforms (e.g. Intel Xeon, Intel Core i7). However, promising new computing platforms such as Intel MIC (Many Integrated Core) have been developed, which can be leveraged to improve the performance of Tarang. The objective of the project is to investigate the Intel MIC architecture and port Tarang's code to this new platform.

II. ABOUT INTEL MIC

Intel MIC (pronounced "mike"), or Intel Many Integrated Core, is a coprocessor architecture developed by Intel for high-performance computing. It is a shared-memory architecture that combines many Intel CPU cores onto a single chip. Programs for it can be written in C, C++ and Fortran.
The programs use familiar programming models and support parallel execution through standard parallel programming APIs such as OpenMP and MPI. This gives Intel MIC the advantage that existing source code written for an Intel Xeon processor can be compiled and run on an Intel MIC based chip.

Starting from 2011, processors based on this architecture have been released and branded as Intel Xeon Phi. These processors have been installed in many supercomputing facilities. The Texas Advanced Computing Center (TACC) uses coprocessors based on Intel MIC in its 10-petaFLOPS "Stampede" supercomputer. In June 2013, the Tianhe-2 supercomputer at the National Supercomputing Center in Guangzhou (NSCC-GZ) was announced as the world's fastest supercomputer. It uses Intel Ivy Bridge-EP Xeon and Xeon Phi processors to reach its petaFLOPS-scale performance. In IIT Kanpur's HPC-2010 computing cluster, 4 nodes have been equipped with Xeon Phi 5100 cards, 2 cards on each node.

The key features of the Intel Xeon Phi 5100 are:
1. 60 cores (a variation of the Intel Pentium core) are packed on a single chip, with a shared memory of 16 GB.
2. Each core can execute up to 4 threads at once, giving the processor the ability to execute a total of 240 threads in parallel.
3. Each core has a 512-bit wide SIMD Vector Processing Unit (VPU).
4. Xeon Phi is a coprocessor, so parts of a computation can be offloaded onto it from the host processor for execution.

Further details on how to use the Intel Xeon Phi cards installed in the HPC-2010 cluster are given in the Appendix.

III. ABOUT FAST FOURIER TRANSFORM

A Fast Fourier Transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are frequently used in engineering applications and have been described as the most important numerical algorithm of our lifetime (Strang 1994).
Given a signal X(t) of size N, the naïve way of computing the discrete Fourier transform F(k) is given by the following equation:

$$F(k) = \sum_{t=0}^{N-1} X(t)\, e^{-2\pi i k t / N} \tag{3.1}$$
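Equation (3.1) can be evaluated directly with two nested loops. The following is a minimal sketch of that direct evaluation, not code taken from Tarang, FFTW or MKL; the function name `naive_dft` and the type alias `cd` are introduced here purely for illustration, and the forward transform is assumed to use the negative exponent.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;  // illustrative alias, not from the source

// Direct evaluation of F(k) = sum_{t=0}^{N-1} X(t) * exp(-2*pi*i*k*t/N).
// Two nested loops over k and t give O(N^2) complex multiply-adds.
std::vector<cd> naive_dft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && "signal must not be empty");
    const double pi = std::acos(-1.0);
    std::vector<cd> f(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // The twiddle factor is periodic in n, so k*t is reduced
            // modulo n to keep the angle small.
            const double angle = -2.0 * pi * static_cast<double>((k * t) % n)
                                 / static_cast<double>(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        f[k] = sum;
    }
    return f;
}
```

For a signal of 10^7 elements this double loop would need on the order of 10^14 complex multiply-adds, which is why the direct form is useful only as a reference for checking a fast implementation on small inputs.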

Equation (3.1) can be interpreted as multiplying the vector X(t) by the matrix W whose elements are

$$W_{ij} = e^{-2\pi i\,(ij)/N}$$

(in the exponent, the first i is the imaginary unit and ij is the product of the row and column indices). As the dimension of the matrix is N × N, the operation has a time complexity of O(N^2).

An algorithm for evaluating finite trigonometric series was devised by Goertzel (Goertzel 1958). It needs O(N) operations per coefficient, but its errors grow rapidly (Gentleman 1969), so it is suitable only for computing a small number of coefficients. As finding the Fourier transform is essentially the task of multiplying a matrix with a vector, Cooley and Tukey (Cooley, et al., 1965) proposed an algorithm based on the matrix multiplication technique of (Good, 1958).

IV. EXPERIMENTS

1. Comparison of different FFT implementations

FFT has been implemented by a number of libraries, the best known being FFTW3, developed at MIT by Matteo Frigo and Steven Johnson. Intel has also developed an FFT implementation for use on its processors, shipped as part of the Intel MKL (Math Kernel Library). As the first experiment, and to gain hands-on experience of parallel programming in the Intel Xeon and MIC environments, the performance of FFTW and MKL was compared.

a. Comparison of performance of FFTW3 and MKL on Intel Xeon

The current code of Tarang is compatible with Intel Xeon and uses FFTW3 for computing Fourier transforms. The task was to compute the Fast Fourier Transform of a signal containing 10^7 elements. This operation typically gets faster as more threads are used. The time taken to compute the transform was recorded as a function of the number of threads, as shown in Fig. 1. Intel MKL clearly outperforms FFTW3. Another point to note is that Intel MKL has wrapper functions for FFTW3. This makes programming easier, since the same code, with minor modifications, can be linked to either FFTW or MKL as required.

FIG. 1 (Color Online) Comparison between FFT implementations of Intel MKL and FFTW3 on Intel Xeon

b. Comparison of performance of FFTW and MKL on Intel Xeon Phi

Intel MKL supports Intel MIC. FFTW is compatible with most of the common x86 platforms, but since Intel MIC is a new architecture, FFTW could not be built for it. It might be possible to tweak the FFTW build process to get this done, but that would require deeper knowledge of Intel MIC and of the FFTW compile process.

2. Scaling of FFT on Intel Xeon and Intel Xeon Phi

The performance of FFT on Xeon and Xeon Phi was compared. Only the MKL implementation was used because, as observed in the previous experiment, FFTW could not be built for Xeon Phi. Code on Xeon and Xeon Phi can be executed in a number of ways (all of them are described in the Appendix).

For Intel Xeon there were two possibilities: Offload Enabled and Offload Disabled. The Intel MKL documentation claims that if a Xeon Phi is attached to the host processor, the compiler will ensure that the relevant portions of the code are offloaded to the Xeon Phi to gain additional speed-up. In the Offload Enabled mode this feature is allowed, while in the Offload Disabled mode the code is executed strictly on the host processor. Fig. 2 compares the performance in the two modes.

FIG. 2 (Color Online) Comparison of performance of FFT on Intel Xeon with Offload enabled and disabled

For Intel Xeon Phi, there are two modes of execution (both are explained in the Appendix): Offload Execution and Native Execution. The documentation claims that Native Execution is faster than Offload Execution because Offload Execution has communication overheads. Fig. 3 shows the performance of the two modes. The FFT of a 10^7-element signal was computed for each of the 4 modes of execution.

FIG. 3 (Color Online) Comparison between performance of Native execution and Offload execution of FFT on Intel Xeon Phi

Xeon can support 32 parallel threads while Xeon Phi can support 240 parallel threads. As FFT is a highly scalable algorithm, it was expected to perform much better on Xeon Phi. But as is evident from the graph, the performance on Xeon Phi is worse than on Xeon. The FFT on Xeon Phi does not scale beyond 120 threads. Fig. 4 shows the performance of FFT on both Xeon and Xeon Phi.

FIG. 4 (Color Online) Performance of FFT on Intel Xeon and Intel Xeon Phi

3. Variation in performance with Signal Size

As mentioned in Section III (About Fast Fourier Transform), the performance of FFT implementations depends on the signal size. The Intel MKL implementation is faster for signal sizes that factor into products of small primes. A signal size of the form 2^k is transformed most efficiently, while a signal size equal to a large prime is the least efficient. The size of the signal was varied between 10^6 and 10^7, taking only numbers of the form 2^n 3^m. The transforms were carried out on both Xeon and Xeon Phi. A 3D graph is plotted with signal size (data size) on the X-axis, number of threads on the Y-axis and GigaFLOPS on the Z-axis (Fig. 5). Note that although execution on Xeon used a maximum of 32 threads, its data has been stretched along the Y-axis to 240 threads for ease of comparison.

FIG. 5 3D plot of Signal Size vs Number of Threads vs Performance (GFLOPS)

It was observed that for certain combinations of threads and signal size the data points of Xeon Phi lie above those of Xeon, indicating that Xeon Phi can indeed outperform Xeon. The best performances of Xeon and Xeon Phi have been plotted for comparison in Fig. 6.
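The signal sizes of the form 2^n 3^m between 10^6 and 10^7 used in the signal-size experiment can be enumerated with a short routine. The sketch below is illustrative only: the function names `smooth_sizes` and `fft_gflops` are hypothetical, and the 5 N log2 N operation count is the convention commonly used when reporting FFT performance, assumed here because the text does not state how its GFLOPS values were derived.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// All integers of the form 2^n * 3^m in [lo, hi], in ascending order.
std::vector<std::uint64_t> smooth_sizes(std::uint64_t lo, std::uint64_t hi) {
    assert(lo >= 1 && lo <= hi);
    std::vector<std::uint64_t> sizes;
    for (std::uint64_t p3 = 1; p3 <= hi; p3 *= 3) {      // powers of 3
        for (std::uint64_t v = p3; v <= hi; v *= 2) {    // times powers of 2
            if (v >= lo) sizes.push_back(v);
        }
    }
    std::sort(sizes.begin(), sizes.end());
    return sizes;
}

// Estimated GFLOPS for an n-point complex FFT that took `seconds`,
// using the conventional count of 5 * n * log2(n) operations (an assumption).
double fft_gflops(std::uint64_t n, double seconds) {
    assert(n > 1 && seconds > 0.0);
    const double flops =
        5.0 * static_cast<double>(n) * std::log2(static_cast<double>(n));
    return flops / seconds / 1e9;
}
```

Calling `smooth_sizes(1000000, 10000000)` yields, among others, 1048576 = 2^20 and 1594323 = 3^13. Every returned size has only 2 and 3 as prime factors, which matches the class of sizes described as efficient for the MKL implementation.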

FIG. 6 Best performances of Xeon and Xeon Phi for various signal sizes

V. CONCLUSIONS

The results indicate that Intel Xeon Phi might offer an opportunity to speed up Tarang. In addition, the existing code of Tarang can be ported to Intel Xeon Phi with minimal changes. This makes Intel Xeon Phi a very attractive option. However, much more study is needed to obtain the speed-up. The implementation of the Fast Fourier Transform is highly non-trivial and requires careful study to identify the parameters its performance depends on. Similarly, Xeon Phi is a new computer architecture, and the nuances of its hardware must be understood to optimize FFT for it.

VI. FUTURE WORK

The first step would be to understand the parallel implementation of FFT and the architecture of Intel Xeon Phi. The understanding gained will be used to optimize the algorithm for the Intel MIC architecture. Further, the following questions need to be answered:
1. Is MKL utilizing the VPUs of Intel Xeon Phi?
2. Why does the FFT algorithm not scale beyond 120 threads on Xeon Phi?
3. Xeon Phi has 60 physical cores and each core can support 4 threads, so when more than 60 threads are instantiated, how are they distributed among the cores?

Bibliography

Cooley J. W. and Tukey J. W. An algorithm for the machine calculation of complex Fourier series // Mathematics of Computation. - 1965.
Gentleman W. M. An error analysis of Goertzel's (Watt's) method for computing Fourier coefficients // The Computer Journal. - 1969. - Vol. 12.
Goertzel G. An algorithm for the evaluation of finite trigonometric series // The American Mathematical Monthly. - January 1958. - Vol. 65(1).
Good I. J. The interaction algorithm and practical Fourier analysis // Journal of the Royal Statistical Society. - 1958.
Strang Gilbert. Wavelets // American Scientist. - May 1994. - Vol. 82.
Thiagarajan Sudha Udanapalli [et al.] Intel Xeon Phi Coprocessor Developer's Quick Start Guide [Online]. - Intel, 2013.
Verma Mahendra K. Object-oriented pseudo-spectral code TARANG for turbulence simulation [Online] // arxiv.org. - March 2011.

APPENDIX

Working with Intel Xeon Phi

The reference material for development on Intel Xeon Phi is available in (Thiagarajan, et al., 2013). This appendix gives a high-level description of the features of Xeon Phi and how to program it on IIT Kanpur's HPC-2010 cluster.

Setting Up the Environment

After getting an account created on the HPC-2010 cluster, do the following to access the Xeon Phi coprocessor and set up the environment:
1. Log on to HPC2010 at hpc2010.hpc.iitk.ac.in.
2. Four nodes have Xeon Phi cards attached to them: mic001, mic002, mic003 and mic004. Log on to any one of them.
3. To use the 64-bit Intel compiler, initialize its paths:
   /opt/extra_software/intel/initpaths intel64

Parallel Programming Options on Intel Xeon Phi

Most of the parallel programming options available on the host systems are available for the Intel Xeon Phi coprocessor. These include the following:
1. Intel Threading Building Blocks (Intel TBB)
2. OpenMP
3. Intel Cilk Plus
4. pthreads

Of these 4 options only OpenMP was used for this project. There is no correspondence between OpenMP threads on the host CPU and on the Intel Xeon Phi coprocessor. Because an OpenMP parallel region within an offload pragma is offloaded as a unit, the offload compiler creates a team of threads based on the available resources on the Intel Xeon Phi coprocessor. Since the entire OpenMP construct is executed on the coprocessor, the usual OpenMP semantics of shared and private data apply within the construct.

Compiling a Program

There are two main ways of compiling a program for Xeon Phi.

Native Compilation: The binary is built on the host's file system using Intel's icc compiler. This file and its dependencies are then copied to the coprocessor's file system and executed. For example, suppose the following code is saved in test.cpp:

    #include <cstdio>

    const int SIZE = 1000;

    int main() {
        float ret = 0;
        int data[SIZE];
        for (int i = 0; i < SIZE; i++) data[i] = i;  // stands in for the original initializer
        // Divide the loop among the threads; the reduction keeps ret race-free
        #pragma omp parallel for reduction(+:ret)
        for (int i = 0; i < SIZE; i++)
            ret += data[i];
        printf("%f\n", ret);
        return 0;
    }

The pragma directive instructs the compiler to divide the for loop among the maximum number of threads the processor can support. The reduction clause gives each thread a private partial sum, so that the updates to ret do not race. The compilation and execution steps are as follows:
1. Compile the program with the -mmic flag:
   icc -mmic -openmp test.cpp
2. The output file need not be copied to the Xeon Phi's file system, as the host and the Xeon Phi share the same file system on HPC2010. Log on to one of the MIC cards (for example mic001-mic0).
3. Set the library paths for OpenMP and Intel MKL:
   LD_LIBRARY_PATH=/opt/extra_software/intel/mkl/lib/mic/:/opt/extra_software/intel/composer_xe_2013/lib/mic/
4. Execute the program.

Offload Compilation: Intel's icc allows the user to specify a region of code to be offloaded to the Xeon Phi card. There are many ways to do it; a simple method is described here to get started and run basic codes. The code is modified to indicate the segment to be offloaded:

    #include <cstdio>

    const int SIZE = 1000;

    int main() {
        float ret = 0;
        int data[SIZE];
        for (int i = 0; i < SIZE; i++) data[i] = i;  // stands in for the original initializer
        // Run the parallel loop on the coprocessor
        #pragma offload target(mic)
        #pragma omp parallel for reduction(+:ret)
        for (int i = 0; i < SIZE; i++)
            ret += data[i];
        printf("%f\n", ret);
        return 0;
    }

To compile and run it, use the following steps:
1. As the code contains segments to be offloaded, the compiler has to be instructed to link the offload segments to the libraries meant for MIC. This is done using the -offload-option flag. The $(LIBS) variable can be replaced by the libraries to be used for compiling test.cpp:
   icc test.cpp -openmp -offload-attribute-target=mic -offload-option,mic,compiler,"-L/opt/extra_software/intel/composer_xe_2013/lib/mic $(LIBS)"
2. The program can now be executed on the host directly; the segment meant for the Xeon Phi will be offloaded to it.
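The question raised in Section VI, of how more than 60 OpenMP threads end up spread over 60 cores with 4 hardware threads each, can be explored with a simplified placement model. The sketch below is an assumption-laden illustration, not a description of how MKL or the OpenMP runtime placed threads in the experiments: the function `threads_per_core` is hypothetical, and its two policies only mirror the idea behind the "compact" and "scatter" affinity types of the Intel OpenMP runtime's KMP_AFFINITY setting without querying the runtime.

```cpp
#include <cassert>
#include <vector>

// Number of threads landing on each core under a simplified model.
// scatter == true : thread t goes to core t % n_cores (round-robin).
// scatter == false: a core's hardware-thread slots are filled before
//                   the next core is used ("compact").
std::vector<int> threads_per_core(int n_threads, int n_cores,
                                  int slots_per_core, bool scatter) {
    assert(n_cores > 0 && slots_per_core > 0);
    assert(n_threads >= 0 && n_threads <= n_cores * slots_per_core);
    std::vector<int> load(n_cores, 0);
    for (int t = 0; t < n_threads; ++t) {
        const int core = scatter ? t % n_cores : t / slots_per_core;
        ++load[core];
    }
    return load;
}
```

With 120 threads on 60 cores of 4 slots each, the round-robin model puts 2 threads on every core, while the compact model puts 4 threads on each of the first 30 cores and leaves the other 30 idle. Comparing measured FFT timings under both placements would be one way to test whether the 120-thread limit is related to thread distribution.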


More information

PRACE PATC Course: Intel MIC Programming Workshop, MKL. Ostrava,

PRACE PATC Course: Intel MIC Programming Workshop, MKL. Ostrava, PRACE PATC Course: Intel MIC Programming Workshop, MKL Ostrava, 7-8.2.2017 1 Agenda A quick overview of Intel MKL Usage of MKL on Xeon Phi Compiler Assisted Offload Automatic Offload Native Execution Hands-on

More information

Debugging Intel Xeon Phi KNC Tutorial

Debugging Intel Xeon Phi KNC Tutorial Debugging Intel Xeon Phi KNC Tutorial Last revised on: 10/7/16 07:37 Overview: The Intel Xeon Phi Coprocessor 2 Debug Library Requirements 2 Debugging Host-Side Applications that Use the Intel Offload

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

Tools for Intel Xeon Phi: VTune & Advisor Dr. Fabio Baruffa - LRZ,

Tools for Intel Xeon Phi: VTune & Advisor Dr. Fabio Baruffa - LRZ, Tools for Intel Xeon Phi: VTune & Advisor Dr. Fabio Baruffa - fabio.baruffa@lrz.de LRZ, 27.6.- 29.6.2016 Architecture Overview Intel Xeon Processor Intel Xeon Phi Coprocessor, 1st generation Intel Xeon

More information

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015 PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability

More information

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to

More information

AACE: Applications. Director, Application Acceleration Center of Excellence National Institute for Computational Sciences glenn-

AACE: Applications. Director, Application Acceleration Center of Excellence National Institute for Computational Sciences glenn- AACE: Applications R. Glenn Brook Director, Application Acceleration Center of Excellence National Institute for Computational Sciences glenn- brook@tennessee.edu Ryan C. Hulguin Computational Science

More information

Intel Xeon Phi Coprocessor Offloading Computation

Intel Xeon Phi Coprocessor Offloading Computation Intel Xeon Phi Coprocessor Offloading Computation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,

More information

Overview of Tianhe-2

Overview of Tianhe-2 Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Lab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC

Lab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC Lab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC # pg. Subject Purpose directory 1 3 5 Offload, Begin (C) (F90) Compile and Run (CPU, MIC, Offload) offload_hello 2 7 Offload, Data Optimize

More information

Computer Architecture and Structured Parallel Programming James Reinders, Intel

Computer Architecture and Structured Parallel Programming James Reinders, Intel Computer Architecture and Structured Parallel Programming James Reinders, Intel Parallel Computing CIS 410/510 Department of Computer and Information Science Lecture 17 Manycore Computing and GPUs Computer

More information

Native Computing and Optimization. Hang Liu December 4 th, 2013

Native Computing and Optimization. Hang Liu December 4 th, 2013 Native Computing and Optimization Hang Liu December 4 th, 2013 Overview Why run native? What is a native application? Building a native application Running a native application Setting affinity and pinning

More information

Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth

Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth Contents Intel Visual Fortran Compiler Professional Edition for Windows*........................ 3 Features...3 New in This

More information

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python*

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python* Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* Introduction Python is among the most popular programming languages Especially for prototyping But very limited use in production

More information

Intel Math Kernel Library (Intel MKL) Latest Features

Intel Math Kernel Library (Intel MKL) Latest Features Intel Math Kernel Library (Intel MKL) Latest Features Sridevi Allam Technical Consulting Engineer Sridevi.allam@intel.com 1 Agenda - Introduction to Support on Intel Xeon Phi Coprocessors - Performance

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally

More information

Parallel Programming Libraries and implementations

Parallel Programming Libraries and implementations Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.

More information

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization

More information

Xeon Phi Native Mode - Sharpen Exercise

Xeon Phi Native Mode - Sharpen Exercise Xeon Phi Native Mode - Sharpen Exercise Fiona Reid, Andrew Turner, Dominic Sloan-Murphy, David Henty, Adrian Jackson Contents June 19, 2015 1 Aims 1 2 Introduction 1 3 Instructions 2 3.1 Log into yellowxx

More information

Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor Intel Xeon Phi Coprocessor A guide to using it on the Cray XC40 Terminology Warning: may also be referred to as MIC or KNC in what follows! What are Intel Xeon Phi Coprocessors? Hardware designed to accelerate

More information

Programming Intel R Xeon Phi TM

Programming Intel R Xeon Phi TM Programming Intel R Xeon Phi TM An Overview Anup Zope Mississippi State University 20 March 2018 Anup Zope (Mississippi State University) Programming Intel R Xeon Phi TM 20 March 2018 1 / 46 Outline 1

More information

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3

More information

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University Introduction to High Performance Computing Shaohao Chen Research Computing Services (RCS) Boston University Outline What is HPC? Why computer cluster? Basic structure of a computer cluster Computer performance

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Reusing this material

Reusing this material XEON PHI BASICS Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

Performance of Multicore LUP Decomposition

Performance of Multicore LUP Decomposition Performance of Multicore LUP Decomposition Nathan Beckmann Silas Boyd-Wickizer May 3, 00 ABSTRACT This paper evaluates the performance of four parallel LUP decomposition implementations. The implementations

More information

Introduc)on to Xeon Phi

Introduc)on to Xeon Phi Introduc)on to Xeon Phi IXPUG 14 Lars Koesterke Acknowledgements Thanks/kudos to: Sponsor: National Science Foundation NSF Grant #OCI-1134872 Stampede Award, Enabling, Enhancing, and Extending Petascale

More information

Simulation using MIC co-processor on Helios

Simulation using MIC co-processor on Helios Simulation using MIC co-processor on Helios Serhiy Mochalskyy, Roman Hatzky PRACE PATC Course: Intel MIC Programming Workshop High Level Support Team Max-Planck-Institut für Plasmaphysik Boltzmannstr.

More information

Laurent Duhem Intel Alain Dominguez - Intel

Laurent Duhem Intel Alain Dominguez - Intel Laurent Duhem Intel Alain Dominguez - Intel Agenda 2 What are Intel Xeon Phi Coprocessors? Architecture and Platform overview Intel associated software development tools Execution and Programming model

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel C++ Studio XE 2013 for Windows* Installation Guide and Release Notes Document number: 323805-003US 26 June 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.1.1 Changes since Intel

More information

HPC Architectures. Types of resource currently in use

HPC Architectures. Types of resource currently in use HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Intel C++ Compiler Professional Edition 11.0 for Windows* In-Depth

Intel C++ Compiler Professional Edition 11.0 for Windows* In-Depth Intel C++ Compiler Professional Edition 11.0 for Windows* In-Depth Contents Intel C++ Compiler Professional Edition for Windows*..... 3 Intel C++ Compiler Professional Edition At A Glance...3 Intel C++

More information

Parallel Fast Fourier Transform implementations in Julia 12/15/2011

Parallel Fast Fourier Transform implementations in Julia 12/15/2011 Parallel Fast Fourier Transform implementations in Julia 1/15/011 Abstract This paper examines the parallel computation models of Julia through several different multiprocessor FFT implementations of 1D

More information

The Intel Xeon Phi Coprocessor. Dr-Ing. Michael Klemm Software and Services Group Intel Corporation

The Intel Xeon Phi Coprocessor. Dr-Ing. Michael Klemm Software and Services Group Intel Corporation The Intel Xeon Phi Coprocessor Dr-Ing. Michael Klemm Software and Services Group Intel Corporation (michael.klemm@intel.com) Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation

Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation S c i c o m P 2 0 1 3 T u t o r i a l Intel Xeon Phi Product Family Programming Tools Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation Agenda Intel Parallel Studio XE 2013

More information

HPC code modernization with Intel development tools

HPC code modernization with Intel development tools HPC code modernization with Intel development tools Bayncore, Ltd. Intel HPC Software Workshop Series 2016 HPC Code Modernization for Intel Xeon and Xeon Phi February 17 th 2016, Barcelona Microprocessor

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

Get Ready for Intel MKL on Intel Xeon Phi Coprocessors. Zhang Zhang Technical Consulting Engineer Intel Math Kernel Library

Get Ready for Intel MKL on Intel Xeon Phi Coprocessors. Zhang Zhang Technical Consulting Engineer Intel Math Kernel Library Get Ready for Intel MKL on Intel Xeon Phi Coprocessors Zhang Zhang Technical Consulting Engineer Intel Math Kernel Library Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Document number: 323803-001US 4 May 2011 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.2 Product Contents...

More information

High Performance Parallel Programming. Multicore development tools with extensions to many-core. Investment protection. Scale Forward.

High Performance Parallel Programming. Multicore development tools with extensions to many-core. Investment protection. Scale Forward. High Performance Parallel Programming Multicore development tools with extensions to many-core. Investment protection. Scale Forward. Enabling & Advancing Parallelism High Performance Parallel Programming

More information

Xeon Phi Native Mode - Sharpen Exercise

Xeon Phi Native Mode - Sharpen Exercise Xeon Phi Native Mode - Sharpen Exercise Fiona Reid, Andrew Turner, Dominic Sloan-Murphy, David Henty, Adrian Jackson Contents April 30, 2015 1 Aims The aim of this exercise is to get you compiling and

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Parallel is the Path Forward Intel Xeon and Intel Xeon Phi Product Families are both going parallel Intel Xeon processor

More information

Using the Intel Math Kernel Library (Intel MKL) and Intel Compilers to Obtain Run-to-Run Numerical Reproducible Results

Using the Intel Math Kernel Library (Intel MKL) and Intel Compilers to Obtain Run-to-Run Numerical Reproducible Results Using the Intel Math Kernel Library (Intel MKL) and Intel Compilers to Obtain Run-to-Run Numerical Reproducible Results by Todd Rosenquist, Technical Consulting Engineer, Intel Math Kernal Library and

More information