Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equations
1 Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equations
Alexander Kalinkin, Anton Anders, Roman Anders
2 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vpro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vpro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright Intel Corporation. 2
3 Agenda
- Problem statement
- Algorithm
- Experiments
- Conclusion
4 Problem statement
Cons:
- No extra information is available about the matrix, only some generic properties (positive definite, Hermitian, ...)
- Huge size
Pros:
- Clusters with modern Intel CPUs
- Intel Math Kernel Library (Intel MKL) with optimized BLAS, LAPACK, and PARDISO functionality
5 Algorithm (Ax = b)
Input: matrix A, vector b; special parameters.
- Matrix reordering and symbolic factorization: reorder matrix A to reduce fill-in in the factor L, and create a dependency-tree representation of matrix A.
- Numeric factorization: compute the decomposition A = LL^T, A = LDL^T, or A = LU. This is the most time-consuming part.
- Forward and backward substitution: solve Ly = b (forward step), Dz = y (diagonal step), then L^T x = z (backward step).
Output: vector x.
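The factorization and three-step substitution listed above can be illustrated with a minimal dense sketch. It assumes a small symmetric matrix for which no pivoting is needed and skips reordering entirely; the function names `ldlt_factor` and `ldlt_solve` are made up for this illustration and are not part of Intel MKL.

```python
def ldlt_factor(a):
    """Factor a symmetric matrix as A = L D L^T (dense, no pivoting)."""
    n = len(a)
    # L starts as the identity; only the strictly lower part is filled in.
    low = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    diag = [0.0] * n
    for j in range(n):
        diag[j] = a[j][j] - sum(low[j][k] ** 2 * diag[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] * diag[k] for k in range(j))
            low[i][j] = (a[i][j] - s) / diag[j]
    return low, diag


def ldlt_solve(low, diag, b):
    """Solve A x = b given A = L D L^T."""
    n = len(b)
    # Forward step: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(low[i][k] * y[k] for k in range(i))
    # Diagonal step: D z = y
    z = [y[i] / diag[i] for i in range(n)]
    # Backward step: L^T x = z
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))
    return x
```

A call such as `ldlt_solve(*ldlt_factor(a), b)` performs the numeric factorization once and then the three substitution steps; a sparse solver does the same work but only on the non-zero blocks that remain after reordering.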
6 Factorization step
[Figure: matrix A after reordering (example with 4 leaves/processes), with block rows and columns labeled A to G. Legend: non-zero block; L-block updates R-block (that is, Right depends on Left).]
[Figure: tree representation of matrix A after reordering, with nodes A to G.]
- Both tree and tree-node parallelization are used.
- All computations within a node are based on functionality from Intel MKL.
- Computation of leaves and updates of a block are independent on each process.
- Data is distributed uniformly between processes.
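Tree parallelization can be sketched by grouping the nodes of the dependency tree into levels: all nodes on one level depend only on lower levels, so they can be processed independently. The tree shape used in the usage note (leaves A, B, D, E; inner nodes C and F; root G) is an assumption read off the slide labels, and `tree_levels` is a hypothetical helper, not the solver's scheduler.

```python
def tree_levels(parent):
    """Group tree nodes by height; nodes in one group have no mutual dependencies.

    `parent` maps each node to its parent (None for the root).
    """
    children = {node: [] for node in parent}
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    height = {}

    def node_height(node):
        # A leaf has height 0; an inner node sits one level above its tallest child.
        if node not in height:
            height[node] = 1 + max((node_height(c) for c in children[node]), default=-1)
        return height[node]

    for node in parent:
        node_height(node)

    levels = [[] for _ in range(max(height.values()) + 1)]
    for node in sorted(parent):
        levels[height[node]].append(node)
    return levels
```

For the assumed tree `{"A": "C", "B": "C", "D": "F", "E": "F", "C": "G", "F": "G", "G": None}` the levels are the four leaves, then C and F, then the root G. Parallelism across nodes shrinks toward the root, which is where parallelization inside a single node takes over.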
8 Implementation of LU decomposition within a node
[Figure: node G.]
Selecting one thread per process allows us to hide data transfers behind computation.
9 Current status/interface
- Supported as 2 additional libraries for Linux* and Microsoft Windows*, 64-bit only.
- Supports different MPI implementations via a user-compiled wrapper.

C:
    {
        ...
        PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &error);
    }
    {
        ...
        comm = MPI_Comm_c2f(MPI_COMM_WORLD);
        CPARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, comm, &error);
    }

Fortran:
    ...
    call PARDISO(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, error)
    call CPARDISO(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, comm, error)
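The `a`, `ia`, and `ja` arguments in the calls above carry the sparse matrix. A minimal sketch of how such arrays can be built is given here, assuming the three-array CSR layout with one-based indices that PARDISO uses by default; `dense_to_csr_one_based` and its `upper_only` flag are hypothetical names for this illustration.

```python
def dense_to_csr_one_based(matrix, upper_only=False):
    """Convert a dense row-major matrix to CSR arrays with one-based indices.

    Returns (a, ia, ja): non-zero values, row pointers, column indices.
    With upper_only=True only entries on or above the diagonal are stored,
    which is the layout expected for symmetric matrix types.
    """
    a, ja, ia = [], [], [1]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0 and (not upper_only or j >= i):
                a.append(value)
                ja.append(j + 1)       # one-based column index
        ia.append(len(a) + 1)          # one-based start of the next row
    return a, ia, ja
```

For the 3x3 matrix `[[4, 0, 1], [0, 3, 0], [1, 0, 2]]` this yields `a = [4, 1, 3, 1, 2]`, `ia = [1, 3, 4, 6]`, and `ja = [1, 3, 2, 1, 3]`; the last entry of `ia` is always the number of stored non-zeros plus one.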
10 Experiments (scalability of time)
Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software products.
** Here and further: The University of Florida Sparse Matrix Collection, T. A. Davis and Y. Hu, ACM Transactions on Mathematical Software, Vol 38, Issue 1, 2011, pp 1:1-1:
11 Experiments (scalability of time)
Additional processes reduce computational time.
12 Experiments (scalability of time)
[Figure: time in seconds versus number of nodes (each node uses 16 threads) for 3Dspectralwave** (material problem), split into reorder, factorization (fact), and solve steps.]
- Factorization and solving steps scale well in terms of memory and performance.
- Parallelizing the reordering step might lead to a worse reordering, which affects overall time.
- Deeper investigation is needed here.
13 Experiments (balancing)
[Figures: ratio of time versus number of MPI processes (2 threads per process) for ga41as41h72** and Long_coup_dt6**, comparing approaches 1 to 4.]
For a non-uniform tree, there are several approaches to dividing the nodes of the tree between computational nodes. No single approach is best, so to achieve good performance we switch between them at the reordering step.
14 Experiments (scalability of memory)
[Figures: absolute memory-per-node scalability (lower is better): maximum memory per node in GB versus number of MPI processes (1 per hardware node), for NDOF=398K, NNZ=15.7M and for NDOF=1.7M, NNZ=12M.]
15 Experiments (scalability of memory)
Additional processes decrease memory size per host.
16 Conclusion
Intel Direct Sparse Solver for Clusters, based on Intel MKL functionality, results in:
- Good scaling of computational time
- Good scaling of memory per node
A lot of work and investigation still needs to be completed.
17 Automatic Offload support in Intel Math Kernel Library for Intel Xeon Phi coprocessors
Alexander Kalinkin, Nikita Shustrov
18 Algorithm
The matrix factorization method is based on the panel factorization approach. It shows advantages over communication-avoiding and tile methods or their combinations:
- no additional computational cost
- no additional memory consumption
The panel factorization approach has the same computational cost and memory usage as classic LAPACK algorithms.
The implementation preserves the LAPACK standard interfaces, and the algorithm can be applied to any matrix.
The implementation is DAG-based and uses panel factorization kernels that were redesigned and rewritten for Intel Xeon Phi coprocessors.
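As a rough illustration of what a panel-based factorization looks like, the sketch below factors a symmetric positive definite matrix panel by panel: each column panel is factored over its full height, then the trailing submatrix is updated. Cholesky is used here only because it is the shortest of the three factorizations to write down; the function `panel_cholesky` and its panel width `nb` are hypothetical, and this is not the Intel MKL implementation, its DAG scheduling, or its offload logic.

```python
import math


def panel_cholesky(a, nb):
    """Lower Cholesky factor of a symmetric positive definite matrix, panel by panel."""
    n = len(a)
    low = [row[:] for row in a]  # work on a copy; only the lower triangle is used
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Panel factorization: columns k0..k1-1 over all rows at or below the diagonal.
        for j in range(k0, k1):
            for p in range(k0, j):          # apply earlier columns of this panel
                for i in range(j, n):
                    low[i][j] -= low[i][p] * low[j][p]
            d = math.sqrt(low[j][j])
            for i in range(j, n):
                low[i][j] /= d
        # Trailing update: A22 -= L21 * L21^T (lower triangle only).
        for j in range(k1, n):
            for i in range(j, n):
                low[i][j] -= sum(low[i][p] * low[j][p] for p in range(k0, k1))
    for i in range(n):                      # clear the unused upper triangle
        for j in range(i + 1, n):
            low[i][j] = 0.0
    return low
```

The result does not depend on `nb`; the panel width only changes how the same arithmetic is grouped. In a real implementation the trailing update is the large matrix-matrix operation that is worth distributing across devices.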
19 Algorithm: Features
- Adaptable on-the-fly distribution of data and tasks between CPUs and coprocessors, with good load balancing
- Efficient utilization of all available computational units in heterogeneous systems
- Support for heterogeneous systems with a large number of coprocessors
- Scalability: a host with 1 coprocessor can show >2x performance improvement, and a host with 2 coprocessors can show >3x speed-up
- No limitations on matrix sizes
The algorithm provides a high degree of parallelism while minimizing synchronizations and communications.
20 Performance results
The algorithm is implemented within the Intel MKL LU, QR, and Cholesky factorization routines. The implemented routines detect the presence of Intel Xeon Phi coprocessors and automatically offload the computations that benefit from additional computational resources. The usage model ensures ease of use by hiding the complexity of heterogeneous systems from the user and providing the same API as standard Intel MKL routines. Users do not need to change their application code.
21 Intel Math Kernel Library Extended Eigensolver for Solving Symmetric Eigenvalue Problems
Alexander Kalinkin, Sergey Kuznetsov
22 Intel MKL Extended Eigensolver Functionality
Extended Eigensolver computes all the eigenvalues within a given search interval $[\lambda_{\min}, \lambda_{\max}]$ and the respective eigenvectors.
- Standard eigenvalue problem: $Ax = \lambda x$, $A = A^*$ ($A = A^t$ if $A$ is real)
- Generalized eigenvalue problem: $Ax = \lambda Bx$, $A = A^*$, $B = B^* > 0$ ($A = A^t$ if $A$ is real)
Supported matrix storage: sparse, banded, and dense. Similar LAPACK routines are named *evx and *gvx.
The FEAST algorithm is inspired by the contour integration technique in quantum mechanics. It deviates fundamentally from the traditional techniques based on Krylov subspace iteration (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques.
23 Extended Eigensolver Interfaces
Reverse Communication Interfaces:
- Users provide their own complex-arithmetic linear system solver and their own matrix-matrix multiply.
- More flexible for specific applications.
Interfaces for Predefined Formats/Solvers:
- Easy (plug-and-play)
- Intended for the following matrix formats: sparse (CSR format, only 1-based indexing supported), banded (LAPACK storage), dense (LAPACK storage)
- More than 90% of computations are done by PARDISO (as sparse solver), Spike (as banded solver), or LAPACK (as dense solver)
All data types are supported (real, real*8, complex, complex*16).
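The reverse communication idea can be shown with a small sketch in which the solver never sees the matrix: it hands a vector back to the caller, asks for a matrix-vector product, and resumes with the result. Plain power iteration is swapped in for FEAST to keep the example short, and the names `power_iteration_rci` and `run_rci` as well as the `"multiply"` job tag are hypothetical, not the Intel MKL interface.

```python
import math


def power_iteration_rci(n, iterations=200):
    """Reverse-communication power iteration: yields work requests to the caller."""
    x = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # Hand control back to the caller and wait for y = A * x.
        y = yield ("multiply", x)
        eigenvalue = sum(xi * yi for xi, yi in zip(x, y))  # Rayleigh quotient, |x| = 1
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return eigenvalue, x


def run_rci(matvec, n):
    """Driver loop: serve each request with the caller's own matrix-vector product."""
    solver = power_iteration_rci(n)
    request = next(solver)
    try:
        while True:
            job, vector = request
            if job == "multiply":
                request = solver.send(matvec(vector))
    except StopIteration as done:
        return done.value
```

Because the caller owns `matvec`, the matrix can live in any storage format or even be applied matrix-free, which is the flexibility the reverse communication interfaces are meant to offer.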
24 Scalability of Intel MKL Extended Eigensolver
[Figure: OpenMP* scalability of dfeast_scsrev on Intel Xeon CPU E (2.7 GHz, RAM 32 GB): speedup on 2, 4, 8, and 16 threads versus matrix size.]
25 Intel MKL Sparse BLAS: performance optimizations on modern architectures
Alexander Kalinkin, Sergey Pudov
26 Intel MKL Sparse BLAS: Introduction
Intel MKL Sparse BLAS supports 6 sparse formats: CSR, CSC, BSR, DIA, COO, and SKY. It is primarily designed for applications where each computation is performed only a few times. Every function calculates the result in a single call, which includes simple matrix analysis and execution steps. Deep investigation of the sparse matrix pattern is not performed because it is a time-consuming operation that affects performance.
27 Two-step Computations Approach
It is known that, for best performance, the computational kernels and the workload balancing algorithm should depend on the structure of the matrix. When multiple calls are expected with a particular sparse matrix pattern, it is better to organize computations in two steps:
- Analysis, which chooses the best kernel and workload balancing algorithm for a given computer architecture.
- Execution, where the information from the previous step is used to get high performance.
This approach pays off when the time required for a single analysis step is less than the overall performance benefit from multiple execution steps.
Limitations imposed by the single-call Sparse BLAS interfaces are most visible on modern multi-core architectures, where even a small workload imbalance may result in a significant performance loss.
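A minimal sketch of the two-step idea for sparse matrix-vector multiplication follows. The analysis step here does only one thing, splitting the rows into chunks with roughly equal numbers of non-zeros, and the execution step reuses that split on every call. The functions `analyze` and `execute`, the zero-based CSR arrays, and the balancing rule are all assumptions made for illustration; they are not the experimental library's interface.

```python
def analyze(row_ptr, parts):
    """Analysis step: split rows into `parts` chunks with similar non-zero counts.

    Returns chunk boundaries, so chunk p covers rows bounds[p]..bounds[p+1]-1.
    """
    n = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, parts):
        target = nnz * p / parts
        r = bounds[-1]
        # Advance until the non-zeros before row r reach this chunk's share.
        while r < n and row_ptr[r] < target:
            r += 1
        bounds.append(r)
    bounds.append(n)
    return bounds


def execute(bounds, row_ptr, col_idx, values, x):
    """Execution step: y = A * x, chunk by chunk (each chunk could go to one thread)."""
    y = [0.0] * (len(row_ptr) - 1)
    for lo, hi in zip(bounds, bounds[1:]):
        for i in range(lo, hi):
            y[i] = sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
    return y
```

For row pointers `[0, 6, 7, 8, 9, 10, 12]` and two chunks, `analyze` returns `[0, 1, 6]`, giving 6 non-zeros to each chunk, whereas an even split by row count would give 8 and 4. The cost of `analyze` is paid once and amortized over every later `execute` call with the same pattern.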
28 Experimental Library
The experimental library for an Intel Xeon Phi coprocessor contains some SpMV functionality with a two-step interface to investigate the performance benefits of this approach. The library supports:
- Two formats: CSR and ESB
- A couple of workload balancing algorithms
The goal of the experiment is to collect early feedback. The library will be available via request to intel.mkl@intel.com after release.
29 Q & A
30 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 
Notice revision #
More informationSarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018
Sarah Knepper Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018 Outline Motivation Problem statement and solutions Simple example Performance comparison 2 Motivation Partial differential equations
More informationIntel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant
Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Parallel is the Path Forward Intel Xeon and Intel Xeon Phi Product Families are both going parallel Intel Xeon processor
More informationIntroduction to Intel Fortran Compiler Documentation. Document Number: US
Introduction to Intel Fortran Compiler Documentation Document Number: 307778-003US Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationProduct Change Notification
Product Change Notification 112087-00 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
More informationLIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015
LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015 Abstract Library for small matrix-matrix multiplications targeting
More informationINTEL MKL Vectorized Compact routines
INTEL MKL Vectorized Compact routines Mesut Meterelliyoz, Peter Caday, Timothy B. Costa, Kazushige Goto, Louise Huot, Sarah Knepper, Arthur Araujo Mitrano, Shane Story 2018 BLIS RETREAT 09/17/2018 OUTLINE
More informationProduct Change Notification
Product Change Notification Change Notification #: 114216-00 Change Title: Intel SSD 730 Series (240GB, 480GB, 2.5in SATA 6Gb/s, 20nm, MLC) 7mm, Generic Single Pack, Intel SSD 730 Series (240GB, 480GB,
More informationIntel Math Kernel Library (Intel MKL) Latest Features
Intel Math Kernel Library (Intel MKL) Latest Features Sridevi Allam Technical Consulting Engineer Sridevi.allam@intel.com 1 Agenda - Introduction to Support on Intel Xeon Phi Coprocessors - Performance
More informationProduct Change Notification
Product Change Notification Change Notification #: 115007-00 Change Title: Select Intel SSD 530 Series, Intel SSD 535 Series, Intel SSD E 5410s Series, Intel SSD E 5420s Series, Intel SSD PRO 2500 Series,
More informationBei Wang, Dmitry Prohorov and Carlos Rosales
Bei Wang, Dmitry Prohorov and Carlos Rosales Aspects of Application Performance What are the Aspects of Performance Intel Hardware Features Omni-Path Architecture MCDRAM 3D XPoint Many-core Xeon Phi AVX-512
More informationIntel Platform Controller Hub EG20T
Intel Platform Controller Hub EG20T UART Controller Driver for Windows* Programmer s Guide Order Number: 324261-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION
More informationAlexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria
Alexei Katranov IWOCL '16, April 21, 2016, Vienna, Austria Hardware: customization, integration, heterogeneity Intel Processor Graphics CPU CPU CPU CPU Multicore CPU + integrated units for graphics, media
More informationProduct Change Notification
Product Notification Notification #: 114712-01 Title: Intel SSD 750 Series, Intel SSD DC P3500 Series, Intel SSD DC P3600 Series, Intel SSD DC P3608 Series, Intel SSD DC P3700 Series, PCN 114712-01, Product
More informationProduct Change Notification
Product Change Notification 113412-00 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
More informationHPCG Results on IA: What does it tell about architecture?
HPCG Results on IA: What does it tell about architecture? Jongsoo Park *, Mikhail Smelyanskiy *, Alexander Heinecke *, Vadim Pirogov *, Scott David *, Carlos Rosales-Fernandez #, Christopher Daley $, Yutong
More informationIntel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes
Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes 23 October 2014 Table of Contents 1 Introduction... 1 1.1 Product Contents... 2 1.2 Intel Debugger (IDB) is
More informationVTune(TM) Performance Analyzer for Linux
VTune(TM) Performance Analyzer for Linux Getting Started Guide The VTune Performance Analyzer provides information on the performance of your code. The VTune analyzer shows you the performance issues,
More informationContinuous Speech Processing API for Host Media Processing
Continuous Speech Processing API for Host Media Processing Demo Guide April 2005 05-2084-003 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
More informationProduct Change Notification
Product Change Notification 111962-00 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
More informationProduct Change Notification
Product Change Notification Change Notification #: 114258-00 Change Title: Intel SSD DC S3710 Series (200GB, 400GB, 800GB, 1.2TB, 2.5in SATA 6Gb/s, 20nm, MLC) 7mm, Generic 50 Pack Intel SSD DC S3710 Series
More informationProduct Change Notification
Product Change Notification 113028-02 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
More informationParallel Programming Models
Parallel Programming Models Intel Cilk Plus Tasking Intel Threading Building Blocks, Copyright 2009, Intel Corporation. All rights reserved. Copyright 2015, 2011, Intel Corporation. All rights reserved.
More informationHigh Performance Computing The Essential Tool for a Knowledge Economy
High Performance Computing The Essential Tool for a Knowledge Economy Rajeeb Hazra Vice President & General Manager Technical Computing Group Datacenter & Connected Systems Group July 22 nd 2013 1 What
More informationIntel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python
Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:
More informationProduct Change Notification
Product Change Notification Change Notification #: 115107-00 Change Title: Intel Ethernet Converged Network Adapter X520 - DA1, E10G41BTDAPG1P5,, MM#927066, Intel Ethernet Converged Network Adapter X520
More informationThird Party Hardware TDM Bus Administration
Third Party Hardware TDM Bus Administration for Windows Copyright 2003 Intel Corporation 05-1509-004 COPYRIGHT NOTICE INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationMayLoon User Manual. Copyright 2013 Intel Corporation. Document Number: xxxxxx-xxxus. World Wide Web:
Copyright 2013 Intel Corporation Document Number: xxxxxx-xxxus World Wide Web: http://www.intel.com/software Document Number: XXXXX-XXXXX Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS
More informationProduct Change Notification
Product Change Notification 110988-01 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property
More informationGraphics Performance Analyzer for Android
Graphics Performance Analyzer for Android 1 What you will learn from this slide deck Detailed optimization workflow of Graphics Performance Analyzer Android* System Analysis Only Please see subsequent
More informationSample for OpenCL* and DirectX* Video Acceleration Surface Sharing
Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing User s Guide Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2013 Intel Corporation All Rights Reserved Document
More informationProduct Change Notification
Product Change Notification 112177-01 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
More informationIntel EP80579 Software Drivers for Embedded Applications
Intel EP80579 Software Drivers for Embedded Applications Package Version 1.0 Release Notes October 2008 Order Number: 320150-005US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN
More informationFastest and most used math library for Intel -based systems 1
Fastest and most used math library for Intel -based systems 1 Speaker: Alexander Kalinkin Contributing authors: Peter Caday, Kazushige Goto, Louise Huot, Sarah Knepper, Mesut Meterelliyoz, Arthur Araujo
More informationIFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor
IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization
More informationReal World Development examples of systems / iot
Real World Development examples of systems / iot Intel Software Developer Conference Seoul 2017 Jon Kim Software Consulting Engineer Contents IOT end-to-end Scalability with Intel x86 Architect Real World
More informationIntel IXP400 Software: Integrating STMicroelectronics* ADSL MTK20170* Chipset Firmware
Intel IXP400 Software: Integrating STMicroelectronics* ADSL MTK20170* Chipset Firmware Application Note September 2004 Document Number: 254065-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION
More informationProduct Change Notification
Product Change Notification Change Notification #: 114840-00 Change Title: Intel Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16 Standard 100HFA016FS, Intel Omni-Path Host Fabric Interface
More informationOverview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation
Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation Agenda 1D interpolation problem statement Computation flow Application areas Data fitting in Intel MKL Data
More informationIntel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation
Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation Application Note September 2004 Document Number: 300375-002 INFORMATION IN THIS DOCUMENT
More informationBecca Paren Cluster Systems Engineer Software and Services Group. May 2017
Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application
More informationEnabling Hardware Accelerated Playback for Intel Atom /Intel US15W Platform and IEGD
White Paper Ishu Verma Software Technical Marketing Engineer Intel Corporation Enabling Hardware Accelerated Playback for Intel Atom /Intel US15W Platform and IEGD Case Study Using MPlayer on Moblin March,
More informationMikhail Dvorskiy, Jim Cownie, Alexey Kukanov
Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov What is the Parallel STL? C++17 C++ Next An extension of the C++ Standard Template Library algorithms with the execution policy argument Support for parallel
More informationUsing Intel Math Kernel Library with MathWorks* MATLAB* on Intel Xeon Phi Coprocessor System
Using Intel Math Kernel Library with MathWorks* MATLAB* on Intel Xeon Phi Coprocessor System Overview This guide is intended to help developers use the latest version of Intel Math Kernel Library (Intel
More information