Intel Cluster Toolkit Compiler Edition 3.2 for Linux* or Windows HPC Server 2008*


Product Overview

High-performance scaling to thousands of processors.

Performance leadership
- Intel software development products and cluster tools continue to deliver leadership performance for customers, as demonstrated by industry benchmarks

Most comprehensive cluster toolset
- MPI coding assistance and MPI analysis tools support performance tuning for thousands of processors per cluster system
- MPI correctness checker: a confidence tool that provides substantial aid in writing robust MPI applications
- Highly optimized MPI library works on all processors (machine independent)
- Cluster support in Intel Math Kernel Library (Intel MKL), including ScaLAPACK and Cluster Discrete Fourier Transforms
- Best way to develop applications for cluster systems, including Intel Cluster Ready systems

Continued commitment to embracing innovation
- Compilers and libraries support Intel processors and compatible processors in a single binary
- Support for Intel processors, including Intel Core i7 processors
- 32-bit and 64-bit support
- Windows* (including Windows HPC Server 2008*), Linux*

Product Overview: Intel Cluster Toolkit Compiler Edition 3.2 for Linux* or Windows* HPC Server 2008

New

Intel Cluster Toolkit Compiler Edition 3.2 for Linux or Windows HPC Server 2008 includes:
- Intel C++ Compiler
- Intel Fortran Compiler
- Intel Math Kernel Library (Intel MKL)
- Intel MPI Library
- Intel MPI Benchmarks
- Intel Trace Analyzer and Collector 7

Highlights

This is the most comprehensive hardware and software solution combination for MPI-based high performance computing.

Increased application software performance
-- Interconnect tuned
-- Multicore optimized Intel MPI library

Enhanced product quality, robustness, and developer productivity
-- Sophisticated graphical analysis tools

Reduced support load and enhanced customer satisfaction
-- Mature and high-quality tools

Simplify and accelerate clusters
-- Recommended software development tools for Intel Cluster Ready
-- No node count restrictions

Features

- Multicore: Intel Compilers have built-in optimization technologies and multithreading support that help create code that runs best on the latest multicore processors.
- Optimize Applications: Intel Compilers offer a breadth of advanced optimization, multithreading, and processor support that includes automatic processor dispatch, vectorization, autoparallelization, data prefetching, and loop unrolling.
- Intel MPI Library: Automatic application-specific performance tuning, faster startup, and improved collective operation algorithms for even more performance, plus greater scalability over sockets and shared memory. DAPL support for lower latency and multivendor interoperability.
- Intel Trace Analyzer and Collector: More reports, more graphics, more analysis, more filtering, and more power.
- Intel Math Kernel Library (Intel MKL): Performance optimizations for Intel's next-generation microarchitecture family. Includes improved integration with Integrated Development Environments such as Microsoft Visual Studio*, Eclipse*, and XCode*.
- Intel MPI Benchmarks: Extended support for Microsoft Windows HPC Server 2008 and Microsoft Visual Studio 8*.

Intel MPI provides industry-leading out-of-the-box performance due to:
- Incremental optimizations
- Best default parameters
- Best collective algorithms

Benchmarks

Industry-leading MPI Performance

MPI latency benchmarks (out-of-the-box performance) based on Intel MPI Benchmarks (IMB).

[Figure: Intel MPI, HP-MPI, ScaliMPI, MVAPICH, MVAPICH vs. OpenMPI. Performance relative to Open MPI on 4 nodes (InfiniBand + shmem); geomean value on IMB benchmarks; higher is better. X axis: message size. Series: Intel MPI, HP MPI, Scali MPI 5.6.4, MVAPICH, MVAPICH, Open MPI.]

[Figure: Intel MPI, ScaliMPI, MVAPICH, MVAPICH vs. OpenMPI. Performance relative to Open MPI, 64 processes on 8 nodes (InfiniBand + shmem); geomean value on IMB benchmarks; higher is better. X axis: message size. Series: Intel MPI, Scali MPI 5.6.4, MVAPICH, MVAPICH, Open MPI.]

Test system
- Interconnect: InfiniBand, ConnectX adapters
- CPU: Xeon DP Harpertown X547 FC-LGA6.Ghz 6FSB M 64bit W (8574KL8NT)
- RAM: 6Gb per system
- Benchmark: Intel MPI Benchmarks

Source: Intel Corporation. Test results aggregated with overall performance scores based on geometric mean.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, refer to www.intel.com/performance/resources/benchmark_limitations.htm.
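The charts above report one score per MPI library: a geometric mean of performance relative to Open MPI across the IMB benchmarks. The brief does not spell out the calculation, so the following is only a minimal sketch of how such a score could be computed, assuming each per-benchmark ratio is baseline latency divided by candidate latency. The function names and all latency values are hypothetical and are not taken from the brief.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_performance(baseline_usec, candidate_usec):
    """Per-benchmark ratio baseline/candidate (latency, so lower time is better).

    A ratio above 1.0 means the candidate is faster than the baseline.
    """
    return {name: baseline_usec[name] / candidate_usec[name]
            for name in baseline_usec}


# Hypothetical latencies in microseconds, for illustration only.
baseline = {"PingPong": 4.0, "Sendrecv": 8.0, "Allreduce": 16.0}
candidate = {"PingPong": 2.0, "Sendrecv": 8.0, "Allreduce": 8.0}

ratios = relative_performance(baseline, candidate)
score = geomean(list(ratios.values()))
print(f"relative performance (geomean): {score:.3f}")
```

A geometric mean is the usual choice for averaging ratios because a 2x gain on one benchmark and a 2x loss on another cancel out to 1.0, which an arithmetic mean would not give.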

Intel MPI Library Scalability Improvements

Two releases of the Intel MPI Library are compared.

[Figure: MPI point-to-point communication. IMB Sendrecv latency, sock+shmem, 4 byte; lower is better. X axis: processes (8 processes per node). Series: the two Intel MPI releases.]

[Figure: MPI collective communication. IMB Reduce latency, sock+shmem, 4 byte; lower is better. X axis: processes (8 processes per node). Series: the two Intel MPI releases.]

Test system (both charts)
- Interconnect: Gigabit Ethernet; InfiniBand
- Platform: Intel SR56SF
- CPU/Stepping: Xeon X547; C step (Harpertown) .8 GHz / MB L cache
- RAM: 6Gb per system
- Benchmark: IMB
- Intel MPI parameters:
  export I_MPI_DEVICE=ssm
  export I_MPI_NETMASK=ib (TCP/IP through IB)

The new Intel MPI Library enables high scalability while improving performance over the earlier Intel MPI Library release.

Source: Intel Corporation. Test results aggregated with overall performance scores based on geometric mean.
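The two Intel MPI parameters listed for these runs are plain environment variables, so they can also be set from a driver script instead of a shell. The sketch below is one way to do that, assuming nothing beyond the two `export` lines in the test configuration. The helper name `benchmark_env` is hypothetical, and the child process is a stand-in that only echoes a variable; the actual MPI launcher command is not given in the brief and is not shown here.

```python
import os
import subprocess
import sys


def benchmark_env(base=None):
    """Return a copy of the environment with the two Intel MPI settings applied."""
    env = dict(os.environ if base is None else base)
    env["I_MPI_DEVICE"] = "ssm"   # device setting listed in the test configuration
    env["I_MPI_NETMASK"] = "ib"   # TCP/IP through IB, as noted in the brief
    return env


# Stand-in child process: prints the variable it inherited.
# A real run would start the benchmark here instead.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['I_MPI_DEVICE'])"],
    env=benchmark_env(),
    capture_output=True,
    text=True,
    check=True,
)
inherited = child.stdout.strip()
print(inherited)
```

Building a copy of the environment rather than mutating `os.environ` keeps the settings scoped to the benchmark process.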

Application-specific Autotuning Benchmark (Higher is Better)

[Figure: Improvement from autotuning over the original benchmark at different workload levels, in percent; higher is better. Interconnect: InfiniBand + shmem. X axis: NAS kernels cg, ep, ft, is, lu, sp. Series: workload classes A, B, C, D.]

The application-specific autotuning feature of Intel MPI provides an additional performance benefit for MPI applications.

Test system
- Interconnect: InfiniBand
- Platform: Intel SR56SF
- CPU: Xeon X547; .8 GHz / MB L cache
- RAM: 6Gb per system
- Benchmark: NAS Parallel Benchmarks (NPB)
- MPI library: Intel MPI

Classes A, B, C, and D define the workload level. Class A is a small problem size (and therefore low MPI communication traffic), class B is medium, class C is large, and class D is very large. The chart shows the benefit of using MPI-tune as a percentage improvement over out-of-the-box performance.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
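The autotuning chart expresses the benefit as a percentage improvement over the out-of-the-box run for each NAS kernel and workload class. The brief does not define the formula, so this sketch assumes the improvement is the relative speedup in run time, (baseline / tuned - 1) x 100. The function name and every timing below are hypothetical and do not come from the brief.

```python
def improvement_percent(baseline_seconds, tuned_seconds):
    """Percentage improvement of a tuned run over the out-of-the-box run.

    Assumed definition: (baseline / tuned - 1) * 100, so 0 means no change
    and a negative value means the tuned run was slower.
    """
    if baseline_seconds <= 0 or tuned_seconds <= 0:
        raise ValueError("run times must be positive")
    return (baseline_seconds / tuned_seconds - 1.0) * 100.0


# Hypothetical (kernel, class) -> (out-of-the-box seconds, tuned seconds).
runs = {
    ("cg", "A"): (10.0, 8.0),
    ("ft", "C"): (50.0, 40.0),
    ("is", "B"): (4.0, 4.0),
}

improvements = {key: improvement_percent(base, tuned)
                for key, (base, tuned) in runs.items()}

for (kernel, cls), pct in sorted(improvements.items()):
    print(f"{kernel} class {cls}: {pct:+.1f}%")
```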

Performance Gain through DAPL Support

[Figure: Intel MPI Library with DAPL. Latency at 4 bytes for all benchmarks; InfiniBand, 4 processes on 4 nodes; smaller is better. Y axis: timings [usec]. X axis: PingPong, PingPing, Sendrecv, Exchange, Reduce, Allreduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Alltoallv, Bcast, Barrier. Series: ordinary wait; wait mode with DAPL.]

For small message sizes, MPI communication through DAPL provides an additional performance gain in RDMA wait mode.

Test system
- Interconnect: InfiniBand; HCA: Mellanox MT58
- CPU: Woodcrest, .66 GHz / 4 MB L cache
- RAM: 8Gb per system
- Benchmark: IMB
- MPI library: Intel MPI

Source: Intel Corporation. Test results aggregated with overall performance scores based on geometric mean.

Testimonials: Intel Cluster Tools

"At LSTC we know how difficult MPI programming can be and invest considerable effort into making LS-Dyna robust. Message checking with Intel Trace Analyzer and Collector identified a very subtle issue before it became a problem, saving us a significant amount of potential future debugging. No other tool of which I am aware has this capability or could have detected this problem."
LSTC/LS-Dyna, Brian Wainscott, Developer

"EXASOL was able to analyze the runtime behavior very efficiently by using the Intel Cluster Tools. As a result, some parts of the application performance were improved considerably. In addition, EXASOL estimates that the development time and development efficiency has improved up to % using these tools."
EXASOL, Business Intelligence Applications, Mathias Golombek, Principal Manager R&D

Get advanced performance and optimization with Intel Cluster Toolkit Compiler Edition.
http://intel.com/software/products

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling -8-548-475, or by visiting Intel's Web site at www.intel.com.

8, Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Core are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.