Intel Cluster Toolkit Compiler Edition for Linux* or Windows* HPC Server 2008

Product Overview

High-performance scaling to thousands of processors.

Performance leadership
- Intel software development products and cluster tools continue to deliver leadership performance for customers, demonstrated by industry benchmarks

Most comprehensive cluster toolset
- MPI coding assistance and MPI analysis tools support performance tuning for thousands of processors per cluster system
- MPI correctness checker: a confidence tool that provides substantial aid in writing robust MPI applications
- Highly optimized MPI library works on all processors (machine independent)
- Cluster support in the Intel Math Kernel Library (Intel MKL), including ScaLAPACK and Cluster Discrete Fourier Transforms
- Best way to develop applications for cluster systems, including Intel Cluster Ready systems

Continued commitment to embracing innovation
- Compilers and libraries support Intel processors and compatible processors in a single binary
- Support for Intel processors, including Intel Core i7 processors
- 32-bit and 64-bit support
- Windows* (including Windows* HPC Server 2008), Linux*
Product Overview: Intel Cluster Toolkit Compiler Edition for Linux* or Windows* HPC Server 2008

The new Intel Cluster Toolkit Compiler Edition for Linux or Windows HPC Server 2008 includes:
- Intel C++ Compiler
- Intel Fortran Compiler
- Intel Math Kernel Library (Intel MKL)
- Intel MPI Library
- Intel MPI Benchmarks
- Intel Trace Analyzer and Collector 7

Highlights
This is the most comprehensive hardware and software solution combination for MPI-based high-performance computing.
- Increased application software performance
  -- Interconnect-tuned, multicore-optimized Intel MPI Library
- Enhanced product quality, robustness, and developer productivity
  -- Sophisticated graphical analysis tools
- Reduced support load and enhanced customer satisfaction
  -- Mature, high-quality tools
- Simplified and accelerated cluster development
  -- Recommended software development tools for Intel Cluster Ready
  -- No node count restrictions

Features
- Multicore: Intel Compilers have built-in optimization technologies and multithreading support that help create code that runs best on the latest multicore processors
- Optimized applications: Intel Compilers offer the breadth of advanced optimization, multithreading, and processor support that includes automatic processor dispatch, vectorization, autoparallelization, data prefetching, and loop unrolling
- Intel MPI Library: Automatic application-specific performance tuning; faster startup and improved collective operation algorithms for even more performance; greater scalability over sockets and shared memory; DAPL support for lower latency and multivendor interoperability
- Intel Trace Analyzer and Collector: More reports, more graphics, more analysis, more filtering, and more power
- Intel Math Kernel Library (Intel MKL): Performance optimizations for Intel's next-generation microarchitecture family.
Includes improved integration with Integrated Development Environments such as Microsoft Visual Studio*, Eclipse*, and XCode*.

Intel MPI Benchmarks: Extended support for Microsoft Windows* HPC Server 2008 and Microsoft Visual Studio* 2008.

Intel MPI provides industry-leading out-of-the-box performance due to:
- Incremental optimizations
- Best default parameters
- Best collective algorithms
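To make the toolkit workflow concrete, the following is an illustrative sketch of a typical build-trace cycle on Linux. The command names (mpiicc for the Intel MPI wrapper around the Intel C++ Compiler, mpiexec for launching, -trace for Trace Collector instrumentation) are as documented for the Intel tools, but exact options vary by version, and hello_mpi.c is a placeholder source file:

```shell
# Build an MPI application with the Intel MPI compiler wrapper for the
# Intel C++ Compiler (mpiifort is the Fortran counterpart).
mpiicc -O3 -o hello_mpi hello_mpi.c

# Launch 16 MPI processes across the cluster (host setup varies by version).
mpiexec -n 16 ./hello_mpi

# Rebuild with the -trace option to instrument MPI calls for the
# Intel Trace Analyzer and Collector, then run to produce a trace file.
mpiicc -trace -O3 -o hello_mpi_traced hello_mpi.c
mpiexec -n 16 ./hello_mpi_traced
```

The resulting trace can then be opened in the Trace Analyzer GUI for the graphical analysis described above.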
Benchmarks: Industry-Leading MPI Performance

[Chart: MPI latency benchmarks (out-of-the-box performance) based on the Intel MPI Benchmarks (IMB). Performance of Intel MPI, HP-MPI, Scali MPI, and MVAPICH relative to Open MPI (higher is better) on 4 nodes (InfiniBand + shmem); geometric mean over the IMB benchmarks for message sizes from 4 bytes to 4 MB.]

[Chart: Performance of Intel MPI, Scali MPI, and MVAPICH relative to Open MPI (higher is better), 64 processes on 8 nodes (InfiniBand + shmem); geometric mean over the IMB benchmarks.]

Configuration: Interconnect: InfiniBand, ConnectX adapters. CPU: Intel Xeon X547 (Harpertown). RAM: 6 GB per system. Benchmark: Intel MPI Benchmarks.

Source: Intel Corporation. Test results aggregated with overall performance scores based on geometric mean. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, refer to www.intel.com/performance/resources/benchmark_limitations.htm.
Intel MPI Library Scalability Improvements

[Chart: MPI point-to-point communication, new Intel MPI Library vs. previous Intel MPI Library. IMB Sendrecv latency, sock+shmem, 4-byte messages (lower is better), measured at increasing process counts, 8 processes per node.]

[Chart: MPI collective communication, new Intel MPI Library vs. previous Intel MPI Library. IMB Reduce latency, sock+shmem, 4-byte messages (lower is better), same process counts.]

Configuration: Interconnect: Gigabit Ethernet; InfiniBand. Platform: Intel SR56SF. CPU/Stepping: Intel Xeon X547, C step (Harpertown). RAM: 6 GB per system. Intel MPI parameters: export I_MPI_DEVICE=ssm; export I_MPI_NETMASK=ib (TCP/IP through InfiniBand). Benchmark: IMB.

The new Intel MPI Library enables high scalability while improving performance over the previous Intel MPI Library.
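The I_MPI_DEVICE settings quoted in the configuration above select the communication fabric at run time. A minimal sketch, assuming the device names documented for the Intel MPI Library 3.x series (actual availability depends on the fabrics installed on the cluster; the benchmark binary name is a placeholder):

```shell
# Use sockets plus intra-node shared memory (the "ssm" device),
# and restrict TCP/IP traffic to the InfiniBand (IPoIB) interface.
export I_MPI_DEVICE=ssm
export I_MPI_NETMASK=ib

# Alternatively, select the RDMA-capable DAPL device with shared memory
# ("rdssm") to run natively over InfiniBand instead of TCP/IP:
# export I_MPI_DEVICE=rdssm

mpiexec -n 64 ./IMB-MPI1   # placeholder benchmark binary
```

Because fabric selection is an environment variable rather than a rebuilt binary, the same executable can be compared across interconnects, as in the charts above.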
Application-Specific Autotuning

[Chart: Improvement of autotuning over the original benchmark at different workload levels, in percent (higher is better). NAS Parallel Benchmark kernels cg, ep, ft, is, lu, and sp at workload classes A, B, C, and D, run over InfiniBand + shmem.]

The application-specific autotuning feature of Intel MPI provides an additional performance benefit for MPI applications.

Configuration: Interconnect: InfiniBand. Platform: Intel SR56SF. CPU: Intel Xeon X547 (Harpertown). RAM: 6 GB per system. Benchmark: NAS Parallel Benchmarks (NPB), Intel MPI. The classes A, B, C, and D define the workload level: class A = a small problem size (and, as a result, low MPI communication traffic), class B = medium, class C = large, and class D = very large. The reported benefit of using mpitune is the improvement in percent versus out-of-the-box performance.

Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
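The autotuning measured above is driven by the mpitune utility shipped with the Intel MPI Library. A hedged sketch of the application-specific workflow (option spellings differ between releases, and my_app is a placeholder application; consult the reference manual for the installed version):

```shell
# Step 1: let the tuner search for the best library parameters for this
# cluster and this workload; results are stored in an application-specific
# configuration file alongside the library's tuning data.
mpitune --application "mpiexec -n 64 ./my_app"

# Step 2: run the application with the tuned settings applied.
mpiexec -tune -n 64 ./my_app
```

This is why the chart reports gains "versus out-of-the-box": the baseline is the same binary run without the -tune option.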
Performance Gain Through DAPL Support

[Chart: Latency in microseconds at 4 bytes (smaller is better) across the IMB benchmarks (PingPong, PingPing, Sendrecv, Exchange, Reduce, Allreduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Alltoallv, Bcast, Barrier), comparing ordinary wait with wait mode using DAPL; 4 processes on 4 nodes over InfiniBand.]

For small message sizes, MPI communication through DAPL provides an additional performance gain in RDMA wait mode.

Configuration: Interconnect: InfiniBand; HCA: Mellanox. CPU: Intel Xeon (Woodcrest), 4 MB L2 cache. RAM: 8 GB per system. Benchmark: IMB, Intel MPI.

Testimonials: Intel Cluster Tools

At LSTC we know how difficult MPI programming can be and invest considerable effort into making LS-Dyna robust. Message checking with Intel Trace Analyzer and Collector identified a very subtle issue before it became a problem, saving us a significant amount of potential future debugging. No other tool of which I am aware has this capability or could have detected this problem.
— Brian Wainscott, Developer, LSTC/LS-Dyna

EXASOL was able to analyze the runtime behavior very efficiently by using the Intel Cluster Tools.
As a result, some parts of the application's performance were improved considerably. In addition, EXASOL estimates that development time and development efficiency have improved by up to % using these tools.
— Mathias Golombek, Principal Manager R&D, EXASOL, Business Intelligence Applications
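The message checking that the LSTC testimonial refers to is the correctness-checking capability of the Intel Trace Analyzer and Collector. As a hedged sketch of how it is typically enabled (the -check_mpi option is how the Intel compiler wrappers link in the checking library in recent releases; my_app.c is a placeholder):

```shell
# Link against the correctness-checking library; at run time it reports
# problems such as deadlocks, datatype mismatches, and buffer misuse in
# MPI calls, with -g preserving source locations in the reports.
mpiicc -check_mpi -g -o my_app my_app.c

# Run normally; violations are reported as the application executes.
mpiexec -n 16 ./my_app
```

Because the checks run against the real execution, they can surface the kind of subtle, intermittent message-passing issues described above before they become failures in production.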
Get advanced performance and optimization with Intel Cluster Toolkit Compiler Edition: http://intel.com/software/products

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling -8-548-475, or by visiting Intel's Web site at www.intel.com.

© 2008, Intel Corporation.
All rights reserved. Intel, the Intel logo, and Intel Core are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.