Revision 1.1. Copyright 2011, XLsoft K.K. All rights reserved.


1 Revision 1.1

2 Cluster Studio XE 2012 components
Compiler: C/C++, Fortran
Library: MKL, MPI, TBB (C++), IPP
Analyzer

3 Commands
icc, icpc, ifort: C, C++, and Fortran compilers
mpiicc, mpiicpc, mpiifort: MPI compiler wrappers for C, C++, and Fortran
mpicc: C; mpic++, mpicxx: C++; mpif77, mpif90: Fortran
<Install dir> stands for the installation directory of each product, for example: <VTune Install dir> = /opt/intel/vtune_amplifier_xe/

4 Cluster Studio XE 2012

5 C++ and Fortran compilers
Profile-guided optimization (PGO)
Guided auto-parallelization (GAP)
Sandy Bridge-EP
OpenMP 3.1
x86, x86_64
GCC (Visual Studio on Windows)
C/C++, Fortran

6 Fortran compiler 12.1
Co-array Fortran, Fortran 2008 (ISO/IEC :2004)
Languages: Fortran IV/77/90/95/2003 (Fortran 2008 )
ISO standards: ISO/IEC 1539:1991, ISO/IEC :1997, ISO/IEC 1539-1:2004, ISO/IEC :2004
ANSI standards: ANSI X, ANSI X, ANSI X, ANSI X3J3/

7 C/C++ compiler
C++11 (ISO/IEC 14882:2011), IEEE, Cilk Plus
Languages: C/C++, C99 (C++11 )
Standards: ISO/IEC 9899:1990, ISO/IEC 9899:1999, ISO/IEC 14882:1998, ISO/IEC 14882:
bit long double

8 Environment setup
Compilers (x86_64):
source <composer Install dir>/bin/compilervars.sh intel64
source <icc Install dir>/bin/iccvars.sh intel64
source <ifort Install dir>/bin/ifortvars.sh intel64
MPI (x86_64):
source <Intel MPI Install dir>/bin64/mpivars.sh
VTune Amplifier XE:
source <VTune Install dir>/amplxe-vars.sh
Trace Analyzer/Collector:
source <TA/TC Install dir>/bin/itacvars.sh
The composer script sets up both icc and ifort. For the C shell, use the .csh scripts instead of the .sh scripts.

9 Compile commands
C: icc [options] <files>
C++: icpc [options] <files>
Fortran: ifort [options] <files>
MPI:
C: mpiicc [options] <files>
C++: mpiicpc [options] <files>
Fortran: mpiifort [options] <files>
Examples:
icc -O3 -xavx -ipo main.c func1.c func2.c obj_c.o
icpc main.cpp func1.cpp func2.cpp obj_cpp.o
ifort -O3 -xsse4.2 prog.f90 sub.f -o my_prog_name.out
Object files such as obj_c.o (from .c) and obj_cpp.o (from .cpp) can be listed together with source files.

10 Main options (Linux)
Auto-parallelization: mpiicc -parallel <files>
OpenMP: mpiicc -openmp <files>
Optimization: mpiicc -O3 <files>
AVX (Sandy Bridge): mpiicc -xavx <files>
Interprocedural optimization: mpiicc -ipo <files>
Vectorization report: mpiicc -vec_report3 <files>
Trace Collector: mpiicc -trace -g <files>
Example (Trace Analyzer/Collector aside): mpiicc -O3 -xavx -ipo [-openmp] [-parallel] <files>
The same options apply when mpiicc is replaced by icc, ifort, mpiicpc, or mpiifort (-trace applies only to mpiicc, mpiicpc, and mpiifort).

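11 Main options (Linux)
-O0: no optimization
-O1, -O2, -O3: optimization (3 levels)
-x, -ax, -m: target instruction set
-ipo: interprocedural optimization
-openmp: OpenMP
-parallel: auto-parallelization
-prof_gen, -prof_use: Profile Guided Optimization (PGO)
-guide: guided auto-parallelization (GAP)
-vec_report<number>: vectorization report (number: 1, 2, 3, 4, 5)
-par_report<number>: parallelization report (number: 1, 2, 3)
-fast: -ipo -O3 -no-prec-div -static -xhost
-fp-model <model>: floating-point model (model: strict, precise, fast)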

12 Main options (Linux), continued
-prec-div
-static-intel, -shared-intel ( static)
-mkl[=lib]: link MKL (=lib is optional)
  parallel: parallel MKL
  sequential: sequential MKL
  cluster: cluster MKL
Example: -mkl=cluster

13 OpenMP* thread affinity
export KMP_AFFINITY=<type>
Types: physical, compact, scatter

14 Optimization with Composer XE
Interprocedural optimization (IPO)
Profile-guided optimization (PGO)
4. Guided auto-parallelization (GAP)
5. VTune Amplifier XE

15 Optimization level: -O3
High-Level Optimization (HLO), used together with -x, -ax
Levels: -O0, -O1, -O2, -O3

16 Interprocedural optimization: -ipo
Inter Procedural Optimization (IPO)
[Figure: compilation with and without IPO for file1.c, file2.c, file3.c, file4.c]

17 Auto-parallelization: -parallel
Parallelizer
-par_report[n]: report level, n = 0, 1, 2, 3
  int_sin.cpp(74): (col. 4) remark: [message text missing]
  int_sin.cpp(92): (col. 6) remark: [message text missing]
-par-threshold[n]: threshold n
  int_sin.c(92): (col. 6) remark: [message text missing]

18 Vectorization: -m, -x, -ax
Vectorizer. SIMD: Single Instruction Multiple Data
for ( i=0; i<max; i++ ) c[i] = a[i] + b[i];
-no-vec: one A + B addition per instruction, MAX iterations
-xavx (Intel AVX): eight additions per SIMD instruction, MAX/8 iterations
[Figure: C1..C8 = A1..A8 + B1..B8 in one AVX operation on float values]
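The loop on this slide can be written as a small, self-contained C function. This is only a sketch: the function name add_arrays and its parameter list are assumptions made for illustration and do not come from the slides. Whether the loop is actually vectorized depends on the compiler and on options such as -xavx; a vectorization report is the way to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name not from the slides): element-wise c = a + b.
 * Each iteration is independent of the others, which is the property an
 * auto-vectorizer needs in order to process several floats (eight with
 * 256-bit AVX) per SIMD instruction. */
void add_arrays(size_t max, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < max; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Because the pointers are not declared restrict, a compiler may have to add a run-time overlap check before using the vector version of the loop.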

19 Options -m, -x, -ax
-x<code>: <code> = host, avx, sse4.2, SSE4.1, SSE3_ATOM, SSSE3, SSE3, SSE2
-ax<code>: <code> = avx, sse4.2, sse4.1, SSSE3, SSE3, SSE2
-m<code>: <code> = ia32, sse2, sse3, ssse3, sse4.1
Example: -xavx (instruction sets used: AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, SSE)
Example: -axavx (instruction sets used: AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, SSE)
Example: -msse4.1 (instruction sets used: SSE4.1, SSSE3, SSE3, SSE2, SSE)

20 Multiple code paths with -ax
Example: -axavx,sse4.2,sse3
AVX processor: AVX code path
SSE4.2 processor: SSE4.2 code path
SSE4.1 processor: SSE3 code path
SSE2 processor: baseline code path

21 Guided auto-parallelization: -guide
Guided Auto Parallelization (GAP)
Options: -guide-par, -guide-vec, -guide-data-trans
Sample advice: gap_vec.c(8): remark #30536: (LOOP) suggests the -fargument-noalias option for the loop at line 8 or, as an alternative, adding the "restrict" keyword to the arguments of the routine "matrix_mul_matrix".
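The kind of source change that GAP advice about "restrict" points to can be sketched as follows. The slides name the routine matrix_mul_matrix but do not show its signature, so the parameter list, the flattened row-major layout, and the loop order below are assumptions for illustration only. The restrict qualifier is C99; it is a promise by the programmer that the arrays do not overlap, and the result is undefined if that promise is broken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature (not shown in the slides): n x n matrices stored
 * row-major in flat arrays. "restrict" tells the compiler that a, b, and c
 * never alias, so stores to c cannot change values later read from a or b. */
void matrix_mul_matrix(size_t n,
                       const float *restrict a,
                       const float *restrict b,
                       float *restrict c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const float aik = a[i * n + k];
            /* Innermost loop walks contiguous memory in b and c. */
            for (size_t j = 0; j < n; j++) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

The routine accumulates into c, so the caller is expected to zero c first.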

22 Floating-point model: -fp-model (precise, strict, fast)
Source: float t0, t1, t2; t0 = 4.0f + 0.1f + t1 + t2;
-fp-model fast: t0 = 4.1f + t1 + t2
-fp-model precise: t0 = (4.1f + t1) + t2
-fp-model strict: t0 = (((4.0f + 0.1f) + t1) + t2)
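The reason the evaluation order matters is that floating-point addition is not associative. The sketch below shows this with plain C and no compiler-specific options; the function names and the input values are chosen for illustration and are not taken from the slides. The volatile temporaries are an assumption-driven precaution that forces each intermediate result to be rounded to float.

```c
#include <assert.h>

/* Hypothetical helpers: the same three values summed with two groupings. */
float sum_grouped_left(float x, float y, float z)
{
    volatile float t = x + y;   /* (x + y) rounded to float */
    return t + z;
}

float sum_grouped_right(float x, float y, float z)
{
    volatile float t = y + z;   /* (y + z) rounded to float */
    return x + t;
}

/* With x = 1.0e8f, y = -1.0e8f, z = 1.0f:
 * left grouping:  (1e8 - 1e8) + 1 = 1
 * right grouping: -1e8 + 1 rounds back to -1e8 (float spacing there is 8),
 *                 so 1e8 + (-1e8) = 0 */
void demo_grouping(void)
{
    assert(sum_grouped_left(1.0e8f, -1.0e8f, 1.0f) == 1.0f);
    assert(sum_grouped_right(1.0e8f, -1.0e8f, 1.0f) == 0.0f);
}
```

A model that allows reassociation may pick either grouping, which is why results can differ between -fp-model fast and the value-safe models.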



Intel Parallel Studio XE 2019 Update 1 Intel Parallel Studio XE 2019 Update 1 Installation Guide for Linux* OS 7 November 2018 Contents 1 Introduction...2 1.1 Licensing Information...2 2 Prerequisites...2 2.1 Notes for Cluster Installation...3

More information

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3

More information

Compilers & Optimized Librairies

Compilers & Optimized Librairies Institut de calcul intensif et de stockage de masse Compilers & Optimized Librairies Modules Environment.bashrc env $PATH... Compilers : GNU, Intel, Portland Memory considerations : size, top, ulimit Hello

More information

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes 23 October 2014 Table of Contents 1 Introduction... 1 1.1 Product Contents... 2 1.2 Intel Debugger (IDB) is

More information

PRACE Summer School, CINECA 8-11 July 2013 Intel Xeon Phi Programming Environment. Hans Pabst, July 2013 Software and Services Group Intel Corporation

PRACE Summer School, CINECA 8-11 July 2013 Intel Xeon Phi Programming Environment. Hans Pabst, July 2013 Software and Services Group Intel Corporation PRACE Summer School, CINECA 8-11 July 2013 Intel Xeon Phi Programming Environment Hans Pabst, July 2013 Software and Services Group Intel Corporation Agenda Intel Manycore Platform Software Stack Getting

More information

Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor Intel Xeon Phi Coprocessor http://tinyurl.com/inteljames twitter @jamesreinders James Reinders it s all about parallel programming Source Multicore CPU Compilers Libraries, Parallel Models Multicore CPU

More information

Native Computing and Optimization on Intel Xeon Phi

Native Computing and Optimization on Intel Xeon Phi Native Computing and Optimization on Intel Xeon Phi ISC 2015 Carlos Rosales carlos@tacc.utexas.edu Overview Why run native? What is a native application? Building a native application Running a native

More information

Scaling Out Python* To HPC and Big Data

Scaling Out Python* To HPC and Big Data Scaling Out Python* To HPC and Big Data Sergey Maidanov Software Engineering Manager for Intel Distribution for Python* What Problems We Solve: Scalable Performance Make Python usable beyond prototyping

More information

Graphics Performance Analyzer for Android

Graphics Performance Analyzer for Android Graphics Performance Analyzer for Android 1 What you will learn from this slide deck Detailed optimization workflow of Graphics Performance Analyzer Android* System Analysis Only Please see subsequent

More information

Optimizing Code for Intel Multi-Core Processors Intel Core Microarchitecture On Linux

Optimizing Code for Intel Multi-Core Processors Intel Core Microarchitecture On Linux Optimizing Code for Intel Multi-Core Processors Intel Core Microarchitecture On Linux 2 June 2007 Intel Corporation Legal Lines and Disclaimers - Inner Front Cover 4 June 2007 Intel Corporation Optimizing

More information

Simplified and Effective Serial and Parallel Performance Optimization

Simplified and Effective Serial and Parallel Performance Optimization HPC Code Modernization Workshop at LRZ Simplified and Effective Serial and Parallel Performance Optimization Performance tuning Using Intel VTune Performance Profiler Performance Tuning Methodology Goal:

More information

Cilk Plus GETTING STARTED

Cilk Plus GETTING STARTED Cilk Plus GETTING STARTED Overview Fundamentals of Cilk Plus Hyperobjects Compiler Support Case Study 3/17/2015 CHRIS SZALWINSKI 2 Fundamentals of Cilk Plus Terminology Execution Model Language Extensions

More information

Installation of OpenMX

Installation of OpenMX Installation of OpenMX Truong Vinh Truong Duy and Taisuke Ozaki OpenMX Group, ISSP, The University of Tokyo 2015/03/30 Download 1. Download the latest version of OpenMX % wget http://www.openmx-square.org/openmx3.7.tar.gz

More information

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend

More information

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012 Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2

More information

How to Use the Condo and CyEnce Clusters Glenn R. Luecke Director of HPC Education & Professor of Mathematics April 11, 2018

How to Use the Condo and CyEnce Clusters Glenn R. Luecke Director of HPC Education & Professor of Mathematics April 11, 2018 How to Use the Condo and CyEnce Clusters Glenn R. Luecke Director of HPC Education & Professor of Mathematics April 11, 2018 Online Information and Help If you experience problems and would like help,

More information

Using Intel Inspector XE 2011 with Fortran Applications

Using Intel Inspector XE 2011 with Fortran Applications Using Intel Inspector XE 2011 with Fortran Applications Jackson Marusarz Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: Benchmarking, Compiler Limitations Instructor: Markus Püschel TA: Gagandeep Singh, Daniele Spampinato, Alen Stojanov Last Time: ILP Latency/throughput (Pentium

More information