Three Questions Everyone Keeps Asking. Stephen Blair-Chappell, Intel Compiler Labs


1 Three Questions Everyone Keeps Asking. Stephen Blair-Chappell, Intel Compiler Labs

2 Three Common Requests (8/2/2012)
   - How can I make my program run faster?
   - How can I make my program parallel?
   - Will my code run on any CPU? (compatibility)

3 Intel Parallel Studio XE: Three Components
   - Intel Composer XE (compiler, libraries): use to generate fast, safe, parallel code (C/C++, Fortran).
   - Intel VTune Amplifier XE (profiler): find hotspots and bottlenecks in your code.
   - Intel Inspector XE (memory errors, parallel errors): use to find memory and threading errors.

4 Intel Parallel Studio XE: Four Components (the three components + Advisor)
   - Intel Parallel Advisor: use to model parallelism in your existing applications.
   - Intel Composer XE (compiler, libraries): use to generate fast, safe, parallel code (C/C++, Fortran).
   - Intel VTune Amplifier XE (profiler): find hotspots and bottlenecks in your code.
   - Intel Inspector XE (memory errors, parallel errors): use to find memory and threading errors.

5 Three Common Requests
   - How can I make my program run faster?
   - How can I make my program parallel?
   - Will my code run on any CPU? (compatibility)

6 Faster Code: The compiler uses many optimisation techniques (for example, fast floating point)

7 Faster Code
   Often we are happy with the out-of-the-box experience.
   When was the last time you looked at some documentation?

8 Faster Code: The Seven Optimisation Steps
   Step  Action                                      Example options: Windows (Linux)
   1     Build with optimization disabled (start)    /Od (-O0)
   2     Use general optimizations                   /O1, /O2, /O3 (-O1, -O2, -O3)
   3     Use processor-specific options              /QxSSE4.2 (-xSSE4.2), /QxHost (-xHost)
   4     Add inter-procedural optimization           /Qipo (-ipo)
   5     Use profile-guided optimization             /Qprof-gen (-prof-gen), /Qprof-use (-prof-use)
   6     Tune automatic vectorization                /Qguide (-guide)
   7     Implement parallelism                       Use the Intel family of parallel models
         or use automatic parallelism                /Qparallel (-parallel)

9 Faster Code: Vectorisation (8/2/2012)
   Instruction-set timeline:
   - SSE: 70 instructions; single-precision vectors, streaming operations
   - SSE2: 144 instructions; double-precision vectors, 8/16/32/64/128-bit vector integer
   - SSE3: 13 instructions; complex data
   - SSSE3: 32 instructions; decode
   - SSE4.1: 47 instructions; video, graphics building blocks, advanced vector instructions
   - SSE4.2: 8 instructions; string/XML processing, POP-count, CRC
   - AES-NI: 7 instructions; encryption and decryption, key generation
   - AVX: ~100 new instructions, ~300 legacy SSE instructions updated; 256-bit vector, 3- and 4-operand instructions
   - AVX2: integer AVX expands to 256 bit; improved bit manipulation, FMA, vector shifts, gather
   - MIC: 512-bit vector
   Example: for (i=0;i<max;i++) c[i]=a[i]+b[i];
   One vector add computes four elements at once: c[3..0] = a[3..0] + b[3..0]

10 Faster Code: Different Ways of Inserting Vectorised Code
   Listed from greatest ease of use (top) to greatest programmer control (bottom):
   - Use performance libraries (e.g. IPP and MKL)
   - Compiler: fully automatic vectorization
   - Cilk Plus array notation
   - Compiler: auto-vectorization hints (#pragma ivdep, ...)
   - User-mandated vectorization (SIMD directive)
   - Manual CPU dispatch (__declspec(cpu_dispatch))
   - SIMD intrinsic class (F32vec4 add)
   - Vector intrinsic (_mm_add_ps())
   - Assembler code (addps)
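To make two rungs of this ladder concrete, here is a minimal sketch of the slide-9 loop written twice: once as a plain loop left to the auto-vectoriser, and once with the `_mm_add_ps()` vector intrinsic. The function names `add_auto` and `add_intrinsics` are hypothetical, and the intrinsic path is only compiled when the compiler reports SSE support; otherwise the scalar loop is used as a fallback.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Style 1: plain loop. `restrict` promises the arrays do not overlap,
   which makes it easier for a compiler to auto-vectorise the loop. */
void add_auto(float *restrict c, const float *restrict a,
              const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Style 2: explicit SSE intrinsics, four floats per vector add. */
void add_intrinsics(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* c[i..i+3] = a + b */
    }
#endif
    for (; i < n; i++)                            /* scalar remainder or fallback */
        c[i] = a[i] + b[i];
}
```

Both functions should produce identical results; the trade-off is the one the slide describes, with the first being easier to write and the second giving more control over the generated instructions.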

11 Faster Code: An example
   - Speedup by upgrading silicon
   - Speedup by swapping compiler
   - Verified using VTune

12 Three Common Requests
   - How can I make my program run faster?
   - How can I make my program parallel?
   - Will my code run on any CPU? (compatibility)

13 Parallel Code: Speedup using parallelism (four-step development)
   1. Analyze: Amplifier XE (hotspot analysis; EBS, XE only)
   2. Implement: Composer XE (compiler: Cilk Plus, OpenMP; libraries: MKL, TBB, IPP)
   3. Debug: Inspector XE (threads, memory)
   4. Tune: Amplifier XE (concurrency, locks & waits)

14 Parallel Code (Analyze): Four Different Ways to Find the Hotspots
   1. Using the Intel compiler's loop profiler and profile viewer
   2. Using the compiler's auto-parallelizer
   3. Using Amplifier XE
   4. Performing a survey with Advisor

15 Parallel Code: Language to help parallelism
   - Intel Cilk Plus
   - OpenMP
   - Intel Threading Building Blocks
   - Intel MPI
   - Fortran Coarrays
   - OpenCL
   - Native Threads
   OpenMP example:
       #pragma omp parallel for
       for (i = 1; i <= 4; i++) {
           printf("Iter: %d", i);
       }
   Intel Cilk Plus example:
       cilk_for (int i = 0; i < max_row; i++) {
           for (int j = 0; j < max_col; j++) {
               p[i][j] = mandel(complex(scale(i), scale(j)));
           }
       }
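As a small, self-contained companion to the OpenMP fragment on this slide, the sketch below parallelises a loop that also has to combine a result across threads. The function name `sum_of_squares` is hypothetical. Built without OpenMP enabled, the pragma is ignored and the loop simply runs serially with the same result.

```c
#include <stddef.h>

/* Sum of squares 1^2 + 2^2 + ... + n^2 using an OpenMP work-sharing loop. */
long sum_of_squares(int n)
{
    long total = 0;

    /* Each thread accumulates a private partial sum; OpenMP adds the
       partial sums together when the loop ends, so there is no data race
       on `total`. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += (long)i * i;

    return total;
}
```

Without the `reduction` clause every thread would update `total` directly, which is exactly the kind of threading error the debug step of the workflow is meant to catch.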

16 Parallel Code (Debug): Four Different Ways to Find your Parallel Errors
   1. Using Inspector XE
   2. Perform a static security analysis
   3. Debug with Parallel Debug Extensions
   4. Use Advisor

17 Parallel Code: An example
   1. Hotspot analysis
   2. Implement
   3. Find threading errors
   4, 5, 6. Tune parallelism

18 Three Common Requests
   - How can I make my program run faster?
   - How can I make my program parallel?
   - Will my code run on any CPU? (compatibility)

19 Compatible Code: Will my program run on any CPU?
   - Compatibility: run? build? performance?
   - Future proofing: OS-agnostic, CPU-agnostic, language / standards, tools, scalability

20 Vectorised and parallel results (graphs not reproduced; on the graphs, bigger is better)

21 Running Example: Monte Carlo (SFTL003 hands-on lab)
       #pragma omp parallel for
       for (int opt = 0; opt < OPT_N; opt++) {
           float VBySqrtT = VOLATILITY * sqrtf(T[opt]);
           float MuByT = (RISKFREE - 0.5f * VOLATILITY * VOLATILITY) * T[opt];
           float Sval = S[opt];
           float Xval = X[opt];
           float val = 0.0f, val2 = 0.0f;

           /* vectorise the inner loop; val and val2 are sum reductions */
           #pragma simd reduction(+:val) reduction(+:val2)
           for (int pos = 0; pos < RAND_N; pos++) {
               float callvalue = expectedcall(Sval, Xval, MuByT, VBySqrtT, l_random[pos]);
               val += callvalue;
               val2 += callvalue * callvalue;
           }

           float exprt = expf(-RISKFREE * T[opt]);
           h_callresult[opt] = exprt * val / (float)RAND_N;
           float stddev = sqrtf(((float)RAND_N * val2 - val * val) / ((float)RAND_N * (float)(RAND_N - 1.f)));
           h_callconfidence[opt] = (float)(exprt * 1.96f * stddev / sqrtf((float)RAND_N));
       }

22 Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013 Helping Developers Efficiently Produce Fast, Scalable and Reliable Applications

23 More Cores. Wider Vectors. Performance Delivered. (Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013)
   - More cores: multicore to many-core (50+ cores); scaling performance efficiently
   - Wider vectors: 128 bits, 256 bits, 512 bits
   - Serial performance, task & data parallel performance, distributed performance
   - Industry-leading performance from advanced compilers
   - Comprehensive libraries
   - Parallel programming models
   - Insightful analysis tools

24 Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013: Efficiently Produce Fast, Scalable and Reliable Applications
   Phase: Build
   - Intel Advisor XE
     Feature: threading design assistant (Studio products only)
     Benefit: simplifies, demystifies, and speeds parallel application design
   - Intel Composer XE
     Feature: C/C++ and Fortran compilers, Intel Threading Building Blocks, Intel Cilk Plus, Intel Integrated Performance Primitives, Intel Math Kernel Library
     Benefit: enabling solution to achieve the application performance and scalability benefits of multicore and forward scale to many-core
   - Intel MPI Library
     Feature: high performance Message Passing (MPI) library
     Benefit: enabling high performance scalability, interconnect independence, runtime fabric selection, and application tuning capability
   Phase: Verify & Tune
   - Intel VTune Amplifier XE
     Feature: performance profiler for optimizing application performance and scalability
     Benefit: removes guesswork, saves time, makes it easier to find performance and scalability bottlenecks
   - Intel Inspector XE
     Feature: memory & threading dynamic analysis for code quality; static analysis for code quality
     Benefit: increases productivity and code quality, lowers cost, finds memory, threading, and security defects before they happen
   - Intel Trace Analyzer & Collector
     Feature: MPI performance profiler for understanding application correctness & behavior
     Benefit: analyze performance of MPI programs and visualize parallel application behavior and communications patterns to identify hotspots

25 Top New Features (Intel Cluster Studio XE)
   - Performance: improved compiler and library performance + Ivy Bridge microarchitecture + Haswell microarchitecture + Intel Xeon Phi coprocessor
   - Performance profiling: a dozen new analysis features; low overhead Java* profiling; CPU power analysis
   - Reliability: pointer checker; heap growth analysis; improved MPI fault tolerance
   - Reproducibility: conditional numerical reproducibility
   - Standards: expanded C++11; expanded Fortran 2008; MPI 2.2
   - Parallelism assistance: analysis extended to include Linux*, Fortran and C# (in addition to Windows* and C/C++)
   Efficiently produce fast, scalable and reliable applications running on Windows* and Linux*

26 The Build Environment
   Macro    Target  Tool         Task
   STEP_0   13-0    MS Compiler  1st version
   STEP_1   13-1    ICC          Find hotspot
   STEP_2   13-2    ICC          Add SSE intrinsics
   STEP_3   13-3    VTune        Find hotspot
   STEP_4   13-4    ICC          Add OpenMP code
   STEP_5           Inspector    Check correctness
   STEP_6           ICC          Fix correctness
   STEP_7   13-7    VTune        Tune parallelism
   STEP_8   13-8    ICC          Finish
   Other labels on the slide: Solver, Generator
   Build example: make 13-0 or nmake 13-0
   Key: Serial Release Mode, OpenMP Debug Mode, OpenMP Release Mode

27 How to Run
       13-0.exe test.txt

28 Your Challenge: the hands-on
   Examine each of the eight stages and use a combination of the compiler, Inspector, and Amplifier to understand what's going on.
   Answer these questions:
   - Is the application using the CPU at its best? (Steps 0, 2 and 8)
   - What's the biggest hotspot in the serial code? (Steps 1 and 3)
   - What errors were introduced into the parallelism? (Steps 4, 5 and 6)
   - How well is the parallelism tuned? (Steps 7 and 8)
   Supplement: Why is the Linux version slower than the Windows version?

29 Thank You

30 Backup

31 Intel Parallel Studio XE, Intel Cluster Studio XE (30 minutes)


34 What's New? Intel Parallel Studio XE 2013 / Intel Cluster Studio XE 2013 (Compilers & Libraries)
   Performance leadership:
   - 3rd Generation Intel Core processors (code name Ivy Bridge) and future Intel processors (code name Haswell)
   - Intel Xeon Phi coprocessors
   - Improved C++ and Fortran performance
   New product capabilities:
   - Latest OS: Windows* 8 Desktop, Linux*
   - IDE: Visual Studio 2008, 2010, 2012 and GNU tool chain
   - Standards: C99, selected C++11 features, almost complete Fortran 2003 support and selected features from Fortran 2008, MPI

35 Boost Performance 35

36 Support for Latest Intel Processors and Coprocessors Intel Ivy Bridge microarchitecture Intel Haswell microarchitecture Intel Xeon Phi coprocessor Intel C++ and Fortran Compiler AVX AVX2, FMA3 IMCI Intel TBB library Intel MKL library AVX AVX2, FMA3 Intel MPI library Intel VTune Amplifier XE Hardware Events Hardware Events Hardware Events Intel Inspector XE Memory & Thread Checks Memory & Thread Memory & Thread Hardware events for new processors added as new processors ship. Analysis runs on multicore processors, provides analysis for multicore and many-core processors. 36

37 Performance-Oriented Compiler Suites Intel Compilers, Performance Libraries, Debugging Tools On Windows, Linux and Mac OS X Intel C++ Composer XE 2013 Intel C++ Compiler XE 13.0 with Intel Cilk Plus Intel TBB Intel MKL Intel IPP Intel Xeon Phi product family support, Linux Intel Fortran Composer XE 2013 Intel Fortran Compiler XE 13.0 Intel MKL Compatibility with Compaq Visual Fortran* Fortran 2003, 2008 support Intel Xeon Phi product family support, Linux Intel Composer XE 2013 Combines Intel C++ Composer XE and Intel Fortran Composer XE For Fortran developers who also want Intel C++ Windows (requires Visual Studio) and Linux only Windows: Intel C++/Visual* C++ compatibility & integration into Microsoft* Visual Studio* Linux: Intel C++/gcc* compatibility & integration into Eclipse* CDT Mac OS X: Intel C++/gcc compatibility & integration into XCode* Environment All: Intel Fortran performance leadership, compatible with Compaq* Visual* Fortran All: Leadership performance on Intel and compatible architectures All: One Year Intel Premier Support. Renewable Annually. Performance. Compatibility. Support. 37

38 Superior C++ Compiler Performance More Performance Just recompile Uses Intel AVX and Intel AVX2 instructions Intel Xeon Phi product family support, Linux: Compiler, debugger (Linux) Intel Cilk Plus: Tasking and vectorization 38

39 Superior Fortran Compiler Performance More Performance Just recompile Intel Xeon Phi product family: Linux compiler, debugger support Access to Intel AVX and Intel AVX2 instructions (-xa or /Qxa) Auto-parallelizer & directives to access SIMD instructions Coarrays & synchronization constructs support parallel programming Loop optimization directives: VECTOR, PARALLEL, SIMD More control over array data alignment (align arraynbytes) 39

40 C++ Performance Guide (Intel Parallel Studio XE, Intel Cluster Studio XE: Compilers & Libraries)
   - Performance Wizard for Windows
   - Quick 5-step process for more performance
   - Get help choosing optimization options
   Gain Performance with Less Effort

41 Intel Math Kernel Library (MKL)
   - Highly optimized threaded math routines
   - Applications in science, engineering, finance
   - Use Intel MKL on Windows*, Linux*, Mac OS*
   - Use Intel MKL with Intel compiler, gcc, MSFT*, PGI
   - Component of Intel Parallel Studio XE and Intel Cluster Studio XE
   - 33% of math library users rely on Intel's Math Kernel Library (EDC North America Development Survey 2011, Volume II)
   Drop In The Next Intel MKL Version to Unlock New Processor Performance

42 LAPACK Performance Improves with Intel Math Kernel Library Compilers & Libraries 42

43 Intel Integrated Performance Primitives (IPP) A Library Of Highly Optimized Algorithmic Building Blocks For Media And Data Applications Optimized for Performance and Power Efficiency Intel Engineered & Future Proofed to Save You Time Wide range of Cross Platform & OS Functionality Highly optimized using SSE, AVX instruction sets Performance beyond what an optimized compiler produces alone Ready-to-use & royalty free Fully optimized for current and past processors Save development, debug, and maintenance time Code once now, receive future optimizations later Thousands of optimized functions Supports Windows*, Linux*, and Mac OS* X Supports Intel Atom, Intel Core, Intel Xeon, platforms Availability: Part of several different product packages with single, multi-user licenses as well as volume, academic, and student discounts available. Try it Before You Buy It: Download a trial version today at intel.com/software/products/eval Performance Building Blocks to Make Your Applications Faster, Faster 43

44 Intel IPP Boost from Intel AVX 44

45 Intel VTune Amplifier XE Performance Profiler Where is my application Spending Time? Wasting Time? Waiting Too Long? Intel VTune Amplifier XE Focus tuning on functions taking time See call stacks See time on source See cache misses on your source See functions sorted by # of cache misses See locks by wait time Red/Green for CPU utilization during wait Windows & Linux Low overhead No special recompiles We improved the performance of the latest run 3 fold. We wouldn't have found the problem without something like Intel VTune Amplifier XE. Claire Cates Principal Developer, SAS Institute Inc. Advanced Profiling for Scalable Multicore Performance 45 45

46 A Dozen New Analysis Features Intel VTune Amplifier XE 2013 More Profiling Data 1) Statistical Call Counts Data for Inlining & Parallelization 2) Hardware Events + Stacks Lower overhead, Higher resolution Finds hot spots in small functions 3) Uncore Event Counting More accurate bandwidth analysis 4) Ivy Bridge Events 5) Haswell Events Updates as new processors ship 6) Intel Xeon Phi Products Hardware events Easier To Use 7) Source View for Inlined Code (For Intel and GCC* compilers) 8) Java Tuning Results map to the Java source 9) Task Annotation API Label and visualize tasks. 10) User Defined Metrics Create meaningful metrics from events 11) Programmable Hot Keys Start and stop collection easily 12) More/Better Advanced Profiles (e.g., Bandwidth) Intel VTune Amplifier XE Easy to Use, Wealth of Data, Powerful Analysis 46

47 Low Overhead Java* Profiling Intel VTune Amplifier XE 2013 Intel VTune Amplifier XE Low Overhead & Precise Sampling is fast / unobtrusive Hardware sampling even faster (Now with optional stacks!) Advanced profiles are unique (cache misses, bandwidth ) Versatile & Easy to Use Multiple simultaneous JVMs Mixed Java / C++ / Fortran See results on the Java source Better Data, Lower Overhead, Easier to Use 47

48 CPU Power Analysis Intel VTune Amplifier XE 2013 Intel VTune Amplifier XE To decrease CPU power usage minimize wake-ups Identify wake-up causes Timers triggered by application Interrupts mapped to HW intr level Show wake-up rate Display source code for events that wake-up processor Show CPU frequencies by CPU core (CPU frequencies can change by CPU activity level) Linux only Select & filter to see a single wake up object: Uniquely Identifies the Cause of Wake-ups and Give Timer Call Stacks 48

49 Scale Forward 49

50 Simplify and Speed Threading Design Intel Advisor XE Threading Assistant Intel Advisor XE The Challenge of Parallel Design: Need to implement to measure performance Implementation is time consuming Disrupts regular product development Testing difficult without tools Intel Advisor XE Separates Design & Implementation Fast exploration of multiple options Find errors before implementation Design without disrupting development New! Linux* and Windows* New! C, C++, Fortran and C# code Add Parallelism with Less Effort, Less Risk and More Impact 50

51 Design Then Implement Intel Advisor XE 2013 Threading Assistant Intel Advisor XE Design Parallelism No disruption to regular development All test cases continue to work Tune and debug the design before you implement it 1) Analyze it. 2) Design it. (Compiler ignores these annotations.) 3) Tune it. 4) Check it. Implement Parallelism 5) Do it! Less Effort, Less Risk, More Impact 51

52 Scale Forward with Intel Parallel Models: Extend to Intel Xeon Phi Coprocessors (Compilers & Libraries)
   Abstract, scalable and composable:
   - Intel Cilk Plus: C/C++ language extensions to simplify parallelism
   - Intel Threading Building Blocks: widely used C++ template library for thread management
   - Open programming models and also Intel products
   - Intel Xeon processors and compatible processors; Intel Xeon Phi product family
   Support standards: OpenMP, Coarray Fortran, MPI
   Don't Leave Your Code Behind

53 Simplify Parallelism Intel Cilk Plus, Intel Threading Building Blocks Compilers & Libraries What Features Why Intel Cilk Plus Language extensions to simplify task/data parallelism 3 simple keywords & array notations for parallelism Support for task and data parallelism Semantics similar to serial code Simple way to parallelize your code Sequentially consistent, low overhead, powerful solution Supports C, C++, Windows and Linux Intel Threading Building Blocks Widely used C++ template library for task parallelism Parallel algorithms and data structures Scalable memory allocation and task scheduling Synchronization primitives Rich feature set for general purpose parallelism Available as open source or commercial license Supports C++, Windows, Linux, Mac OS X, other OSs Task and Data Parallelism Made Easier 53

54 Parallelize Applications For Performance: Intel Threading Building Blocks (TBB)
   A popular, proven parallel C++ abstraction:
   - A C++ template library
   - Scalable memory allocation
   - Load-balancing
   - Work-stealing task scheduling
   - Thread-safe pipeline
   - Flexible flow graph
   - Concurrent containers
   - High-level parallel algorithms
   - Numerous synchronization primitives
   - Open source, and portable across many OSs
   "Intel TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table Michaël Rouillé, CTO, Golaem
   Simplify Parallelism with a Scalable Parallel Model

55 Scale Forward and Extend to Intel Xeon Phi Coprocessors Intel Cilk Plus Intel Cilk Plus (Language Extension to C/C++) Easier Task & Data Parallelism 3 simple Keywords: cilk_for, cilk_spawn, cilk_sync Intel Cilk Plus Array Notation Save time with powerful vectorization Minimize Software Re-Work for New Hardware 55

56 Increase Reliability 56

57 Pointer Checker (Intel Parallel Studio XE, Intel Cluster Studio XE: Compilers & Libraries)
   - Finds buffer overflows and dangling pointers before memory corruption occurs
   - Powerful error reporting
   - Integrates into standard debuggers (Microsoft, gdb, Intel)
   Dangling pointer:
       {
           char *p, *q;
           p = malloc(10);
           q = p;
           free(p);
           *q = 0;
       }
   Buffer overflow:
       {
           char *my_chp = "abc";
           char *an_chp = (char *) malloc (strlen((char *)my_chp));
           memset (an_chp, '@', sizeof(my_chp));
       }
   Reported error:
       CHKP: Bounds check error
       Traceback: ./a.out(main+0x1b2) [0x402d7a] in file mems.c at line 13
   Pointer Checker Highlights Programming Errors For More Secure Applications
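For contrast with the two deliberately faulty fragments on this slide, here is a minimal sketch of how the same operations might be written safely. The helper names `filled_copy` and `free_and_clear` are hypothetical and are not part of the Pointer Checker or any Intel library. The buffer-overflow fragment goes wrong because `sizeof(my_chp)` is the size of a pointer, not the string length; the sketch uses `strlen` for the fill and allocates one extra byte for the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated string as long as `src` with every character
   replaced by `fill`, or NULL if allocation fails. The caller frees it. */
char *filled_copy(const char *src, char fill)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);   /* +1 leaves room for the terminating NUL */
    if (dst == NULL)
        return NULL;
    memset(dst, fill, len);        /* fill by string length, not sizeof(pointer) */
    dst[len] = '\0';
    return dst;
}

/* Frees *pp and sets it to NULL so this pointer cannot be used after free.
   Other copies of the pointer (like `q` on the slide) are NOT cleared and
   must simply not be used again. */
void free_and_clear(char **pp)
{
    free(*pp);
    *pp = NULL;
}
```

Clearing the pointer only protects the variable that was passed in; aliases remain the programmer's responsibility, which is where a checking tool earns its keep.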

58 Conditional Numerical Reproducibility (Compilers & Libraries)
   - Intel Math Kernel Library: new deterministic task scheduling and code path selection options
   - OpenMP*: new deterministic reduction option
   - Intel Threading Building Blocks: new parallel deterministic reduce option
   "I'm a C++ and Fortran developer and have high praise for the Intel Math Kernel Library. One nice feature I'd like to stress is the numerical reproducibility of MKL which helps me get the assurance I need that I'm getting the same floating point results from run to run." Franz Bernasek, Owner / CEO, Senior Developer, MSTC Modern Software Technology
   Help Achieve Reproducible Results, Despite Non-associative Floating Point Math
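The phrase "non-associative floating point math" is easy to demonstrate: adding the same numbers in a different order can give a different answer, which is what happens when a parallel reduction splits work differently from run to run. The sketch below is illustrative only and assumes IEEE 754 double precision with the default round-to-nearest mode; the function names are hypothetical.

```c
#include <stddef.h>

/* Sums x[0], x[1], ..., x[n-1] from left to right. */
double sum_forward(const double *x, size_t n)
{
    /* volatile forces the running sum to be rounded to double after
       every addition, so extended-precision registers cannot hide
       the effect. */
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Sums the same values from right to left. */
double sum_backward(const double *x, size_t n)
{
    volatile double acc = 0.0;
    for (size_t i = n; i > 0; i--)
        acc += x[i - 1];
    return acc;
}
```

With the input {1e16, -1e16, 1.0}, the forward sum cancels the two large values first and returns 1.0, while the backward sum adds 1.0 to -1e16 first, where it is lost to rounding, and returns 0.0. A deterministic reduction option fixes the order of the additions so the result is the same from run to run.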

59 Expanded C++ 11 support Compilers & Libraries Additional type traits Initializer lists (partial) Generalized constant expressions (partial) Noexcept (partial) Range based for loops Conversions of lambdas to function pointers Excellent Support for C++ 11 on Windows* and Linux* 59

60 Expanded Fortran 2008 Support Compilers & Libraries Maximum array rank has been raised to 31 dimensions (Fortran 2008 specifies 15) Recursive type may have ALLOCATABLE components Coarrays CODIMENSION attribute SYNC ALL statement SYNC IMAGES statement SYNC MEMORY statement CRITICAL and END CRITICAL statements LOCK and UNLOCK statements ERROR STOP statement ALLOCATE and DEALLOCATE may specify coarrays Intrinsic procedures IMAGE_INDEX, LCOBOUND, NUM_IMAGES, THIS_IMAGE, UCOBOUND CONTIGUOUS attribute MOLD keyword in ALLOCATE DO CONCURRENT G0 and G0.d format edit descriptor Unlimited format item repeat count specifier CONTAINS section may be empty Intrinsic procedures BESSEL_J0, BESSEL_J1, BESSEL_JN, BESSEL_YN, BGE, BGT, BLE, BLT, DSHIFTL, DSHIFTR, ERF, ERFC, ERFC_SCALED, GAMMA, HYPOT, IALL, IANY, IPARITY, IS_CONTIGUOUS, LEADZ, LOG_GAMMA, MASKL, MASKR, MERGE_BITS, NORM2, PARITY, POPCNT, POPPAR, SHIFTA, SHIFTL, SHIFTR, STORAGE_SIZE, TRAILZ Additions to intrinsic module ISO_FORTRAN_ENV: ATOMIC_INT_KIND, ATOMIC_LOGICAL_KIND, CHARACTER_KINDS, INTEGER_KINDS, INT8, INT16, INT32, INT64, LOCK_TYPE, LOGICAL_KINDS, REAL_KINDS, REAL32, REAL64, REAL128, STAT_LOCKED, STAT_LOCKED_OTHER_IMAGE, STAT_UNLOCKED NEWUNIT keyword in OPEN New: ATOMIC_DEFINE and ATOMIC_REF, initialization of polymorphic INTENT(OUT) dummy arguments, standard handling of G format and of printing the value zero, coarrays (more support), polymorphic source allocation Leadership F2008 Support on Linux*, Windows* & OSX* 60

61 Dynamic Analysis Finds Memory & Threading Errors Intel Inspector XE 2013 Intel Inspector XE Find and eliminate errors Memory leaks, invalid access Races & deadlocks Analyze hybrid MPI cluster apps Heap growth analysis Faster & Easier to use Debugger breakpoints Break on selected errors Run faster to known error Pause/resume collection Narrow analysis focus Better performance Improved error suppression Find Errors Early When They are Less Expensive 61

62 Heap Growth Analysis Intel Inspector XE 2013 Intel Inspector XE Does Application Memory Usage Mysteriously Grow? Set an analysis interval with start and analysis end points Click a button or Use an API See a list of memory allocations that are not freed in the interval Quickly zero in on suspicious activity that contributes to heap growth Speeds Diagnosis of Difficult to Find Heap Errors 62

63 Static Analysis Finds Coding and Security Errors: Intel Parallel Studio XE 2013 (Intel Parallel Studio XE, Intel Cluster Studio XE: Compilers & Libraries)
   Find over 250 error types, e.g.:
   - Incorrect directives
   - Security errors
   Easier to use:
   - Choose your priority: minimize false errors, or maximize error detection
   - Hierarchical navigation of results
   - Share comments with the team
   Increased accuracy & speed:
   - Detect errors without all source files
   - Better scaling with large code bases
   Code complexity metrics:
   - Find code likely to be less reliable
   Find Errors and Harden your Security
   Static Analysis is only available in Studio XE bundles. It is not sold separately.

64 Cluster Tools 64

65 Scale Forward, Scale Faster Intel Cluster Tools Intel Compiler Cluster s Studio & Libraries XE Scale Performance Perform on More Nodes MPI Latency - Intel MPI Library - Up to 6.5X as fast as alternative MPI libraries Compiler Performance Industry leading Intel C/C++ & Fortran compilers Scale Forward multicore now, many-core ready Intel MPI Library scales beyond 120k processes Focused to preserve programming investments for multicore and many-core machines Scale Efficiently Tune & Debug on More Nodes Thread & Memory Correctness Checking Intel Inspector XE now MPI enabled across many nodes Rapid Node Level Performance Profiling Intel VTune Amplifier XE can identify hotspots faster and on thousands of nodes High Performance Standards Driven Fabric Flexible MPI Library 65 65

66 On the Path to Exascale Intel MPI Library and (part of Cluster Studio XE 2013) Intel MPI Library Latest hardware support Ivy Bridge and Haswell Intel Xeon Phi Coprocessor Processes K 60K 120K Intel MPI Library, K processes Doubling, K processes Increased Scaling 120k Processes Exascale, K processes (estimated ) Standards Support MPI Year Continued Scaling Capacity to Meet Ever Growing HPC Demands 66

67 Improved MPI Fault Tolerance Intel MPI Library Implementation of Berkeley Lab Checkpoint/Restart (BLCR) Primary Uses Scheduling Process Migration Failure Recovery Checkpointing Fault Recovery Scenario Node Fault Checkpoint Recovery Enabling Capabilities for Robust at Scale MPI Computing 67

68 MPI 2.2 Support Intel MPI Library Backwards compatible with MPI 2.1 programs Delivers Distributed Graph Topology Interface Scalable & Informative for MPI Library Communications Easy to Use Mechanism for Conveying Comms Patterns to MPI Applications Used by MPI Library to Improve Mapping Process to Process Communications Allows better fit for Applications Communications to Hardware Capabilities Outstanding Support Of The Latest MPI Standard 68

69 Optimize MPI Communications Intel Trace Analyzer and Collector (part of Cluster Studio XE 2013) Intel ITAC Visually understand parallel application behavior Communications Patterns Hotspots Load Balance MPI Checking Detect Deadlocks Data Corruption Errors in Parameters, Data Types, etc Scaling Analysis Capability increasing to 6k processes Processes Intel Trace Analyzer and Collector (processes) Year Expanding MPI Profiling Capacity for Communications Optimization 69

70 Legal Disclaimer & Optimization Notice
   INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
   Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
   Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
   Optimization Notice
   Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #


72 Backup

73 Value of Suites Suite Only Features Advisor XE Parallelism Advice C++ Performance Guide Performance Wizard Pointer Checker Reduces memory corruption Code Complexity Analysis Find code likely to be less reliable Static Analysis Improved! Find Errors and Harden your Security 73

74 What's New in Libraries? (Compilers & Libraries)
   Intel MKL:
   - Digital random number generator (DRNG) for improved vector statistics calculations
   - Automatically utilize Intel Xeon Phi coprocessors and balance compute loads between CPUs and coprocessors
   Intel IPP:
   - Enhanced image resize performance primitives
   - Improved IPP footprint size
   Intel TBB:
   - Improved usability and reliability of the Flow Graph feature
   - Additional C++11 support
   "Intel TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table crowd simulation software. Michaël Rouillé, CTO, Golaem
   Ready to Use Libraries to Increase Performance

75 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision # /2/2012


Three Questions every one keeps asking. Stephen Blair-Chappell Intel Compiler Labs

Three Questions every one keeps asking. Stephen Blair-Chappell Intel Compiler Labs Three Questions every one keeps asking Stephen Blair-Chappell Intel Compiler Labs Three Common Requests How can I make my program run faster? How can I make my program parallel? Will my code run on any

More information

Growth in Cores - A well rehearsed story

Growth in Cores - A well rehearsed story Intel CPUs Growth in Cores - A well rehearsed story 2 1. Multicore is just a fad! Copyright 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

More information

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Using Intel VTune Amplifier XE and Inspector XE in.net environment Using Intel VTune Amplifier XE and Inspector XE in.net environment Levent Akyil Technical Computing, Analyzers and Runtime Software and Services group 1 Refresher - Intel VTune Amplifier XE Intel Inspector

More information

Intel Parallel Studio XE 2015

Intel Parallel Studio XE 2015 2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:

More information

Graphics Performance Analyzer for Android

Graphics Performance Analyzer for Android Graphics Performance Analyzer for Android 1 What you will learn from this slide deck Detailed optimization workflow of Graphics Performance Analyzer Android* System Analysis Only Please see subsequent

More information

Memory & Thread Debugger

Memory & Thread Debugger Memory & Thread Debugger Here is What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action Intel Confidential 2 Analysis Tools for Diagnosis

More information

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Parallel is the Path Forward Intel Xeon and Intel Xeon Phi Product Families are both going parallel Intel Xeon processor

More information

Using Intel VTune Amplifier XE for High Performance Computing

Using Intel VTune Amplifier XE for High Performance Computing Using Intel VTune Amplifier XE for High Performance Computing Vladimir Tsymbal Performance, Analysis and Threading Lab 1 The Majority of all HPC-Systems are Clusters Interconnect I/O I/O... I/O I/O Message

More information

Jackson Marusarz Software Technical Consulting Engineer

Jackson Marusarz Software Technical Consulting Engineer Jackson Marusarz Software Technical Consulting Engineer What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action 2 Analysis Tools for Diagnosis

More information

Efficiently Introduce Threading using Intel TBB

Efficiently Introduce Threading using Intel TBB Introduction This guide will illustrate how to efficiently introduce threading using Intel Threading Building Blocks (Intel TBB), part of Intel Parallel Studio XE. It is a widely used, award-winning C++

More information

Intel Software Development Products Licensing & Programs Channel EMEA

Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Advanced Performance Distributed Performance Intel Software Development Products Foundation of

More information

Eliminate Threading Errors to Improve Program Stability

Eliminate Threading Errors to Improve Program Stability Introduction This guide will illustrate how the thread checking capabilities in Intel Parallel Studio XE can be used to find crucial threading defects early in the development cycle. It provides detailed

More information

Intel Parallel Studio 2011

Intel Parallel Studio 2011 THE ULTIMATE ALL-IN-ONE PERFORMANCE TOOLKIT Studio 2011 Product Brief Studio 2011 Accelerate Development of Reliable, High-Performance Serial and Threaded Applications for Multicore Studio 2011 is a comprehensive

More information

Eliminate Threading Errors to Improve Program Stability

Eliminate Threading Errors to Improve Program Stability Eliminate Threading Errors to Improve Program Stability This guide will illustrate how the thread checking capabilities in Parallel Studio can be used to find crucial threading defects early in the development

More information

Performance Profiler. Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava,

Performance Profiler. Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava, Performance Profiler Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava, 08-09-2016 Faster, Scalable Code, Faster Intel VTune Amplifier Performance Profiler Get Faster Code Faster With Accurate

More information

Overview of Intel Parallel Studio XE

Overview of Intel Parallel Studio XE Overview of Intel Parallel Studio XE Stephen Blair-Chappell 1 30-second pitch Intel Parallel Studio XE 2011 Advanced Application Performance What Is It? Suite of tools to develop high performing, robust

More information

A Simple Path to Parallelism with Intel Cilk Plus

A Simple Path to Parallelism with Intel Cilk Plus Introduction This introductory tutorial describes how to use Intel Cilk Plus to simplify making taking advantage of vectorization and threading parallelism in your code. It provides a brief description

More information

Intel System Studio 2014 Overview

Intel System Studio 2014 Overview Intel System Studio 2014 Overview What you will learn from this slide deck High level overview of each component for Intel System Studio, along with how they address these development environments System

More information

Eliminate Memory Errors to Improve Program Stability

Eliminate Memory Errors to Improve Program Stability Introduction INTEL PARALLEL STUDIO XE EVALUATION GUIDE This guide will illustrate how Intel Parallel Studio XE memory checking capabilities can find crucial memory defects early in the development cycle.

More information

What s New August 2015

What s New August 2015 What s New August 2015 Significant New Features New Directory Structure OpenMP* 4.1 Extensions C11 Standard Support More C++14 Standard Support Fortran 2008 Submodules and IMPURE ELEMENTAL Further C Interoperability

More information

Intel Software Development Products for High Performance Computing and Parallel Programming

Intel Software Development Products for High Performance Computing and Parallel Programming Intel Software Development Products for High Performance Computing and Parallel Programming Multicore development tools with extensions to many-core Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN

More information

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Achieving High Performance Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Does Instruction Set Matter? We find that ARM and x86 processors are simply engineering design points optimized

More information

Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation

Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation S c i c o m P 2 0 1 3 T u t o r i a l Intel Xeon Phi Product Family Programming Tools Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation Agenda Intel Parallel Studio XE 2013

More information

Getting Started with Intel SDK for OpenCL Applications

Getting Started with Intel SDK for OpenCL Applications Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel

More information

This guide will show you how to use Intel Inspector XE to identify and fix resource leak errors in your programs before they start causing problems.

This guide will show you how to use Intel Inspector XE to identify and fix resource leak errors in your programs before they start causing problems. Introduction A resource leak refers to a type of resource consumption in which the program cannot release resources it has acquired. Typically the result of a bug, common resource issues, such as memory

More information

Intel VTune Amplifier XE

Intel VTune Amplifier XE Intel VTune Amplifier XE Vladimir Tsymbal Performance, Analysis and Threading Lab 1 Agenda Intel VTune Amplifier XE Overview Features Data collectors Analysis types Key Concepts Collecting performance

More information

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP

More information

Intel Math Kernel Library 10.3

Intel Math Kernel Library 10.3 Intel Math Kernel Library 10.3 Product Brief Intel Math Kernel Library 10.3 The Flagship High Performance Computing Math Library for Windows*, Linux*, and Mac OS* X Intel Math Kernel Library (Intel MKL)

More information

Eliminate Memory Errors to Improve Program Stability

Eliminate Memory Errors to Improve Program Stability Eliminate Memory Errors to Improve Program Stability This guide will illustrate how Parallel Studio memory checking capabilities can find crucial memory defects early in the development cycle. It provides

More information

Intel Xeon Phi Coprocessor Performance Analysis

Intel Xeon Phi Coprocessor Performance Analysis Intel Xeon Phi Coprocessor Performance Analysis Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel Parallel Studio XE 2013 for Linux* Installation Guide and Release Notes Document number: 323804-003US 10 March 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 1 1.1.1 Changes since Intel

More information

Intel Xeon Phi Programmability (the good, the bad and the ugly)

Intel Xeon Phi Programmability (the good, the bad and the ugly) Intel Xeon Phi Programmability (the good, the bad and the ugly) Robert Geva Parallel Programming Models Architect My Perspective When the compiler can solve the problem When the programmer has to solve

More information

Intel Architecture and Tools Jureca Tuning for the platform II. Dr. Heinrich Bockhorst Intel SSG/DPD/ Date:

Intel Architecture and Tools Jureca Tuning for the platform II. Dr. Heinrich Bockhorst Intel SSG/DPD/ Date: Intel Architecture and Tools Jureca Tuning for the platform II Dr. Heinrich Bockhorst Intel SSG/DPD/ Date: 23.11.2017 Agenda Introduction Processor Architecture Overview Composer XE Compiler Intel Python

More information

Using Intel Inspector XE 2011 with Fortran Applications

Using Intel Inspector XE 2011 with Fortran Applications Using Intel Inspector XE 2011 with Fortran Applications Jackson Marusarz Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

extreme XQCD Bern Aug 5th, 2013 Edmund Preiss Manager Business Development, EMEA

extreme XQCD Bern Aug 5th, 2013 Edmund Preiss Manager Business Development, EMEA extreme XQCD Bern Aug 5th, 2013 Edmund Preiss Manager Business Development, EMEA Topics Covered Today 2 Intel s offerings to HPC Update on Intel Architecture Roadmap Overview on Intel Development Tools

More information

More performance options

More performance options More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel

More information

Intel VTune Amplifier XE. Dr. Michael Klemm Software and Services Group Developer Relations Division

Intel VTune Amplifier XE. Dr. Michael Klemm Software and Services Group Developer Relations Division Intel VTune Amplifier XE Dr. Michael Klemm Software and Services Group Developer Relations Division Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS

More information

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Intel Advisor XE. Vectorization Optimization. Optimization Notice Intel Advisor XE Vectorization Optimization 1 Performance is a Proven Game Changer It is driving disruptive change in multiple industries Protecting buildings from extreme events Sophisticated mechanics

More information

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3

More information

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes 23 October 2014 Table of Contents 1 Introduction... 1 1.1 Product Contents... 2 1.2 Intel Debugger (IDB) is

More information

Intel Many Integrated Core (MIC) Architecture

Intel Many Integrated Core (MIC) Architecture Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products

More information

Kevin O Leary, Intel Technical Consulting Engineer

Kevin O Leary, Intel Technical Consulting Engineer Kevin O Leary, Intel Technical Consulting Engineer Moore s Law Is Going Strong Hardware performance continues to grow exponentially We think we can continue Moore's Law for at least another 10 years."

More information

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend

More information

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017 Achieving Peak Performance on Intel Hardware Intel Software Developer Conference London, 2017 Welcome Aims for the day You understand some of the critical features of Intel processors and other hardware

More information

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria Alexei Katranov IWOCL '16, April 21, 2016, Vienna, Austria Hardware: customization, integration, heterogeneity Intel Processor Graphics CPU CPU CPU CPU Multicore CPU + integrated units for graphics, media

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel C++ Studio XE 2013 for Windows* Installation Guide and Release Notes Document number: 323805-003US 26 June 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.1.1 Changes since Intel

More information

Intel VTune Amplifier XE for Tuning of HPC Applications Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel

Intel VTune Amplifier XE for Tuning of HPC Applications Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel Intel VTune Amplifier XE for Tuning of HPC Applications Intel Software Developer Conference Frankfurt, 2017 Klaus-Dieter Oertel, Intel Agenda Which performance analysis tool should I use first? Intel Application

More information

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:

More information

Vectorization Advisor: getting started

Vectorization Advisor: getting started Vectorization Advisor: getting started Before you analyze Run GUI or Command Line Set-up environment Linux: source /advixe-vars.sh Windows: \advixe-vars.bat Run GUI or Command

More information

Bei Wang, Dmitry Prohorov and Carlos Rosales

Bei Wang, Dmitry Prohorov and Carlos Rosales Bei Wang, Dmitry Prohorov and Carlos Rosales Aspects of Application Performance What are the Aspects of Performance Intel Hardware Features Omni-Path Architecture MCDRAM 3D XPoint Many-core Xeon Phi AVX-512

More information

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015 LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015 Abstract Library for small matrix-matrix multiplications targeting

More information

What s P. Thierry

What s P. Thierry What s new@intel P. Thierry Principal Engineer, Intel Corp philippe.thierry@intel.com CPU trend Memory update Software Characterization in 30 mn 10 000 feet view CPU : Range of few TF/s and

More information

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012 Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2

More information

Microarchitectural Analysis with Intel VTune Amplifier XE

Microarchitectural Analysis with Intel VTune Amplifier XE Microarchitectural Analysis with Intel VTune Amplifier XE Michael Klemm Software & Services Group Developer Relations Division 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Revealing the performance aspects in your code

Revealing the performance aspects in your code Revealing the performance aspects in your code 1 Three corner stones of HPC The parallelism can be exploited at three levels: message passing, fork/join, SIMD Hyperthreading is not quite threading A popular

More information

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS

More information

Code modernization and optimization for improved performance using the OpenMP* programming model for threading and SIMD parallelism.

Code modernization and optimization for improved performance using the OpenMP* programming model for threading and SIMD parallelism. Code modernization and optimization for improved performance using the OpenMP* programming model for threading and SIMD parallelism. Parallel + SIMD is the Path Forward Intel Xeon and Intel Xeon Phi Product

More information

High Performance Parallel Programming. Multicore development tools with extensions to many-core. Investment protection. Scale Forward.

High Performance Parallel Programming. Multicore development tools with extensions to many-core. Investment protection. Scale Forward. High Performance Parallel Programming Multicore development tools with extensions to many-core. Investment protection. Scale Forward. Enabling & Advancing Parallelism High Performance Parallel Programming

More information

Simplified and Effective Serial and Parallel Performance Optimization

Simplified and Effective Serial and Parallel Performance Optimization HPC Code Modernization Workshop at LRZ Simplified and Effective Serial and Parallel Performance Optimization Performance tuning Using Intel VTune Performance Profiler Performance Tuning Methodology Goal:

More information

Intel Embedded Overview

Intel Embedded Overview Intel Embedded Overview 1 What you will learn from this slide deck Different segments Intel System Studio is useful for Common Hardware and Software challenges in developing for embedded Intel Architecture

More information

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes Document number: 323803-001US 4 May 2011 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.2 Product Contents...

More information

Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth

Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth Contents Intel Visual Fortran Compiler Professional Edition for Windows*........................ 3 Features...3 New in This

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel Clang * : An Excellent C++ Compiler LLVM * : Collection of modular and reusable compiler and toolchain technologies Created by Chris Lattner

More information

Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov

Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov What is the Parallel STL? C++17 C++ Next An extension of the C++ Standard Template Library algorithms with the execution policy argument Support for parallel

More information

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature. Intel Software Developer Conference London, 2017

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature. Intel Software Developer Conference London, 2017 Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature Intel Software Developer Conference London, 2017 Agenda Vectorization is becoming more and more important What is

More information

Bitonic Sorting Intel OpenCL SDK Sample Documentation

Bitonic Sorting Intel OpenCL SDK Sample Documentation Intel OpenCL SDK Sample Documentation Document Number: 325262-002US Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL

More information

Intel Cluster Checker 3.0 webinar

Intel Cluster Checker 3.0 webinar Intel Cluster Checker 3.0 webinar June 3, 2015 Christopher Heller Technical Consulting Engineer Q2, 2015 1 Introduction Intel Cluster Checker 3.0 is a systems tool for Linux high performance compute clusters

More information

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017 Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application

More information

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes Document number: 323804-002US 21 June 2012 Table of Contents 1 Introduction... 1 1.1 What s New... 1 1.2 Product Contents...

More information

Intel PerfMon Performance Monitoring Hardware

Intel PerfMon Performance Monitoring Hardware Intel PerfMon Performance Monitoring Hardware Overview PerfMon Basics PerfMon is hardware throughout the silicon available through registers to tools to facilitate several system/application usages: compiler

More information

Expressing and Analyzing Dependencies in your C++ Application

Expressing and Analyzing Dependencies in your C++ Application Expressing and Analyzing Dependencies in your C++ Application Pablo Reble, Software Engineer Developer Products Division Software and Services Group, Intel Agenda TBB and Flow Graph extensions Composable

More information

The Intel Xeon Phi Coprocessor. Dr-Ing. Michael Klemm Software and Services Group Intel Corporation

The Intel Xeon Phi Coprocessor. Dr-Ing. Michael Klemm Software and Services Group Intel Corporation The Intel Xeon Phi Coprocessor Dr-Ing. Michael Klemm Software and Services Group Intel Corporation (michael.klemm@intel.com) Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

High Performance Computing The Essential Tool for a Knowledge Economy

High Performance Computing The Essential Tool for a Knowledge Economy High Performance Computing The Essential Tool for a Knowledge Economy Rajeeb Hazra Vice President & General Manager Technical Computing Group Datacenter & Connected Systems Group July 22 nd 2013 1 What

More information

Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth

Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Linux*.... 3 Intel C++ Compiler Professional Edition Components:......... 3 s...3

More information

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes Document number: 323804-001US 8 October 2010 Table of Contents 1 Introduction... 1 1.1 Product Contents... 1 1.2 What s New...

More information

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition for Linux*...3 Intel C++ Compiler Professional Edition Components:...3 Features...3 New

More information

Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen

Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen Features We Discuss Synchronization (lock) hints The nonmonotonic:dynamic schedule Both Were new in OpenMP 4.5 May have slipped

More information

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 325262-002US Revision: 1.3 World Wide Web: http://www.intel.com Document

More information

Oracle Developer Studio 12.6

Oracle Developer Studio 12.6 Oracle Developer Studio 12.6 Oracle Developer Studio is the #1 development environment for building C, C++, Fortran and Java applications for Oracle Solaris and Linux operating systems running on premises

More information

H.J. Lu, Sunil K Pandey. Intel. November, 2018

H.J. Lu, Sunil K Pandey. Intel. November, 2018 H.J. Lu, Sunil K Pandey Intel November, 2018 Issues with Run-time Library on IA Memory, string and math functions in today s glibc are optimized for today s Intel processors: AVX/AVX2/AVX512 FMA It takes

More information

IXPUG 16. Dmitry Durnov, Intel MPI team

IXPUG 16. Dmitry Durnov, Intel MPI team IXPUG 16 Dmitry Durnov, Intel MPI team Agenda - Intel MPI 2017 Beta U1 product availability - New features overview - Competitive results - Useful links - Q/A 2 Intel MPI 2017 Beta U1 is available! Key

More information

Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor Intel Xeon Phi Coprocessor http://tinyurl.com/inteljames twitter @jamesreinders James Reinders it s all about parallel programming Source Multicore CPU Compilers Libraries, Parallel Models Multicore CPU

More information

Programming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title

Programming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title Programming for the Intel Many Integrated Core Architecture By James Reinders The Architecture for Discovery PowerPoint Title Intel Xeon Phi coprocessor 1. Designed for Highly Parallel workloads 2. and

More information

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization

More information

Intel Parallel Amplifier 2011

Intel Parallel Amplifier 2011 THREADING AND PERFORMANCE PROFILER Intel Parallel Amplifier 2011 Product Brief Intel Parallel Amplifier 2011 Optimize Performance and Scalability Intel Parallel Amplifier 2011 makes it simple to quickly

More information

Software Optimization Case Study. Yu-Ping Zhao

Software Optimization Case Study. Yu-Ping Zhao Software Optimization Case Study Yu-Ping Zhao Yuping.zhao@intel.com Agenda RELION Background RELION ITAC and VTUE Analyze RELION Auto-Refine Workload Optimization RELION 2D Classification Workload Optimization

More information

Bring your application to a new era:

Bring your application to a new era: Bring your application to a new era: learning by example how to parallelize and optimize for Intel Xeon processor and Intel Xeon Phi TM coprocessor Manel Fernández, Roger Philp, Richard Paul Bayncore Ltd.

More information
