Three Questions Everyone Keeps Asking. Stephen Blair-Chappell, Intel Compiler Labs
2 Three Common Requests: How can I make my program run faster? How can I make my program parallel? Will my code run on any CPU (compatibility)? 8/2/2012
3 Intel Parallel Studio XE: Three Components
Intel Composer XE (compiler, libraries): use to generate fast, safe, parallel code (C/C++, Fortran).
Intel VTune Amplifier XE (profiler): find hotspots and bottlenecks in your code.
Intel Inspector XE (memory errors, parallel errors): use to find memory and threading errors.
4 Intel Parallel Studio XE: Four Components
Intel Parallel Advisor: use to model parallelism in your existing applications.
Intel Composer XE (compiler, libraries): use to generate fast, safe, parallel code (C/C++, Fortran).
Intel VTune Amplifier XE (profiler): find hotspots and bottlenecks in your code.
Intel Inspector XE (memory errors, parallel errors): use to find memory and threading errors.
5 Three Common Requests: How can I make my program run faster? How can I make my program parallel? Will my code run on any CPU (compatibility)?
6 Faster Code: The compiler uses many optimisation techniques (for example, fast floating point).
7 Faster Code: Often we are happy with the out-of-the-box experience. When was the last time you looked at some documentation?
8 Faster Code: The Seven Optimisation Steps. Example options are shown as Windows (Linux).
Step 1: Build with optimization disabled: /Od (-O0)
Step 2: Use general optimizations: /O1, /O2, /O3 (-O1, -O2, -O3)
Step 3: Use processor-specific options: /QxSSE4.2, /QxHOST (-xsse4.2, -xhost)
Step 4: Add inter-procedural optimization: /Qipo (-ipo)
Step 5: Use profile-guided optimization: /Qprof-gen, /Qprof-use (-prof-gen, -prof-use)
Step 6: Tune automatic vectorization: /Qguide (-guide)
Step 7: Implement parallelism or use automatic parallelism: use the Intel family of parallel models, or /Qparallel (-parallel)
9 Faster Code: Vectorisation. Instruction set extensions, oldest to newest:
SSE (70 instr): single-precision vectors, streaming operations
SSE2 (144 instr): double-precision vectors, 8/16/32/64/128-bit vector integer
SSE3 (13 instr): complex data
SSSE3 (32 instr): decode
SSE4.1 (47 instr): video, graphics building blocks, advanced vector instr
SSE4.2 (8 instr): string/XML processing, POP-count, CRC
AES-NI (7 instr): encryption and decryption, key generation
AVX (~100 new instr, ~300 legacy SSE instr updated): 256-bit vector, 3- and 4-operand instructions
AVX2: integer AVX expands to 256 bit, improved bit manipulation, FMA, vector shifts, gather
MIC: 512-bit vector
Example: for (i=0;i<max;i++) c[i]=a[i]+b[i]; A single vector add computes c[0]..c[3] from a[0]..a[3] and b[0]..b[3].
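The loop on this slide can be written as a small C function that a vectorising compiler has a good chance of turning into SIMD adds. This is only a sketch: the function name `vec_add` and the use of `restrict` are assumptions for illustration, not something the slides show. Whether the loop is actually vectorised depends on the compiler and the options used.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise c[i] = a[i] + b[i].
 * `restrict` promises the compiler that the three arrays do not overlap,
 * which removes the aliasing concern that can block auto-vectorisation.
 * (vec_add is a hypothetical name used for illustration.) */
void vec_add(const float *restrict a, const float *restrict b,
             float *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

With four-wide single-precision vectors, each vector add handles four iterations of this loop, which is the picture the slide draws.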
10 Faster Code: Different Ways of Inserting Vectorised Code, ordered from ease of use (top) to programmer control (bottom):
Use performance libraries (e.g. IPP and MKL)
Compiler: fully automatic vectorization
Cilk Plus array notation
Compiler: auto-vectorization hints (e.g. #pragma ivdep)
User-mandated vectorization (SIMD directive)
Manual CPU dispatch (__declspec(cpu_dispatch))
SIMD intrinsic class (F32vec4 add)
Vector intrinsic (_mm_add_ps())
Assembler code (addps)
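To make the "vector intrinsic" entry concrete, here is a minimal sketch of the same element-wise add written with the SSE intrinsic `_mm_add_ps`. The function name `vec_add_intrinsics`, the `HAVE_SSE` macro and the scalar fallback are assumptions added for illustration. The fallback keeps the code compiling on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* c[i] = a[i] + b[i], four floats at a time where SSE is available.
 * (vec_add_intrinsics is a hypothetical name used for illustration.) */
void vec_add_intrinsics(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && c != NULL);
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  /* one addps does 4 additions */
    }
#endif
    for (; i < n; i++) {                           /* scalar tail, or whole loop without SSE */
        c[i] = a[i] + b[i];
    }
}
```

This sits near the "programmer control" end of the list: the code is tied to one instruction set, so a wider vector unit gains nothing until the code is rewritten.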
11 Faster Code: An example. Speedup by upgrading silicon. Speedup by swapping compiler. Verified using VTune.
12 Three Common Requests: How can I make my program run faster? How can I make my program parallel? Will my code run on any CPU (compatibility)?
13 Parallel Code: Speedup using parallelism. Four Step Development:
1. Analyze: Amplifier XE (Hotspot, EBS (XE only))
2. Implement: Composer XE (Compiler: Cilk Plus, OpenMP; Libraries: MKL, TBB, IPP)
3. Debug: Inspector XE (Threads, Memory)
4. Tune: Amplifier XE (Concurrency, Locks & waits)
14 Parallel Code (Analyze): Four Different Ways to Find the Hotspots
1. Using the Intel compiler's loop profiler & profile viewer
2. Using the compiler's Auto-parallelizer
3. Using Amplifier XE
4. Performing a Survey with Advisor
15 Parallel Code: Language to help parallelism. Intel Cilk Plus, OpenMP, Intel Threading Building Blocks, Intel MPI, Fortran Coarrays, OpenCL, Native Threads.
OpenMP example:
    #pragma omp parallel for
    for (i = 1; i <= 4; i++) {
        printf("Iter: %d", i);
    }
Intel Cilk Plus example:
    cilk_for (int i = 0; i < max_row; i++) {
        for (int j = 0; j < max_col; j++) {
            p[i][j] = mandel(complex(scale(i), scale(j)));
        }
    }
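As a slightly fuller illustration of the OpenMP style shown on this slide, the sketch below parallelises a loop that accumulates a result, which needs a `reduction` clause so that each thread keeps a private partial sum. The function `sum_of_squares` is a hypothetical example, not taken from the slides. Without OpenMP enabled the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0^2 + 1^2 + ... + (n-1)^2.
 * reduction(+:total) gives every thread its own copy of `total`
 * and adds the copies together when the loop finishes.
 * (sum_of_squares is a hypothetical name used for illustration.) */
long sum_of_squares(int n)
{
    long total = 0;
    assert(n >= 0);
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += (long)i * i;
    }
    return total;
}
```

Dropping the `reduction` clause would leave all threads updating `total` at once, which is the kind of data race the later "find your parallel errors" step is meant to catch.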
16 Parallel Code (Debug): Four Different Ways to Find your Parallel Errors
1. Using Inspector XE
2. Perform a Static Security Analysis
3. Debug with Parallel Debug Extensions
4. Use Advisor
17 Parallel Code: An example. 1. Hotspot analysis. 2. Implement. 3. Find threading errors. 4, 5, 6. Tune parallelism.
18 Three Common Requests: How can I make my program run faster? How can I make my program parallel? Will my code run on any CPU (compatibility)?
19 Compatible Code: Will my program run on any CPU? Compatibility: run? build? performance? Future proofing: OS-agnostic, CPU-agnostic, language / standards, tools, scalability.
20 [Graphs: vectorised and parallel results.] On the graphs, bigger is better.
21 Running Example: Monte Carlo (SFTL003 hands-on lab)
    /* Outer loop: one option per iteration, shared out across threads. */
    #pragma omp parallel for
    for (int opt = 0; opt < OPT_N; opt++) {
        float VBySqrtT = VOLATILITY * sqrtf(T[opt]);
        float MuByT = (RISKFREE - 0.5f * VOLATILITY * VOLATILITY) * T[opt];
        float Sval = S[opt];
        float Xval = X[opt];
        float val = 0.0f, val2 = 0.0f;
        /* Inner loop: vectorised sum and sum of squares over the random samples. */
        #pragma simd reduction(+:val) reduction(+:val2)
        for (int pos = 0; pos < RAND_N; pos++) {
            float callvalue = expectedcall(Sval, Xval, MuByT, VBySqrtT, l_random[pos]);
            val += callvalue;
            val2 += callvalue * callvalue;
        }
        float exprt = expf(-RISKFREE * T[opt]);
        h_callresult[opt] = exprt * val / (float)RAND_N;
        float stddev = sqrtf(((float)RAND_N * val2 - val * val) / ((float)RAND_N * (float)(RAND_N - 1.f)));
        h_callconfidence[opt] = (float)(exprt * 1.96f * stddev / sqrtf((float)RAND_N));
    }
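The statistics at the end of the running example can be isolated into a small helper. The sketch below computes the sample mean and the sample variance from a running sum and sum of squares, using the same expression that sits under the square root in the slide. The names `mc_estimate` and `mc_summarise` are hypothetical, and the discount factor from the slide is deliberately left out so that only the reduction pattern is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: mean and sample variance of the samples. */
typedef struct {
    float mean;
    float variance;
} mc_estimate;

/* One pass over the samples, accumulating the sum and the sum of squares,
 * as the inner loop of the running example does with val and val2. */
mc_estimate mc_summarise(const float *samples, size_t n)
{
    float sum = 0.0f, sum_sq = 0.0f;
    mc_estimate out;
    assert(samples != NULL && n >= 2);
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        sum_sq += samples[i] * samples[i];
    }
    float nf = (float)n;
    out.mean = sum / nf;
    /* Same expression as under the sqrtf in the slide: (N*val2 - val*val) / (N*(N-1)) */
    out.variance = (nf * sum_sq - sum * sum) / (nf * (nf - 1.0f));
    return out;
}
```

The 95% confidence half-width used in the slide is then 1.96 times the square root of `variance / n`. Because the two accumulators are plain sums, they are natural targets for a SIMD or OpenMP reduction.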
22 Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013 Helping Developers Efficiently Produce Fast, Scalable and Reliable Applications
23 More Cores. Wider Vectors. Performance Delivered. Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013. More cores: multicore to many-core (50+ cores). Wider vectors: 128 bits, 256 bits, 512 bits. Scaling performance efficiently: serial performance, task & data parallel performance, distributed performance. Industry-leading performance from advanced compilers, comprehensive libraries, parallel programming models, insightful analysis tools.
24 Intel Parallel Studio XE 2013 and Intel Cluster Studio XE 2013: Efficiently Produce Fast, Scalable and Reliable Applications
Phase | Product | Feature | Benefit
Build | Intel Advisor XE | Threading design assistant (Studio products only) | Simplifies, demystifies, and speeds parallel application design
Build | Intel Composer XE | C/C++ and Fortran compilers, Intel Threading Building Blocks, Intel Cilk Plus, Intel Integrated Performance Primitives, Intel Math Kernel Library | Enabling solution to achieve the application performance and scalability benefits of multicore and forward scale to many-core
Build | Intel MPI Library | High performance Message Passing (MPI) library | Enabling high performance scalability, interconnect independence, runtime fabric selection, and application tuning capability
Verify & Tune | Intel VTune Amplifier XE | Performance profiler for optimizing application performance and scalability | Removes guesswork, saves time, makes it easier to find performance and scalability bottlenecks
Verify & Tune | Intel Inspector XE | Memory & threading dynamic analysis for code quality; static analysis for code quality | Increases productivity and code quality, lowers cost, finds memory, threading, and security defects before they happen
Verify & Tune | Intel Trace Analyzer & Collector | MPI performance profiler for understanding application correctness & behavior | Analyze performance of MPI programs and visualize parallel application behavior and communications patterns to identify hotspots
25 Top New Features (Intel Cluster Studio XE): Efficiently produce fast, scalable and reliable applications running on Windows* and Linux*
Performance: improved compiler and library performance; + Ivy Bridge microarchitecture; + Haswell microarchitecture; + Intel Xeon Phi coprocessor
Performance Profiling: a dozen new analysis features; low overhead Java* profiling; CPU power analysis
Reliability: pointer checker; heap growth analysis; improved MPI fault tolerance
Reproducibility: conditional numerical reproducibility
Standards: expanded C++11; expanded Fortran 2008; MPI 2.2
Parallelism Assistance: analysis extended to include Linux*, Fortran and C# (in addition to Windows* and C/C++)
26 The Build Environment (Tool, Target, Step, Macro)
MS Compiler, 13-0: 1st version (STEP_0)
ICC, 13-1: Find hotspot (STEP_1)
ICC, 13-2: Add SSE intrinsics (STEP_2)
VTune, 13-3: Find hotspot (STEP_3)
ICC, 13-4: Add OpenMP code (STEP_4)
Inspector, ICC: Check correctness, fix correctness (STEP_5, STEP_6)
VTune, 13-7: Tune parallelism (STEP_7)
ICC, 13-8: Finish (STEP_8)
Build example: make 13-0 or nmake 13-0
Key: Serial Release Mode, OpenMP Debug Mode, OpenMP Release Mode
Other labels on the slide: Solver, Generator
27 How to Run: 13-0.exe test.txt
28 Your Challenge: the hands-on. Examine each of the eight stages and use a combination of the compiler, Inspector, and Amplifier to understand what's going on. Answer these questions:
Is the application using the CPU at its best? (Steps 0, 2 and 8)
What's the biggest hotspot in the serial code? (Steps 1 and 3)
What errors were introduced into the parallelism? (Steps 4, 5 & 6)
How well is the parallelism tuned? (Steps 7 & 8)
Supplement: Why is the Linux version slower than the Windows version?
29 Thank You
30 Backup
31 Intel Parallel Studio XE, Intel Cluster Studio XE (30 minutes)
34 What's New? Intel Parallel Studio XE 2013 / Intel Cluster Studio XE 2013 (Compilers & Libraries)
Performance leadership: 3rd Generation Intel Core processors (code name Ivy Bridge) and future Intel processors (code name Haswell); Intel Xeon Phi coprocessors; improved C++ and Fortran performance.
New product capabilities: Latest OS: Windows* 8 Desktop, Linux*. IDE: Visual Studio 2008, 2010, 2012 and gnu tool chain. Standards: C99, selected C++11 features, almost complete Fortran 2003 support and selected features from Fortran 2008, MPI
35 Boost Performance 35
36 Support for Latest Intel Processors and Coprocessors
Product | Intel Ivy Bridge microarchitecture | Intel Haswell microarchitecture | Intel Xeon Phi coprocessor
Intel C++ and Fortran Compiler | AVX | AVX2, FMA3 | IMCI
Intel TBB library | (cell contents not transcribed)
Intel MKL library | AVX | AVX2, FMA3 | (not transcribed)
Intel MPI library | (cell contents not transcribed)
Intel VTune Amplifier XE | Hardware Events | Hardware Events | Hardware Events
Intel Inspector XE | Memory & Thread Checks | Memory & Thread | Memory & Thread
Hardware events for new processors are added as new processors ship. Analysis runs on multicore processors and provides analysis for multicore and many-core processors.
37 Performance-Oriented Compiler Suites Intel Compilers, Performance Libraries, Debugging Tools On Windows, Linux and Mac OS X Intel C++ Composer XE 2013 Intel C++ Compiler XE 13.0 with Intel Cilk Plus Intel TBB Intel MKL Intel IPP Intel Xeon Phi product family support, Linux Intel Fortran Composer XE 2013 Intel Fortran Compiler XE 13.0 Intel MKL Compatibility with Compaq Visual Fortran* Fortran 2003, 2008 support Intel Xeon Phi product family support, Linux Intel Composer XE 2013 Combines Intel C++ Composer XE and Intel Fortran Composer XE For Fortran developers who also want Intel C++ Windows (requires Visual Studio) and Linux only Windows: Intel C++/Visual* C++ compatibility & integration into Microsoft* Visual Studio* Linux: Intel C++/gcc* compatibility & integration into Eclipse* CDT Mac OS X: Intel C++/gcc compatibility & integration into XCode* Environment All: Intel Fortran performance leadership, compatible with Compaq* Visual* Fortran All: Leadership performance on Intel and compatible architectures All: One Year Intel Premier Support. Renewable Annually. Performance. Compatibility. Support. 37
38 Superior C++ Compiler Performance More Performance Just recompile Uses Intel AVX and Intel AVX2 instructions Intel Xeon Phi product family support, Linux: Compiler, debugger (Linux) Intel Cilk Plus: Tasking and vectorization 38
39 Superior Fortran Compiler Performance More Performance Just recompile Intel Xeon Phi product family: Linux compiler, debugger support Access to Intel AVX and Intel AVX2 instructions (-xa or /Qxa) Auto-parallelizer & directives to access SIMD instructions Coarrays & synchronization constructs support parallel programming Loop optimization directives: VECTOR, PARALLEL, SIMD More control over array data alignment (align arraynbytes) 39
40 C++ Performance Guide (Intel Parallel Studio XE, Intel Cluster Studio XE; Compilers & Libraries). Performance Wizard for Windows. Quick 5 step process for more performance. Get help choosing optimization options. Gain Performance with Less Effort
41 Intel Math Kernel Library (MKL). Highly optimized threaded math routines. Applications in science, engineering, finance. Use Intel MKL on Windows*, Linux*, Mac OS*. Use Intel MKL with Intel compiler, gcc, MSFT*, PGI. Component of Intel Parallel Studio XE and Intel Cluster Studio XE. EDC North America Development Survey 2011, Volume II: 33% of math libraries users rely on Intel's Math Kernel Library. Drop In The Next Intel MKL Version to Unlock New Processor Performance
42 LAPACK Performance Improves with Intel Math Kernel Library Compilers & Libraries 42
43 Intel Integrated Performance Primitives (IPP) A Library Of Highly Optimized Algorithmic Building Blocks For Media And Data Applications Optimized for Performance and Power Efficiency Intel Engineered & Future Proofed to Save You Time Wide range of Cross Platform & OS Functionality Highly optimized using SSE, AVX instruction sets Performance beyond what an optimized compiler produces alone Ready-to-use & royalty free Fully optimized for current and past processors Save development, debug, and maintenance time Code once now, receive future optimizations later Thousands of optimized functions Supports Windows*, Linux*, and Mac OS* X Supports Intel Atom, Intel Core, Intel Xeon, platforms Availability: Part of several different product packages with single, multi-user licenses as well as volume, academic, and student discounts available. Try it Before You Buy It: Download a trial version today at intel.com/software/products/eval Performance Building Blocks to Make Your Applications Faster, Faster 43
44 Intel IPP Boost from Intel AVX 44
45 Intel VTune Amplifier XE Performance Profiler. Where is my application spending time? Wasting time? Waiting too long?
Spending time: focus tuning on functions taking time; see call stacks; see time on source.
Wasting time: see cache misses on your source; see functions sorted by # of cache misses.
Waiting too long: see locks by wait time; red/green for CPU utilization during wait.
Windows & Linux. Low overhead. No special recompiles.
"We improved the performance of the latest run 3 fold. We wouldn't have found the problem without something like Intel VTune Amplifier XE." Claire Cates, Principal Developer, SAS Institute Inc.
Advanced Profiling for Scalable Multicore Performance
46 A Dozen New Analysis Features Intel VTune Amplifier XE 2013 More Profiling Data 1) Statistical Call Counts Data for Inlining & Parallelization 2) Hardware Events + Stacks Lower overhead, Higher resolution Finds hot spots in small functions 3) Uncore Event Counting More accurate bandwidth analysis 4) Ivy Bridge Events 5) Haswell Events Updates as new processors ship 6) Intel Xeon Phi Products Hardware events Easier To Use 7) Source View for Inlined Code (For Intel and GCC* compilers) 8) Java Tuning Results map to the Java source 9) Task Annotation API Label and visualize tasks. 10) User Defined Metrics Create meaningful metrics from events 11) Programmable Hot Keys Start and stop collection easily 12) More/Better Advanced Profiles (e.g., Bandwidth) Intel VTune Amplifier XE Easy to Use, Wealth of Data, Powerful Analysis 46
47 Low Overhead Java* Profiling Intel VTune Amplifier XE 2013 Intel VTune Amplifier XE Low Overhead & Precise Sampling is fast / unobtrusive Hardware sampling even faster (Now with optional stacks!) Advanced profiles are unique (cache misses, bandwidth ) Versatile & Easy to Use Multiple simultaneous JVMs Mixed Java / C++ / Fortran See results on the Java source Better Data, Lower Overhead, Easier to Use 47
48 CPU Power Analysis Intel VTune Amplifier XE 2013 Intel VTune Amplifier XE To decrease CPU power usage minimize wake-ups Identify wake-up causes Timers triggered by application Interrupts mapped to HW intr level Show wake-up rate Display source code for events that wake-up processor Show CPU frequencies by CPU core (CPU frequencies can change by CPU activity level) Linux only Select & filter to see a single wake up object: Uniquely Identifies the Cause of Wake-ups and Give Timer Call Stacks 48
49 Scale Forward 49
50 Simplify and Speed Threading Design Intel Advisor XE Threading Assistant Intel Advisor XE The Challenge of Parallel Design: Need to implement to measure performance Implementation is time consuming Disrupts regular product development Testing difficult without tools Intel Advisor XE Separates Design & Implementation Fast exploration of multiple options Find errors before implementation Design without disrupting development New! Linux* and Windows* New! C, C++, Fortran and C# code Add Parallelism with Less Effort, Less Risk and More Impact 50
51 Design Then Implement Intel Advisor XE 2013 Threading Assistant Intel Advisor XE Design Parallelism No disruption to regular development All test cases continue to work Tune and debug the design before you implement it 1) Analyze it. 2) Design it. (Compiler ignores these annotations.) 3) Tune it. 4) Check it. Implement Parallelism 5) Do it! Less Effort, Less Risk, More Impact 51
52 Scale Forward with Intel Parallel Models: Extend to Intel Xeon Phi Coprocessors (Compilers & Libraries). Abstract, scalable and composable. Intel Cilk Plus: C/C++ language extensions to simplify parallelism. Intel Threading Building Blocks: widely used C++ template library for thread management. Open programming models and also Intel products. Targets: Intel Xeon processors and compatible processors; Intel Xeon Phi product family. Supported standards: OpenMP, Coarray Fortran, MPI. Don't Leave Your Code Behind
53 Simplify Parallelism Intel Cilk Plus, Intel Threading Building Blocks Compilers & Libraries What Features Why Intel Cilk Plus Language extensions to simplify task/data parallelism 3 simple keywords & array notations for parallelism Support for task and data parallelism Semantics similar to serial code Simple way to parallelize your code Sequentially consistent, low overhead, powerful solution Supports C, C++, Windows and Linux Intel Threading Building Blocks Widely used C++ template library for task parallelism Parallel algorithms and data structures Scalable memory allocation and task scheduling Synchronization primitives Rich feature set for general purpose parallelism Available as open source or commercial license Supports C++, Windows, Linux, Mac OS X, other OSs Task and Data Parallelism Made Easier 53
54 Parallelize Applications For Performance: Intel Threading Building Blocks (TBB). A popular, proven parallel C++ abstraction. A C++ template library: scalable memory allocation; load-balancing; work-stealing task scheduling; thread-safe pipeline; flexible flow graph; concurrent containers; high-level parallel algorithms; numerous synchronization primitives. Open source, and portable across many OSs.
"Intel TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table: crowd simulation software." Michaël Rouillé, CTO, Golaem
Simplify Parallelism with a Scalable Parallel Model
55 Scale Forward and Extend to Intel Xeon Phi Coprocessors Intel Cilk Plus Intel Cilk Plus (Language Extension to C/C++) Easier Task & Data Parallelism 3 simple Keywords: cilk_for, cilk_spawn, cilk_sync Intel Cilk Plus Array Notation Save time with powerful vectorization Minimize Software Re-Work for New Hardware 55
56 Increase Reliability 56
57 Pointer Checker (Intel Parallel Studio XE, Intel Cluster Studio XE; Compilers & Libraries)
Finds buffer overflows and dangling pointers before memory corruption occurs. Powerful error reporting. Integrates into standard debuggers (Microsoft, gdb, Intel).
Dangling pointer:
    {
        char *p, *q;
        p = malloc(10);
        q = p;
        free(p);
        *q = 0;
    }
Buffer overflow:
    {
        char *my_chp = "abc";
        char *an_chp = (char *) malloc (strlen((char *)my_chp));
        memset (an_chp, '@', sizeof(my_chp));
    }
Pointer Checker report:
    CHKP: Bounds check error
    Traceback: ./a.out(main+0x1b2) [0x402d7a] in file mems.c at line 13
Pointer Checker Highlights Programming Errors For More Secure Applications
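For contrast with the buffer overflow example on this slide, here is one possible corrected version. In the slide's code, `sizeof(my_chp)` is the size of the pointer (typically 4 or 8 bytes), not the length of the string, and the allocation leaves no room for a terminating null. The function name `make_filled_copy` is a hypothetical helper introduced only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string with the same length as `src`,
 * every character set to `fill`. The caller must free() the result.
 * Returns NULL if the allocation fails.
 * (make_filled_copy is a hypothetical name used for illustration.) */
char *make_filled_copy(const char *src, char fill)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *buf = malloc(len + 1);   /* +1 leaves room for the terminator */
    if (buf == NULL) {
        return NULL;
    }
    memset(buf, fill, len);        /* string length, not sizeof(pointer) */
    buf[len] = '\0';
    return buf;
}
```

A bounds-checking tool should stay silent on this version, because every write falls inside the `len + 1` bytes that were allocated.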
58 Conditional Numerical Reproducibility (Compilers & Libraries)
Intel Math Kernel Library: new deterministic task scheduling and code path selection options.
OpenMP*: new deterministic reduction option.
Intel Threading Building Blocks: new parallel deterministic reduce option.
"I'm a C++ and Fortran developer and have high praise for the Intel Math Kernel Library. One nice feature I'd like to stress is the numerical reproducibility of MKL which helps me get the assurance I need that I'm getting the same floating point results from run to run." Franz Bernasek, Owner / CEO, Senior Developer, MSTC Modern Software Technology
Help Achieve Reproducible Results, Despite Non-associative Floating Point Math
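The phrase "non-associative floating point math" is easy to demonstrate. In the sketch below the same three single-precision values are added in two different orders and give two different answers. This is why a parallel reduction, which may combine partial sums in a different order from run to run, can change the result. The function names are hypothetical, and `volatile` is used only to force each intermediate to be rounded to `float`.

```c
#include <assert.h>

/* (a + b) + c */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;
    return t + c;
}

/* a + (b + c) */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}

/* Self-check: with a = 1e20, b = -1e20, c = 1 the two orders disagree.
 * Left:  (1e20 - 1e20) + 1 = 1.
 * Right: -1e20 + 1 rounds back to -1e20, so 1e20 + (-1e20) = 0. */
void check_nonassociative(void)
{
    assert(sum_left(1e20f, -1e20f, 1.0f) == 1.0f);
    assert(sum_right(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

The deterministic reduction options named on the slide address this by fixing the order in which partial results are combined.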
59 Expanded C++ 11 support Compilers & Libraries Additional type traits Initializer lists (partial) Generalized constant expressions (partial) Noexcept (partial) Range based for loops Conversions of lambdas to function pointers Excellent Support for C++ 11 on Windows* and Linux* 59
60 Expanded Fortran 2008 Support Compilers & Libraries Maximum array rank has been raised to 31 dimensions (Fortran 2008 specifies 15) Recursive type may have ALLOCATABLE components Coarrays CODIMENSION attribute SYNC ALL statement SYNC IMAGES statement SYNC MEMORY statement CRITICAL and END CRITICAL statements LOCK and UNLOCK statements ERROR STOP statement ALLOCATE and DEALLOCATE may specify coarrays Intrinsic procedures IMAGE_INDEX, LCOBOUND, NUM_IMAGES, THIS_IMAGE, UCOBOUND CONTIGUOUS attribute MOLD keyword in ALLOCATE DO CONCURRENT G0 and G0.d format edit descriptor Unlimited format item repeat count specifier CONTAINS section may be empty Intrinsic procedures BESSEL_J0, BESSEL_J1, BESSEL_JN, BESSEL_YN, BGE, BGT, BLE, BLT, DSHIFTL, DSHIFTR, ERF, ERFC, ERFC_SCALED, GAMMA, HYPOT, IALL, IANY, IPARITY, IS_CONTIGUOUS, LEADZ, LOG_GAMMA, MASKL, MASKR, MERGE_BITS, NORM2, PARITY, POPCNT, POPPAR, SHIFTA, SHIFTL, SHIFTR, STORAGE_SIZE, TRAILZ Additions to intrinsic module ISO_FORTRAN_ENV: ATOMIC_INT_KIND, ATOMIC_LOGICAL_KIND, CHARACTER_KINDS, INTEGER_KINDS, INT8, INT16, INT32, INT64, LOCK_TYPE, LOGICAL_KINDS, REAL_KINDS, REAL32, REAL64, REAL128, STAT_LOCKED, STAT_LOCKED_OTHER_IMAGE, STAT_UNLOCKED NEWUNIT keyword in OPEN New: ATOMIC_DEFINE and ATOMIC_REF, initialization of polymorphic INTENT(OUT) dummy arguments, standard handling of G format and of printing the value zero, coarrays (more support), polymorphic source allocation Leadership F2008 Support on Linux*, Windows* & OSX* 60
61 Dynamic Analysis Finds Memory & Threading Errors Intel Inspector XE 2013 Intel Inspector XE Find and eliminate errors Memory leaks, invalid access Races & deadlocks Analyze hybrid MPI cluster apps Heap growth analysis Faster & Easier to use Debugger breakpoints Break on selected errors Run faster to known error Pause/resume collection Narrow analysis focus Better performance Improved error suppression Find Errors Early When They are Less Expensive 61
62 Heap Growth Analysis Intel Inspector XE 2013 Intel Inspector XE Does Application Memory Usage Mysteriously Grow? Set an analysis interval with start and analysis end points Click a button or Use an API See a list of memory allocations that are not freed in the interval Quickly zero in on suspicious activity that contributes to heap growth Speeds Diagnosis of Difficult to Find Heap Errors 62
63 Static Analysis Finds Coding and Security Errors: Intel Parallel Studio XE 2013 (Intel Parallel Studio XE, Intel Cluster Studio XE; Compilers & Libraries)
Find over 250 error types, e.g. incorrect directives, security errors.
Easier to use: choose your priority (minimize false errors, or maximize error detection); hierarchical navigation of results; share comments with the team.
Increased accuracy & speed: detect errors without all source files; better scaling with large code bases.
Code complexity metrics: find code likely to be less reliable.
Find Errors and Harden your Security. Static Analysis is only available in Studio XE bundles. It is not sold separately.
64 Cluster Tools 64
65 Scale Forward, Scale Faster: Intel Cluster Tools (Intel Cluster Studio XE; Compilers & Libraries)
Scale Performance: perform on more nodes. MPI latency: Intel MPI Library is up to 6.5X as fast as alternative MPI libraries. Compiler performance: industry-leading Intel C/C++ & Fortran compilers.
Scale Forward: multicore now, many-core ready. Intel MPI Library scales beyond 120k processes. Focused to preserve programming investments for multicore and many-core machines.
Scale Efficiently: tune & debug on more nodes. Thread & memory correctness checking: Intel Inspector XE is now MPI enabled across many nodes. Rapid node-level performance profiling: Intel VTune Amplifier XE can identify hotspots faster and on thousands of nodes.
High Performance, Standards Driven, Fabric Flexible MPI Library
66 On the Path to Exascale: Intel MPI Library (part of Cluster Studio XE 2013). Latest hardware support: Ivy Bridge and Haswell, Intel Xeon Phi coprocessor. Increased scaling: 120k processes. Standards support: MPI. [Chart: number of processes by year, with an axis reaching 120K, showing Intel MPI Library releases, a doubling trend, and an estimated exascale process count.] Continued Scaling Capacity to Meet Ever Growing HPC Demands
67 Improved MPI Fault Tolerance Intel MPI Library Implementation of Berkeley Lab Checkpoint/Restart (BLCR) Primary Uses Scheduling Process Migration Failure Recovery Checkpointing Fault Recovery Scenario Node Fault Checkpoint Recovery Enabling Capabilities for Robust at Scale MPI Computing 67
68 MPI 2.2 Support Intel MPI Library Backwards compatible with MPI 2.1 programs Delivers Distributed Graph Topology Interface Scalable & Informative for MPI Library Communications Easy to Use Mechanism for Conveying Comms Patterns to MPI Applications Used by MPI Library to Improve Mapping Process to Process Communications Allows better fit for Applications Communications to Hardware Capabilities Outstanding Support Of The Latest MPI Standard 68
69 Optimize MPI Communications: Intel Trace Analyzer and Collector (ITAC, part of Cluster Studio XE 2013). Visually understand parallel application behavior: communications patterns, hotspots, load balance. MPI checking: detect deadlocks, data corruption, errors in parameters, data types, etc. Scaling: analysis capability increasing to 6k processes. [Chart: Intel Trace Analyzer and Collector processes by year.] Expanding MPI Profiling Capacity for Communications Optimization
70 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #
72 Backup
73 Value of Suites Suite Only Features Advisor XE Parallelism Advice C++ Performance Guide Performance Wizard Pointer Checker Reduces memory corruption Code Complexity Analysis Find code likely to be less reliable Static Analysis Improved! Find Errors and Harden your Security 73
74 What's New in Libraries? (Compilers & Libraries)
Intel MKL: digital random number generator (DRNG) for improved vector statistics calculations; automatically utilize Intel Xeon Phi coprocessors and balance compute loads between CPUs and coprocessors.
Intel IPP: enhanced image resize performance primitives; improved IPP footprint size.
Intel TBB: improved usability and reliability of the Flow Graph feature; additional C++11 support.
"Intel TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table: crowd simulation software." Michaël Rouillé, CTO, Golaem
Ready to Use Libraries to Increase Performance
75 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision # /2/2012