Intel VTune Amplifier XE Overview
- Alvin Thornton
- 5 years ago
1 Intel VTune Amplifier XE Overview June
2 Intel Parallel Studio XE 2011
- Advanced Build & Debug: Intel Composer XE. Feature: C/C++ and Fortran compilers, performance libraries and parallel models. Benefit: drives application performance and scalability on multicore and scales forward to manycore; additionally provides code robustness and security.
- Advanced Verify: Intel Inspector XE. Feature: memory and threading error checking tool for higher code reliability and quality. Benefit: increases productivity and lowers cost by catching memory and threading defects early.
- Advanced Tune: Intel VTune Amplifier XE. Feature: performance profiler to optimize performance and scalability. Benefit: removes guesswork, saves time and makes it easier to find performance and scalability bottlenecks; combines ease of use with deeper insights.
11/22/2011
3 Available Tools and Tools Evolution
- Intel VTune Amplifier XE: Linux and Windows, GUI and CLI support; tune: analyze and optimize performance issues
- Performance Analysis Tools+: PIN, statistical call graph, memory checking, other features, GUI usability; some features will be incorporated into future mainstream products
- Mainstream tools: Windows, Visual Studio only
4 Intel VTune Amplifier XE
- Integrates popular and mature features of Intel VTune Performance Analyzer, Intel Parallel Amplifier, Intel Thread Profiler and Intel Performance Tuning Utility
  - But not a superset in all cases
  - Some additional features are being worked on and will be added later; others are still being evaluated and might be added in future updates
- Standalone GUI on Linux* and Windows*
  - GUI in all environments based on wxWidgets: very fast and stable
  - Same look and feel on Linux and Windows
  - Fast, native implementation on Linux: no more sluggish and fragile emulations!
- Comprehensive command-line interface
- New instrumentation technologies for data collection
5 Intel VTune Amplifier XE Feature Highlights
- Ease of use is the key focus
  - Simple configuration of an analysis session
  - Copy and modify existing analysis types to adapt them to special needs
  - Intuitive filtering and display of the collected data
  - Stand-alone GUI, but also seamless integration into Microsoft Visual Studio* on Windows*
- Extended platform coverage
  - Windows* and Linux*
  - Microsoft* .NET* C# applications
- Advanced source/assembler view
  - Analysis/event data mapped to the source/assembler code
  - View and analyze assembler code as basic blocks
6 [Screenshot callouts: all available analysis types; different ways to start the analysis; helps create new analysis types; copy the command line to the clipboard]
7 Supported Analysis Techniques
- Hotspot analysis: where is the execution time spent?
  - Includes a statistical call graph
  - Minimal intrusiveness due to statistical data collection
- Event-based sampling (EBS): where are architectural events such as cache misses happening?
  - Performance event configuration
  - Event grouping
- Thread profiling: where is my concurrency poor, and why? How much time do I spend in locks?
  - Thread timeline visualizes thread activity and lock transitions
  - Optionally integrates EBS data collection and view
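Event-based sampling does not record every event. A minimal sketch of the usual arithmetic, assuming a fixed sample-after value (the number of events between two samples) and, for multiplexed events, a known fraction of the run during which each event was actually being counted. The function names are hypothetical and not part of any VTune API:

```c
#include <stdint.h>

/* Each EBS sample stands for sample_after_value occurrences of the
 * event, so the estimated total is samples * sample_after_value. */
uint64_t ebs_estimate(uint64_t samples, uint64_t sample_after_value)
{
    return samples * sample_after_value;
}

/* When more events are requested than there are hardware counters,
 * events take turns on the counters (multiplexing). A count collected
 * for only part of the run is scaled up to the whole run. */
double multiplex_scale(double raw_count, double time_counted,
                       double time_total)
{
    if (time_counted <= 0.0)
        return 0.0; /* event was never scheduled: no estimate possible */
    return raw_count * (time_total / time_counted);
}
```

For example, 250 samples with a sample-after value of 2,000,000 estimate 500,000,000 events. The multiplexing scale assumes the program behaves the same while the event is not being counted, which is why short runs with many multiplexed events can give noisy numbers.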
8 Data Collection Techniques
- EBS: similar to Intel PTU, based on SEP (Sampling Enabling Platform)
- Hotspot analysis: user-mode sampling (aka stack sampling)
- Concurrency analysis: dynamic binary instrumentation with PIN
  - Only the low-intrusive probe mode is used; the slow JIT mode is not used by the performance tools
  - No source modification or recompilation is needed
[Diagram: VTune Analyzer, PTU and Parallel Amplifier converge into Intel(R) VTune(TM) Amplifier XE: improved event-based sampling (PMU), PIN for instrumentation, TBS for hotspot and stack sampling]
9 Hotspot Analysis
- For Hotspots analysis, only stack sampling is performed
- For each sample, the execution context, the time passed since the previous sample and the thread CPU time are captured
- No specific filtering is done for Hotspots; the amount of data collected is moderate
(Screenshots are subject to change)
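A minimal sketch of how per-sample data like this can be turned into a hotspot list, assuming a simplified, hypothetical sample record (function id plus time since the previous sample) rather than VTune's real data format, which also keeps the full call stack and thread CPU time:

```c
#include <stddef.h>

#define MAX_FUNCS 8  /* hypothetical bound on distinct function ids */

/* Hypothetical, simplified sample record: the function executing when
 * the sample was taken and the time elapsed since the previous sample. */
struct sample {
    int func;      /* index of the function on top of the stack */
    double dt_ms;  /* time since the previous sample, in ms */
};

/* Attributes each sample's elapsed time to the function it landed in.
 * Samples with an out-of-range function id are ignored. */
void accumulate_hotspots(const struct sample *s, size_t n,
                         double time_ms[MAX_FUNCS])
{
    for (int f = 0; f < MAX_FUNCS; f++)
        time_ms[f] = 0.0;
    for (size_t i = 0; i < n; i++)
        if (s[i].func >= 0 && s[i].func < MAX_FUNCS)
            time_ms[s[i].func] += s[i].dt_ms;
}

/* Returns the function id with the largest accumulated time. */
int top_hotspot(const double time_ms[MAX_FUNCS])
{
    int best = 0;
    for (int f = 1; f < MAX_FUNCS; f++)
        if (time_ms[f] > time_ms[best])
            best = f;
    return best;
}
```

Because only a statistical subset of execution is observed, functions that run for much less than one sampling interval may get no samples at all; this is the trade-off that keeps the collection low-intrusive.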
10 Hotspot Analysis Summary View
11 Hotspot Result View incl. Timeline [Screenshot callouts: call stack, threads, timeline]
12 Parallelism/Concurrency Analysis
For Parallelism/Concurrency analysis:
- Stack sampling is done just as in Hotspots analysis
- Wait functions are instrumented (e.g. WaitForSingleObject, EnterCriticalSection)
- Signal functions are instrumented (e.g. SetEvent, LeaveCriticalSection)
- I/O functions are instrumented (e.g. ReadFile, socket)
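The effect of instrumenting a wait function can be sketched with an explicit wrapper. This is an illustration only: VTune instruments the binary with PIN and needs no source changes, and the slide's examples are Win32 functions, whereas this sketch uses POSIX pthread_mutex_lock as a portable stand-in. The names wait_stats and instrumented_lock are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical per-lock statistics, similar in spirit to what an
 * instrumented wait function records: how often and how long we waited. */
struct wait_stats {
    unsigned long waits;
    double total_wait_s;
};

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Wraps pthread_mutex_lock: the time spent blocked is charged to the
 * lock. The statistics are updated while holding m, so a wait_stats
 * object dedicated to that lock needs no extra synchronization. */
int instrumented_lock(pthread_mutex_t *m, struct wait_stats *st)
{
    double t0 = now_seconds();
    int rc = pthread_mutex_lock(m);
    double t1 = now_seconds();
    if (rc == 0) {
        st->waits++;
        st->total_wait_s += t1 - t0;
    }
    return rc;
}
```

A large total_wait_s relative to the run time points at a contended lock, which is the kind of question the concurrency analysis answers per synchronization object.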
13 Parallelism/Concurrency Analysis Summary View
14 Parallelism/Concurrency Analysis
15 Event Based Sampling
16 Event Based Sampling
- Pre-defined event sets for supported processors:
  - Top-level triage
  - Cache analysis and false sharing
  - Branching issues
  - Long-latency computation operations
  - Structural hazards
  - Working set size
  - Data access patterns
  - Memory latency
  - Memory bandwidth
- Event multiplexing
- Pre-defined displays (viewpoints) transform data into information
- Numerous views and sophisticated filtering, combining the best of what we learned from EBS in Intel VTune Performance Analyzer and Intel PTU
- Cycle accounting analysis methodology
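False sharing occurs when threads write to different variables that happen to share one cache line, so the line bounces between cores even though no data is logically shared. A common fix, sketched here assuming a 64-byte cache line (typical for Intel Core processors, but worth checking on the target), is to give each thread's data its own line. The struct and function names are hypothetical:

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define NTHREADS 4

/* Prone to false sharing: the counters are packed next to each other,
 * typically within one cache line, so each thread's increments
 * invalidate the line in the other cores' caches. */
struct counters_shared {
    long count[NTHREADS];
};

/* Each counter is aligned to its own cache line; the compiler pads
 * the struct up to CACHE_LINE bytes. */
struct padded_counter {
    alignas(CACHE_LINE) long count;
};

struct counters_padded {
    struct padded_counter c[NTHREADS];
};

/* Hypothetical per-thread work: thread tid touches only its own slot. */
void bump(struct counters_padded *p, int tid, long times)
{
    for (long i = 0; i < times; i++)
        p->c[tid].count++;
}
```

Each worker thread would call bump with its own tid. The padding trades memory (a full cache line per counter instead of sizeof(long)) for the absence of cache-line ping-pong between cores.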
17 Introduction to Core Architecture: Execution Unit Port Mapping
[Diagram: reservation station (36 entries) dispatching to ports 0, 1 and 5 (integer ALUs, shifts, LEA, IMUL, jump execution unit (JEU), SIMD integer ALUs, shift, multiply and shuffle, FP multiply, add, divide, shuffle and FPREM) and to ports 2, 3 and 4 (load AGU, store AGU, store data (MIU)); load buffer (48 entries), store buffer (32 entries), DTLB, PMH, MOB, L1 data cache, L2/last-level cache, result bus, ROB]
18 Analysing the Execution Unit Stage: the Divider as an Example
- The divider is a big potential stall source: very high latency, almost no pipelining
- Event DIV: counts the number of divide operations executed
- Event RESOURCE_STALLS.DIV_BUSY: counts the number of cycles for which a micro-op stalls at issue because the divider is busy
- Try to find useful work to do in parallel with divide operations
[Table: latency and throughput of DIVSD and MULSD on Nehalem, Penryn and Core 2]
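One standard way to take divides off the critical path when the divisor is loop-invariant (a common technique, not something the slide prescribes) is to compute the reciprocal once and multiply inside the loop. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>

/* Naive: one DIVSD per element; divides have long latency and are
 * barely pipelined, so the loop is bound by the divider. */
void scale_div(double *out, const double *in, size_t n, double d)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / d;
}

/* A single divide hoisted out of the loop; the loop body uses only
 * multiplies, which are fully pipelined. Results may differ from
 * scale_div in the last bit because x * (1/d) is rounded twice. */
void scale_recip(double *out, const double *in, size_t n, double d)
{
    const double r = 1.0 / d;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * r;
}
```

Under strict IEEE semantics a compiler generally will not make this transformation by itself (except when 1/d is exact, as for powers of two), because the results can differ in the last bit; relaxed floating-point options may allow it.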
19 Intel(R) VTune(TM) Amplifier XE Viewpoints and Groupings
- Each analysis type has pre-defined groupings
- Different groupings allow users to analyze data in different ways, with a different focus in mind
20 Frame Analysis
There are two ways to support frames and frame analysis:
- Explicit frame markers via program APIs (recommended approach for version 1.0). Check out <VTune Amplifier Install Dir>\include\ittnotify.h:

    typedef struct __itt_frame_t *__itt_frame;
    __itt_frame ITTAPI __itt_frame_createA(const char *domain);
    __itt_frame ITTAPI __itt_frame_createW(const wchar_t *domain);
    void ITTAPI __itt_frame_begin(__itt_frame frame);
    void ITTAPI __itt_frame_end(__itt_frame frame);

- Automated frame detection using knowledge of the graphics API (such as DirectX): a prototype is available; no final decision yet on whether it will make it into the initial product release
21 Frame Analysis using APIs

    itt_frame = __itt_frame_createA("simdomain");
    while (grunning)
    {
        __itt_frame_begin(itt_frame);
        start = clock();
        // Wait for all threads before moving into the next frame
        WaitForMultipleObjects(FUNCTIONAL_DOMAINS, eSignal, TRUE, INFINITE);
        stop = clock();
        // Give all threads the "go" signal
        for (int i = 0; i < FUNCTIONAL_DOMAINS; i++)
            SetEvent(bSignal[i]);
        if (frame % NETWORKCONNETION_FREQ == 0)
        {
            // Start the network thread
            SetEvent(bNetSignal);
        }
        __itt_frame_end(itt_frame);
    }
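Frame analysis groups the collected data by the intervals between frame begin and end markers. A minimal sketch of a per-frame statistic this enables, using a hypothetical frame_mark record that holds the timestamps taken at the frame begin and end markers (this is not the ITT API itself):

```c
#include <stddef.h>

/* Hypothetical frame record: timestamps in ms taken at the points
 * where a frame begins and ends. */
struct frame_mark {
    double begin_ms;
    double end_ms;
};

/* Counts frames whose duration exceeds budget_ms (e.g. about 16.7 ms
 * for 60 frames per second) and returns the average frame duration
 * through *avg_ms (0 if there are no frames). */
size_t slow_frames(const struct frame_mark *f, size_t n,
                   double budget_ms, double *avg_ms)
{
    size_t slow = 0;
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = f[i].end_ms - f[i].begin_ms;
        total += d;
        if (d > budget_ms)
            slow++;
    }
    *avg_ms = n ? total / (double)n : 0.0;
    return slow;
}
```

An average frame time can look fine while a few frames blow the budget, which is why looking at individual slow frames, not just the mean, is useful.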
22 Frame Analysis in GUI [Screenshot callout: frame boundaries]
23 User Events
Check out <VTune Amplifier Install Dir>\include\ittnotify.h:

    typedef int __itt_event;
    __itt_event LIBITTAPI __itt_event_createA(const char *name, int namelen);
    __itt_event LIBITTAPI __itt_event_createW(const wchar_t *name, int namelen);
    int LIBITTAPI __itt_event_start(__itt_event event);
    int LIBITTAPI __itt_event_end(__itt_event event);

Example:

    DWORD WINAPI aiwork(LPVOID lparg)
    {
        int tid = *((int*)lparg);
        __itt_event aievent;
        // namelen must match the name: "ai Thread Work" has 14 characters
        aievent = __itt_event_createA("ai Thread Work", 14);
        while (grunning)
        {
            WaitForSingleObject(bSignal[tid], INFINITE);
            __itt_event_start(aievent);
            dosomedataparallelwork();
            __itt_event_end(aievent);
            SetEvent(eSignal[tid]);
        }
        return 0;
    }
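User events mark the intervals in which each thread performs a task. A minimal sketch of one summary such intervals allow, the average number of tasks running at the same time, assuming a hypothetical task_span record filled at the points where the event is started and ended:

```c
#include <stddef.h>

/* Hypothetical record of one user-event instance: the thread that ran
 * it and the timestamps (ms) at which the event started and ended. */
struct task_span {
    int thread;
    double start_ms;
    double end_ms;
};

/* Average number of tasks running at once over [t0_ms, t1_ms]:
 * total task time divided by wall-clock time. Spans are assumed to
 * lie entirely inside the window. */
double average_concurrency(const struct task_span *s, size_t n,
                           double t0_ms, double t1_ms)
{
    double busy = 0.0;
    for (size_t i = 0; i < n; i++)
        busy += s[i].end_ms - s[i].start_ms;
    double wall = t1_ms - t0_ms;
    return wall > 0.0 ? busy / wall : 0.0;
}
```

With three worker threads, a result of 2.5 would mean that on average half a thread's worth of capacity was idle, for example while waiting for the per-frame signal.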
24 User Events in GUI [Screenshot callouts: user-defined tasks]
25 Intel(R) VTune(TM) Amplifier XE Command Line Interface Examples

    D:\Examples\>"c:\Program Files (x86)\intel\amplifier XE\bin32\amplxe-cl.exe"
    Command Line Tool
    Copyright (C) Intel Corporation. All rights reserved.
    Usage: amplxe-cl <-action-option> [-modifier-option] [[--] target [target options]]
    Where:
    action-option is one of the following:
        collect, collect-list, command, finalize, help, import, knob-list, report, report-list, version
    modifier-option is one or more of the following:
        allow-multiple-runs, callee-attribution-mode, csv-delimiter, cumulative-threshold-percent, data-limit, [no-]discard-raw-data, quiet, duration, filter, [no-]follow-child, format, group-by, knob, limit, mrte-mode, report-output, result-dir, resume-after, search-dir, start-paused, target-duration-type, target-pid, target-process, user-data-dir, verbose
    To view the results in the IDE, double-click the <resultname>.amplxe file located in the result directory.
26 Intel(R) VTune(TM) Amplifier XE Command Line Interface Examples
- Display a list of available analysis types and preset configuration levels:
    amplxe-cl -collect-list
- Run Hotspots analysis on target myapp and store the result in a default-named directory, such as r000hs:
    amplxe-cl -c hotspots -- myapp
- Run the Parallelism analysis and store the result in directory r001par:
    amplxe-cl -c parallelism -result-dir r001par -- myapp
27 Intel(R) VTune(TM) Amplifier XE and Legacy Tools: Feature Comparison
28 Legacy Intel VTune Performance Analyzer Features Not Yet (!!) Available in the Current Version
- Performance analysis for Java: support for Java will be added later
- Exact, graphical call graph: needed?
29 Intel Performance Tuning Utility (Intel PTU) Features Not Yet (!) Available in the New Tool
- Data access profiling based on PEBS (Precise Event Based Sampling): very likely to be added in a future update
- Control-flow graph / basic block view
- Heap memory (space-time) profiling: very likely to be added in a future update
- Call count analysis: similar to the precise call graph of VTune, but using PIN for instrumentation
PTU is available as a free, unsupported tool
30 More Wish-List Features
- Event abstraction
  - Architecture-independent tuning using events
  - Hide specific processor-dependent details
- Improved support for parallel tasking models
  - OpenMP* 3.0 tasking
  - Intel Threading Building Blocks
  - Intel Cilk Plus
- EBS for VMM/virtual operating system environments
31 Intel PTU Basic block grouping and execution flow analysis
32 PTU Feature: Heap Profiling [Screenshot callouts: space-time, number of objects, call stack, block size, hot path, allocation graph]
33 Summary
- Intel(R) VTune(TM) Amplifier XE is the next-generation performance tool
- The new performance analysis tool is the successor of the popular Intel VTune Performance Analyzer and Intel Thread Profiler
  - Additional features and technologies
  - New binary instrumentation engine
  - Standalone GUI on Linux* and Windows*
- Let us know about any issues you see, and please also submit feature requests
- An existing Intel VTune Performance Analyzer license with non-expired support access will work for the new tool
More informationIntel Threading Tools
Intel Threading Tools Paul Petersen, Intel -1- INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS,
More informationDynamic Binary Instrumentation: Introduction to Pin
Dynamic Binary Instrumentation: Introduction to Pin Instrumentation A technique that injects instrumentation code into a binary to collect run-time information 2 Instrumentation A technique that injects
More informationIntel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector
Intel Parallel Studio XE Cluster Edition - Intel MPI - Intel Traceanalyzer & Collector A brief Introduction to MPI 2 What is MPI? Message Passing Interface Explicit parallel model All parallelism is explicit:
More informationKampala August, Agner Fog
Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationXT Node Architecture
XT Node Architecture Let s Review: Dual Core v. Quad Core Core Dual Core 2.6Ghz clock frequency SSE SIMD FPU (2flops/cycle = 5.2GF peak) Cache Hierarchy L1 Dcache/Icache: 64k/core L2 D/I cache: 1M/core
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationBorland Optimizeit Enterprise Suite 6
Borland Optimizeit Enterprise Suite 6 Feature Matrix The table below shows which Optimizeit product components are available in Borland Optimizeit Enterprise Suite and which are available in Borland Optimizeit
More informationHPC Lab. Session 4: Profiler. Sebastian Rettenberger, Chaulio Ferreira, Michael Bader. November 9, 2015
HPC Lab Session 4: Profiler Sebastian Rettenberger, Chaulio Ferreira, Michael Bader November 9, 2015 Session 4: Profiler, November 9, 2015 1 Profiler Profiling allows you to learn where your program spent
More informationEfficient and Large Scale Program Flow Tracing in Linux. Alexander Shishkin, Intel
Efficient and Large Scale Program Flow Tracing in Linux Alexander Shishkin, Intel 16.09.2013 Overview Program flow tracing - What is it? - What is it good for? Intel Processor Trace - Features / capabilities
More informationMore performance options
More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel
More informationTesla GPU Computing A Revolution in High Performance Computing
Tesla GPU Computing A Revolution in High Performance Computing Gernot Ziegler, Developer Technology (Compute) (Material by Thomas Bradley) Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction
More informationMessage Passing Interface (MPI) on Intel Xeon Phi coprocessor
Message Passing Interface (MPI) on Intel Xeon Phi coprocessor Special considerations for MPI on Intel Xeon Phi and using the Intel Trace Analyzer and Collector Gergana Slavova gergana.s.slavova@intel.com
More informationGetting Started with Intel SDK for OpenCL Applications
Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel
More informationReusing this material
XEON PHI BASICS Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationIntel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python
Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:
More informationIntel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant
Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant Parallel is the Path Forward Intel Xeon and Intel Xeon Phi Product Families are both going parallel Intel Xeon processor
More information