Locate a Hotspot and Optimize It


Can Recompiling Just One File Make a Difference?

Yes, in many cases it can! Often, you can get a major performance boost by recompiling a single file with the optimizing compiler in Intel Parallel Studio. You do not always need to recompile the entire app, a truism that applies to both serial and parallel applications (see Figure 1).

Two Easy Steps to Better Performance

Step 1. Find the hotspot(s): Measure where the application is spending time

In order to tune effectively, you must optimize the parts of the application that demand a lot of time. Tune something that is already fast, and you will see very little benefit. A hotspot is a place where the app is spending a lot of time. We want to find those areas and speed them up. This is easily done using a profiling tool like Intel Parallel Amplifier. So, do not waste your time optimizing things that do not need it: find your hotspots.

OK, you have found the hotspot, now what? In some cases, it may be obvious how to make the program run faster. For example, you may find you are repeating an operation that you only need to do once. Unfortunately, in most cases the answer is less obvious. People often ask, "Can't you suggest something or do it automatically?" In many cases, we can.

Step 2. Optimize it: Recompile just the hotspot (even just one file)

The optimizing compiler in Intel Parallel Composer can often improve performance just by recompiling the file(s) in which the hotspot(s) are located. On smaller applications, you can just recompile everything and see what you get. On large applications with many modules and projects, this may be impractical. Fortunately, there is rarely a need to recompile the entire application. Recompiling one or two files may be all that is necessary, or perhaps just a single project. And, since the Intel Compiler is binary and debug compatible with the Microsoft* compiler, you can seamlessly mix and match objects built with either tool.
Figure 1. Results before (Microsoft* Visual Studio) and after** (Intel Parallel Composer) recompiling

Application                         Before                 After**                Speedup
Ticker Tape (visual effects demo)   66 frames per second   84 frames per second   27%
Smoke 1.0 (game)                    64 frames per second   75 frames per second   17%
PiSolver (calculate pi)             2.76 seconds           1.46 seconds           89%

Microsoft Visual Studio 2008* used for Ticker Tape and PiSolver; Microsoft Visual Studio 2005* for Smoke.
**Intel Parallel Composer, update 5.
System specifications: Ticker Tape and Smoke: Intel Core i7 processor (4 cores), 3.20 GHz, 3.0 GB RAM, NVIDIA GeForce 9800 GX2; Windows* Vista Ultimate SP2. PiSolver: Intel Core 2 Duo 1.2 GHz (Centrino Pro laptop), 2 GB RAM, Microsoft* Windows XP SP 3.

Try It Yourself

Here is a simple example using the Intel C++ Compiler included in Intel Parallel Studio. You can read it here or try it yourself using the steps below and the PiSolver sample code.

Step 1. Install Intel Parallel Studio

1. Download an evaluation copy of Intel Parallel Studio.
2. Install Intel Parallel Studio by clicking on the parallel_studio_setup.exe.

Step 2. Install and View the PiSolver Sample Application

1. Download the PiSolver Sample.zip sample file to your local machine.
2. Extract the files from the PiSolver.zip file to a writable directory or share on your system.
3. Open the sample in Microsoft Visual Studio. Go to File > Open > Project/Solution (Figure 2).
4. Set the PiSolver sample application to Release mode. Select Build > Configuration Manager and then, under the Active Solution Configuration drop-down box, select the Release setting and close the Configuration Manager (Figure 3).
5. Build the application. Go to Build > Build Solution.
6. Run the application from Microsoft Visual Studio. Go to Debug > Start Without Debugging (Figure 4).
7. Click the Calculate button to compute the pi value and see the time it took in milliseconds (Figure 5).

Step 3. Run Intel Parallel Amplifier: Find the Hotspots

Estimated completion time: 10 minutes

Intel Parallel Amplifier enables you to quickly find the best place to focus your performance tuning.

Configure the Project Settings

Set the configuration to Release (optimized) with debug symbols. This setting enables Intel Parallel Amplifier to provide the most useful information about the application.

1. In the Microsoft Visual Studio Solution Explorer window, right-click the Pi project and select Properties.
2. If Configuration Properties is not already expanded, click the plus (+) sign next to it.
3. Set the general debug information property.
4. Expand C/C++ and click General.
5. Under the Debug Information Format, select Program Database (/Zi) and click Apply (Figure 6).
6. Set the linker debug information property.
7. Expand Linker and click Debugging.
8. Select Generate Debug Info > Yes (/DEBUG) and click Apply (Figure 7).

Run Intel Parallel Amplifier

1. In the Intel Parallel Amplifier toolbar, click the drop-down button and choose "Hotspots - Where is my program spending time?" (Figure 8).
2. Click the Profile button to run Intel Parallel Amplifier. Intel Parallel Amplifier launches the Pi Sample application.
3. Click Calculate to start the application and get results. Record the Output results. This is your baseline measurement.
4. Click Close to close the application and start analysis by Intel Parallel Amplifier. You may see the Hotspot Analysis explanation text box covering the report; read and close this first.
5. Click the plus (+) sign in front of CalcPi in the Caller Function Tree to expand the call tree for that module.
6. Double-click the hotspot for CalcPi (pigetsolutions) to find the source file involved with the hotspot.

For some applications, it may be easier to see the call tree by using the Top-Down Tree view. Also, for larger applications, you will likely have larger function trees to expand to find the hottest functions.

In the PiSolver example, your hotspot is located in the file pi.cpp. Intel Parallel Amplifier results show a hotspot in the Hotspots and the Call Stack panes: CalcPi(int) - pi.cpp (Figure 9).

Figure 9. Hotspots analysis results. The numbered callouts in the figure are:

1. Results from hotspots analysis. Number (000) increments for each result collected.
2. Function Bottom-up Tree is the default grouping level for hotspot data. Click the arrow button to change the grouping level.
3. Click the plus (+) sign in front of the function name to view call stacks for the selected function. Callers of the selected function are displayed, and then callers of the first caller(s), and so on.
4. CPU time is the active time taken to execute a function on a logical processor. For multiple threads, CPU time is summed up. This is the Data of Interest column for the hotspots analysis results.
5. Full stack information for the function selected in the grid. The yellow bar shows the contribution of the selected stack to the hotspot function CPU time.
6. Summary data on the analysis run: 1) Elapsed Time is the execution time of the application from start to termination; 2) CPU Time is the sum of CPU time for all threads; 3) Unused CPU Time is the total time for each core when it was either waiting or not utilized by the application; 4) Core Count is the logical CPU count for your machine; 5) Threads Created is the number of threads created by your system during the application run.

Step 4. Compile with the Intel C++ Compiler in Intel Parallel Composer

Estimated completion time: 10 minutes

Compile the file with the hotspot using Intel C++ Compiler, which is provided by Intel Parallel Composer.

1. In the Solution Explorer, Sources folder, select pi.cpp and click the Use Intel C++ button (Figure 10).
2. In the Confirmation dialog box, click OK. You created a new project configuration under the Pi project. This configuration uses the Intel C++ Compiler instead of the default Microsoft Visual Studio Compiler (Figure 11).

Note: You can check the box entitled Do not clean project(s) for large projects that may take a long time to compile. However, this is not necessary for the PiSolver example.

Now, you will change the settings to use the Intel C++ Compiler only for selected files and the Microsoft Visual Studio default compiler for the rest.

1. Change the project configuration to use the Microsoft C++ Compiler by selecting Project > Properties and then, under the Configuration Properties > General view, change the Compiler and Environment Settings to use the Microsoft Visual C++ Compiler (cl.exe) (Figure 12).
2. Click Continue in the Confirmation box. Then, click Apply and OK. The project is now configured as an Intel Parallel Composer project that uses the Microsoft C++ Compiler.
3. Right-click the pi.cpp file and select Properties, and then under the Configuration Properties > General view, change the Compiler and Environment Settings to use the Intel C++ Compiler (icl.exe) (Figure 13).
4. Click OK and Apply.
5. Build the project: Click the Pi project in the Solution Explorer pane and go to Build > Build Pi. You will see in the Output pane that pi.cpp is compiled with the Intel compiler, while the others are built with the Microsoft Visual C++ Compiler.
6. Run the PiSolver application again with Debug > Start Without Debugging and click Calculate in the application box. You should see a significant speedup in seconds compared to what you experienced compiling pi.cpp with the Microsoft Compiler.

Success

This example demonstrates how simple it is to identify and optimize a hotspot in your application and improve performance. Use Intel Parallel Amplifier together with Intel Parallel Composer to easily achieve performance improvements.

Speedup for this example, using an Intel Core 2 Duo processor:
- Calculation time before optimization: 1453 ms
- Calculation time after optimization: 875 ms

Key Terms and Concept

The Path to Parallelism

Key Terms

CPU time: The CPU time is the amount of time a thread spends executing on a logical processor. For multiple threads, the CPU time of the threads is summed. The application CPU time is the sum of the CPU time of all the threads that run the application.

Target: A target is an executable file that you analyze using Intel Parallel Amplifier.

Key Concept

Hotspot analysis: Hotspot analysis helps you understand the application flow and identify sections of code that take a long time to execute (i.e., hotspots). This is where you want to focus your tuning efforts as it will have the biggest impact on overall application performance. Intel Parallel Amplifier creates a list of functions in your application ordered by the amount of time spent in a function. It also detects the call stacks for each of these functions so you can see how the hot functions are called. It uses a low-overhead (about 5 percent), statistical-sampling algorithm that gets you the information you need without a significant slowing of application execution.

Summary

Speeding up your application may be as easy as recompiling a single file using the Intel C++ Compiler. The trick is picking the source file that contains the performance hotspot. Intel Parallel Amplifier finds the hotspot so you can focus your optimization efforts where they will be most effective.

We are here to help developers write correct, high-performing code that will take advantage of both today's and tomorrow's processing power. Learn more from Intel experts about parallelism, Intel Parallel Studio, and other related subjects.
Related links

Intel Software Network Forums
Intel Software Products Knowledge Base
Intel Software Network Blogs
Intel Parallel Studio Website
Intel Threading Building Blocks Website
Go Parallel
Parallelism Blogs, Papers, and Videos
Free, On-Demand Software Developer Webinars

Check out additional evaluation guides:

Optimize an Existing Program by Introducing Parallelism
Eliminate Memory Errors and Improve Program Stability

© 2010, Intel Corporation. All rights reserved. Intel, Centrino, Core, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.
0510/BLA/CMD/PDF 323865-001US