Prof. Thomas Sterling


1 High Performance Computing: Concepts, Methods & Means
Performance Measurement 1
Prof. Thomas Sterling, Department of Computer Science, Louisiana State University
February 13th, 2007

2 News Alert!
- Intel announces Teraflops Research Chip
- 80 cores
- 1.8 Teraflops (at 5.6 GHz)
- 60 watts (at 1.01 Tflops)
- At ISSCC
- Mesh network on a chip

3 Topics
- Introduction
- Measuring System Operation
- Gprof
- Perfsuite
- PAPI
- Tau
- Summary Material for the Test


5 Opening Remarks
- Up until now, 2 strategies for measuring performance:
  1) wall-clock time for user applications
  2) benchmarks for comparing
     - Machines of different type
     - Machines of different scale
- But we have identified factors that contribute to system operational performance, e.g.:
  - Effective use of parallelism
  - Cache behavior
- To make better use of HPC systems, we need to measure operational behavior:
  - How the system is performing during application execution
  - What the application demands and bottlenecks are
- Focus on SMP class system operation during this Segment
- Next Segment: measuring MPP & cluster behavior

6 What You'll Need to Know
- This is a skills-oriented lecture
- Understand the kinds and levels of metrics of system and processor operation that you can measure
- Know the kinds of tools that can expose valuable parameters of system & application operation:
  - Hardware counters
  - Software instrumentation, data acquisition, and presentation
- Learn the basics of how to use specific tools when running your application code:
  - gprof
  - Perfsuite
  - PAPI
  - TAU

7 Final initial comments (yes, I know that's an oxymoron)
- We are only going to scratch the surface today
- Try to get the basic ideas
- This will expose you to a range of concepts, strategies, and tools
- Lots of details will be left to future discussions
- Over the next weeks, we will extend our abilities in using these tools
- But don't hesitate to read through the documentation
- Hey, try some things out for yourself
- You've got a sandbox to play in (Celeritas)

8 Topics
- Introduction
- Measuring System Operation
- Gprof
- Perfsuite
- PAPI
- Tau
- Summary Material for the Test

9 Hardware Counters
- Each processor has the ability to monitor events of various kinds
- A small set of registers is used to count events
- Very processor specific
[Figure: schematic of an SMP node: processors (MP) with L1, L2, and L3 caches; memory banks M1 ... Mn-1; controller; NICs; PCI-e; JTAG; Ethernet; USB; peripherals]

10 Philip J. Mucci, Performance Analysis Tools and PAPI UTK ICL


14 Hardware Events
- Floating point operations: multiplies, adds, multiply-adds, etc.
- L1/L2 cache hits/misses
- Translation Lookaside Buffer hits/misses (virtual to physical address translation table)
- Branch prediction counters (pipelined systems must guess the next instruction to fetch)

15 A Goal: Optimization
- Compile time:
  - Various levels enabled by compiler options
  - Examine compiler output
- Run time (performance analysis):
  - Instrument code or execution to produce a trace
  - Tools to analyze trace: the standard/basic tool is gprof, but there are many others
- Note: the Java HotSpot environment collects data about execution and uses it to optimize a program as it runs

16 Performance Analysis Tools
- Widely ported low-level interface to hardware counters: PAPI (Performance API)
- Many tools built on PAPI:
  - Perfsuite (NCSA), psrun command
  - TAU (University of Oregon)
  - etc.
- Useful for:
  - Finding performance bottlenecks
  - Identifying cache problems (badly sized arrays)

17 time
- A simple Unix command to give resource usage
- Runs a specified program: time [options] command [arguments...]
- Gives timing statistics about the program run:
  - The elapsed real time between invocation and termination
  - User CPU time
  - System CPU time
- See: man time

18 top
- Gives an overview of system process status and resource usage
- Provides a dynamic real-time view of a running system:
  - System summary information
  - Currently managed tasks
- Updates every few (e.g. 5) seconds
- top -hv | -bcisS -d delay -n iterations -p pid [, pid ...]
- See: man top

19 Basic Tools
time:
  $ time du -s /usr > /dev/null 2>&1
  real 0m34.274s
  user 0m0.082s
  sys 0m0.957s
top/ps:
  top - 11:29:40 up 49 min, 2 users, load average: 0.32, 0.26, 0.25
  Tasks: 125 total, 3 running, 121 sleeping, 0 stopped, 1 zombie
  Cpu(s): 4.5%us, 0.3%sy, 0.0%ni, 94.7%id, 0.2%wa, 0.3%hi, 0.0%si, 0.0%st
  Mem: k total, k used, 17564k free, k buffers
  Swap: k total, 32k used, k free, k cached
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  4136 sbrandt m 10m S :03.35 gnome-terminal
  3761 root m 12m R :02.82 X
  5195 sbrandt R :00.03 top
  3487 root S :00.25 hald-addon-stor
  3930 sbrandt m 40m 14m S :36.27 beagled


21 gprof: quick overview
- gprof is a utility that profiles procedures in programs; it is available on most Unix systems.
- gprof provides information about:
  - An index for each procedure
  - The parents of each procedure
  - The percentage of CPU time used by a procedure and its calls
  - A breakdown of the time used by the procedure and its descendants
  - The number of times a procedure was called
  - The direct descendants of each procedure
- To use gprof:
  - Compile the source code with the -pg option
  - Running the resulting executable generates an output file: gmon.out for serial programs, and gmon.out.0, gmon.out.1, etc. for parallel programs
  - For serial programs: gprof exe gmon.out
  - For parallel programs: gprof exe gmon.out.*

22 GPROF: one minute tutorial
- Steps to use gprof:
    gcc -pg -g -o prog prog.c
    ./prog
    gprof prog gmon.out
- Finds subroutines where the most time is spent
- Cannot tell you why some routines are more costly than others
- Need more information...

23 Demo of gprof


29 Using psrun to obtain counts
- psrun cmd (e.g. psrun du -s /usr)
  - This measures the performance counters used by the du command.
  - No special compilation of du is required for this to work.
- psprocess cmd.* (e.g. psprocess du.*.xml)
  - At the bottom of the output, you will see summary events about numerous counters.
- psconfig: create a custom set of events for psrun to use

30 Demo of psrun


49 By hand: Verifying the PAPI Version
// When hand-instrumenting you need to check
    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>
    ...
    /* Verify the PAPI version: PAPI_library_init returns PAPI_VER_CURRENT on success */
    int v = PAPI_library_init(PAPI_VER_CURRENT);
    if (v != PAPI_VER_CURRENT) {
        fprintf(stderr, "bad PAPI version\n");
        exit(2);
    }

50 By Hand: Measuring PAPI Counters
- Use "papi_avail -a" to identify counters
- Link with -lpapi

    #include <stdio.h>
    #include "papi.h"
    #define NUM 3
    int events[NUM] = { PAPI_FP_OPS, PAPI_TOT_INS, PAPI_L1_DCM };

    int main(int argc, char *argv[])
    {
        int i;
        int r;
        long_long values[NUM];
        r = PAPI_start_counters(events, NUM);  /* start counting the events */
        ...                                    /* code to be measured */
        r = PAPI_stop_counters(values, NUM);   /* stop and read the counts */
        printf("end ret=%d\n", r);
        for (i = 0; i < NUM; i++) {
            printf("ctr[%d]: %f\n", i, (double)values[i]);
        }
        return 0;
    }

51 Demo: Hand instrumentation with PAPI

52 Statistical profiling
- profil(): Unix library call that periodically samples the program counter of a running program.
  - Identifies subroutines where code spends most time.
  - Used by gprof
- PAPI_profil(): emulates profil(), but looks at a specific hardware counter.
  - Identifies file/line where code spends most time.

53 Using psrun to find hot spots
    gcc -g -o cmd cmd.c
    psrun -C -c papi_profile_cycles.xml cmd
- "-C": instructs papi to use XML configurations that are in the install path rather than the current directory.
- "-c papi_profile_cycles.xml": use the named config file rather than the default.
- "papi_profile_cycles.xml" directs papi to collect file/line data.
    psprocess cmd.*.xml d* display results

54 Demo: 2nd demo of psrun


64 Measuring PAPI Counters with TAU
- Set up environment:
    export COUNTER1=GET_TIME_OF_DAY
    export COUNTER2=PAPI_FP_OPS
    source /home/packages/tau/gcc-papi/env.sh
- Compile with the special TAU compiler, e.g. tau_cc.sh cmd.c
- Run your code
- Use pprof to read the profile files: profile.*

65 More TAU options...
- Diagnostic: export TAU_OPTIONS=-optKeepFiles
- Examine instrumented code (if you want to), e.g.:
    tau_cc.sh cmd.c
    vi cmd.inst.c
- Throttling:
    export TAU_THROTTLE=1
    export TAU_THROTTLE_NUMCALLS=
    export TAU_THROTTLE_PERCALL=3000
- Exploring data graphically:
  - Download files to your PC
  - Run ParaProf from here:

66 Demo of TAU


68 Summary Material for the Test
- Performance & CPI: slide 8


More information

Performance measurements of computer systems: tools and analysis

Performance measurements of computer systems: tools and analysis Performance measurements of computer systems: tools and analysis M2R PDES Jean-Marc Vincent and Arnaud Legrand Laboratory LIG MESCAL Project Universities of Grenoble {Jean-Marc.Vincent,Arnaud.Legrand}@imag.fr

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

Managing Processes Process: A running program

Managing Processes Process: A running program Managing Processes Process: A running program User Process: The process initiated by a User while logged into a terminal (e.g. grep, find, ls) Daemon Process: These processes are usually initiated on system

More information

Memory Hierarchies 2009 DAT105

Memory Hierarchies 2009 DAT105 Memory Hierarchies Cache performance issues (5.1) Virtual memory (C.4) Cache performance improvement techniques (5.2) Hit-time improvement techniques Miss-rate improvement techniques Miss-penalty improvement

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 2

ECE 571 Advanced Microprocessor-Based Design Lecture 2 ECE 571 Advanced Microprocessor-Based Design Lecture 2 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 January 2016 Announcements HW#1 will be posted tomorrow I am handing out

More information

PAPI: Performance API

PAPI: Performance API Santiago 2015 PAPI: Performance API Andrés Ávila Centro de Modelación y Computación Científica Universidad de La Frontera andres.avila@ufrontera.cl October 27th, 2015 1 Motivation 2 Motivation PERFORMANCE

More information

COMP s1 Lecture 1

COMP s1 Lecture 1 COMP1511 18s1 Lecture 1 1 Numbers In, Numbers Out Andrew Bennett more printf variables scanf 2 Before we begin introduce yourself to the person sitting next to you why did

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Debugging and Profiling

Debugging and Profiling Debugging and Profiling Dr. Axel Kohlmeyer Senior Scientific Computing Expert Information and Telecommunication Section The Abdus Salam International Centre for Theoretical Physics http://sites.google.com/site/akohlmey/

More information

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview Computer Science 322 Operating Systems Mount Holyoke College Spring 2010 Topic Notes: C and Unix Overview This course is about operating systems, but since most of our upcoming programming is in C on a

More information

CSE 303: Concepts and Tools for Software Development

CSE 303: Concepts and Tools for Software Development CSE 303: Concepts and Tools for Software Development Dan Grossman Spring 2007 Lecture 19 Profiling (gprof); Linking and Libraries Dan Grossman CSE303 Spring 2007, Lecture 19 1 Where are we Already started

More information

Performance Tuning VTune Performance Analyzer

Performance Tuning VTune Performance Analyzer Performance Tuning VTune Performance Analyzer Paul Petersen, Intel Sept 9, 2005 Copyright 2005 Intel Corporation Performance Tuning Overview Methodology Benchmarking Timing VTune Counter Monitor Call Graph

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 4

ECE 571 Advanced Microprocessor-Based Design Lecture 4 ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted

More information

Improving the Performance of your LabVIEW Applications

Improving the Performance of your LabVIEW Applications Improving the Performance of your LabVIEW Applications 1 Improving Performance in LabVIEW Purpose of Optimization Profiling Tools Memory Optimization Execution Optimization 2 Optimization Cycle Benchmark

More information

Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015

Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Oct 1: Introduction to OpenACC Oct 6: Office Hours Oct 15: Profiling and Parallelizing with the OpenACC Toolkit

More information

Introduction to Performance Engineering

Introduction to Performance Engineering Introduction to Performance Engineering Markus Geimer Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance: an old problem

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 2

ECE 571 Advanced Microprocessor-Based Design Lecture 2 ECE 571 Advanced Microprocessor-Based Design Lecture 2 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 4 September 2014 Announcements HW#1 delayed until Tuesday 1 Hardware Performance

More information

Profiling: Understand Your Application

Profiling: Understand Your Application Profiling: Understand Your Application Michal Merta michal.merta@vsb.cz 1st of March 2018 Agenda Hardware events based sampling Some fundamental bottlenecks Overview of profiling tools perf tools Intel

More information

Profiling & Optimization

Profiling & Optimization Lecture 18 Sources of Game Performance Issues? 2 Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes

More information

ECE 574 Cluster Computing Lecture 4

ECE 574 Cluster Computing Lecture 4 ECE 574 Cluster Computing Lecture 4 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 31 January 2017 Announcements Don t forget about homework #3 I ran HPCG benchmark on Haswell-EP

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) represents an important

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Sample Midterm I Questions Israel Koren ECE568/Koren Sample Midterm.1.1 1. The cost of a pipeline can

More information

Method-Level Phase Behavior in Java Workloads

Method-Level Phase Behavior in Java Workloads Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS

More information

CS / ECE 6810 Midterm Exam - Oct 21st 2008

CS / ECE 6810 Midterm Exam - Oct 21st 2008 Name and ID: CS / ECE 6810 Midterm Exam - Oct 21st 2008 Notes: This is an open notes and open book exam. If necessary, make reasonable assumptions and clearly state them. The only clarifications you may

More information

Mon Sep 17, 2007 Lecture 3: Process Management

Mon Sep 17, 2007 Lecture 3: Process Management Mon Sep 17, 2007 Lecture 3: Process Management September 19, 2007 1 Review OS mediates between hardware and user software QUIZ: Q: Name three layers of a computer system where the OS is one of these layers.

More information

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang

More information

CIT 470: Advanced Network and System Administration. Topics. What is performance testing? Performance Monitoring

CIT 470: Advanced Network and System Administration. Topics. What is performance testing? Performance Monitoring CIT 470: Advanced Network and System Administration Performance Monitoring CIT 470: Advanced Network and System Administration Slide #1 Topics 1. Performance testing 2. Performance tuning. 3. CPU 4. Memory

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Performance 2005-2-8 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last Time: Tips

More information

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen Performance Analysis HPC Fall 2007 Prof. Robert van Engelen Overview What to measure? Timers Benchmarking Profiling Finding hotspots Profile-guided compilation Messaging and network performance analysis

More information

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties Instruction-Level Parallelism Dynamic Branch Prediction CS448 1 Reducing Branch Penalties Last chapter static schemes Move branch calculation earlier in pipeline Static branch prediction Always taken,

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information

Introduction to Parallel Performance Engineering

Introduction to Parallel Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

More information

How much energy can you save with a multicore computer for web applications?

How much energy can you save with a multicore computer for web applications? How much energy can you save with a multicore computer for web applications? Peter Strazdins Computer Systems Group, Department of Computer Science, The Australian National University seminar at Green

More information