Prof. Thomas Sterling

1 High Performance Computing: Concepts, Methods & Means
Performance Measurement 1
Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
February 13th, 2007

2 News Alert! Intel announces Teraflops Research Chip at ISSCC
- 80 cores
- 1.8 Teraflops (at 5.6 GHz)
- 60 watts (at 1.01 Teraflops)
- Mesh network on a chip

3 Topics
- Introduction
- Measuring System Operation
- gprof
- PerfSuite
- PAPI
- TAU
- Summary Material for the Test

5 Opening Remarks
Up until now, we have used two strategies for measuring performance:
1) wall-clock time for user applications
2) benchmarks for comparing machines of different type and machines of different scale
But we have also identified factors that contribute to system operational performance, e.g.:
- effective use of parallelism
- cache behavior
To make better use of HPC systems, we need to measure operational behavior:
- how the system is performing during application execution
- what the application's demands and bottlenecks are
This segment focuses on SMP-class system operation.
Next segment: measuring MPP and cluster behavior.

6 What You'll Need to Know
This is a skills-oriented lecture. You should:
- understand the kinds and levels of metrics of system and processor operation that you can measure
- know the kinds of tools that can expose valuable parameters of system and application operation:
  - hardware counters
  - software instrumentation, data acquisition, and presentation
- learn the basics of how to use specific tools when running your application code: gprof, PerfSuite, PAPI, TAU

7 Final initial comments (yes, I know that's an oxymoron)
- We are only going to scratch the surface today; try to get the basic ideas.
- This will expose you to a range of concepts, strategies, and tools; lots of details will be left to future discussions.
- Over the next weeks, we will extend our abilities in using these tools.
- But don't hesitate to read through the documentation.
- Hey, try some things out for yourself: you've got a sandbox to play in (Celeritas).

9 Hardware Counters
Each processor has the ability to monitor events of various kinds. A small set of registers is used to count these events, and the available events are very processor specific.
[Figure: schematic of an SMP node: four processors (MP), each with private L1 and L2 caches, two L3 caches, memory banks M1 ... Mn-1, a controller and NIC, and peripherals (PCI-e, JTAG, Ethernet, USB)]

10 Philip J. Mucci, Performance Analysis Tools and PAPI UTK ICL

14 Hardware Events
- Floating-point operations: multiplies, adds, multiply-adds, etc.
- L1/L2 cache hits/misses
- Translation Lookaside Buffer (TLB) hits/misses (the virtual-to-physical address translation table)
- Branch prediction counters (pipelined systems must guess the next instruction to fetch)

15 A Goal: Optimization
- Compile time: various levels enabled by compiler options; examine the compiler output.
- Run time (performance analysis): instrument the code or its execution to produce a trace, then use tools to analyze the trace. The standard/basic tool is gprof, but there are many others.
- Note: the Java HotSpot environment collects data about execution and uses it to optimize a program as it runs.

16 Performance Analysis Tools
- PAPI (Performance API): a widely ported, low-level interface to hardware counters
- Many tools are built on PAPI:
  - PerfSuite (NCSA): psrun command
  - TAU (University of Oregon)
  - etc.
- Useful for:
  - finding performance bottlenecks
  - identifying cache problems (e.g., badly sized arrays)

17 time
A simple Unix command that reports resource usage. It runs the specified program:
time [options] command [arguments ...]
and gives timing statistics about the program run:
- the elapsed real time between invocation and termination
- user CPU time
- system CPU time
See: man time

18 top
Gives an overview of system process status and resource usage. It provides a dynamic real-time view of a running system:
- system summary information
- currently managed tasks
It updates every few (e.g., 5) seconds.
top -hv | -bcisS -d delay -n iterations -p pid [, pid ...]
See: man top

19 Basic Tools
time:
$ time du -s /usr > /dev/null 2>&1
real 0m34.274s
user 0m0.082s
sys 0m0.957s
top/ps:
top - 11:29:40 up 49 min, 2 users, load average: 0.32, 0.26, 0.25
Tasks: 125 total, 3 running, 121 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.5%us, 0.3%sy, 0.0%ni, 94.7%id, 0.2%wa, 0.3%hi, 0.0%si, 0.0%st
Mem: k total, k used, 17564k free, k buffers
Swap: k total, 32k used, k free, k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4136 sbrandt m 10m S :03.35 gnome-terminal
3761 root m 12m R :02.82 X
5195 sbrandt R :00.03 top
3487 root S :00.25 hald-addon-stor
3930 sbrandt m 40m 14m S :36.27 beagled

21 gprof: quick overview
gprof is a utility, available on most Unix systems, that profiles the procedures in a program. gprof provides:
- an index for each procedure
- the parent(s) of each procedure
- the percentage of CPU time used by a procedure and its calls
- a breakdown of the time used by the procedure and its descendants
- the number of times a procedure was called
- the direct descendants of each procedure
To use gprof:
- compile the source code with the -pg option
- run the executable; this generates an output file gmon.out for serial programs, and gmon.out.0, gmon.out.1, etc. for parallel programs
- for serial programs: gprof exe gmon.out
- for parallel programs: gprof exe gmon.out.*

22 gprof: one-minute tutorial
Steps to use gprof:
gcc -pg -g -o prog prog.c
./prog
gprof prog gmon.out
More reading:
- gprof finds the subroutines where the most time is spent.
- It cannot tell you why some routines are more costly than others; for that, you need more information...

23 Demo of gprof

29 Using psrun to obtain counts
- psrun cmd (e.g., psrun du -s /usr): measures the performance counters for the du command. No special compilation of du is required for this to work.
- psprocess cmd.* (e.g., psprocess du.*.xml): at the bottom of its output, you will see summary events for numerous counters.
- psconfig: create a custom set of events for psrun to use.

30 Demo of psrun

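49 By hand: Verifying the PAPI Version
When hand-instrumenting code, you need to check that the PAPI header and library versions match:

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        /* PAPI_library_init returns PAPI_VER_CURRENT on success; any other
           value signals a header/library version mismatch or an init error. */
        int v = PAPI_library_init(PAPI_VER_CURRENT);
        if (v != PAPI_VER_CURRENT) {
            fprintf(stderr, "bad PAPI version\n");
            exit(2);
        }
        /* ... instrumented code ... */
        return 0;
    }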

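50 By Hand: Measuring PAPI Counters
Use "papi_avail -a" to identify the available counters, and link with -lpapi.

    #include <stdio.h>
    #include <stdlib.h>
    #include "papi.h"

    #define NUM 3
    int events[NUM] = { PAPI_FP_OPS, PAPI_TOT_INS, PAPI_L1_DCM };

    int main(void)
    {
        int i, r;
        long_long values[NUM];

        r = PAPI_start_counters(events, NUM);
        if (r != PAPI_OK) {
            fprintf(stderr, "PAPI_start_counters failed: %d\n", r);
            exit(1);
        }
        /* ... code to be measured ... */
        r = PAPI_stop_counters(values, NUM);
        printf("end ret=%d\n", r);
        for (i = 0; i < NUM; i++)
            printf("ctr[%d]: %lld\n", i, values[i]);
        return 0;
    }

PAPI_start_counters and PAPI_stop_counters belong to PAPI's classic high-level API; recent PAPI releases replaced it with PAPI_hl_region_begin and PAPI_hl_region_end.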

51 Demo: Hand instrumentation with PAPI

52 Statistical profiling
- profil(): a Unix routine that periodically examines the program counter of a running program. It identifies the subroutines where the code spends most of its time, and is used by gprof.
- PAPI_profil(): emulates profil(), but samples based on a specific hardware counter. It identifies the file/line where the code spends most of its time.

53 Using psrun to find hot spots
gcc -g -o cmd cmd.c
psrun -C -c papi_profile_cycles.xml cmd
- "-C": use the XML configuration files in the install path rather than those in the current directory.
- "-c papi_profile_cycles.xml": use the named configuration file rather than the default.
- "papi_profile_cycles.xml" directs PAPI to collect file/line data.
psprocess cmd.*.xml: display the results

54 Demo: Second Demo of psrun

64 Measuring PAPI Counters with TAU
Set up the environment:
export COUNTER1=GET_TIME_OF_DAY
export COUNTER2=PAPI_FP_OPS
source /home/packages/tau/gcc-papi/env.sh
Compile with the special TAU compiler wrapper, e.g.: tau_cc.sh cmd.c
Run your code.
Use pprof to read the profile files: profile.*

65 More TAU options...
Diagnostic:
export TAU_OPTIONS=-optKeepFiles
Examine the instrumented code (if you want to), e.g.:
tau_cc.sh cmd.c
vi cmd.inst.c
Throttling:
export TAU_THROTTLE=1
export TAU_THROTTLE_NUMCALLS=
export TAU_THROTTLE_PERCALL=3000
Exploring data graphically:
Download the files to your PC.
Run ParaProf from here:

66 Demo of TAU

68 Summary: Material for the Test
- Performance & CPI: slide 8
