Projections: A Performance Tool for Charm++ Applications


1 Projections: A Performance Tool for Charm++ Applications. Chee Wai Lee (cheelee@uiuc.edu), Parallel Programming Laboratory, Dept. of Computer Science, University of Illinois at Urbana-Champaign

2 Outline. General Introduction to Projections; Projections Basics; Advanced Features; Features to aid Effective Analysis; Extremely Large Datasets; Tips and Notes

3 Projections Projections is a performance tool designed for use with Charm++/AMPI. Trace-based, post-mortem analysis. Supports highly detailed traces, summary formats and a flexible user-level API. Java-based visualization tool for presenting performance information. 10/19/05 Projections Tutorial 3

4 Charm++ Model Object Oriented. Chares (objects) encapsulate data, standard C++ methods and entry methods. Message Driven. Entry methods represent work units activated by an incoming message. Only one entry method may execute at any time in a Chare. A runtime schedules incoming messages.

5 What you will need. A version of Charm++ built without the CMK_OPTIMIZE flag: if user-built, download our release or check out a copy from anonymous CVS; if pre-built, check with your machine's sysadmin. Java Runtime or higher. Projections visualization binary (projections.jar): with a user-built Charm++, it is located in charm/tools/projections/bin; with a pre-built Charm++, check with your sysadmin or acquire the binary separately.

6 Outline: Basics. General Introduction to Projections; Projections Basics (Instrumentation, Trace Generation, Visualization); Advanced Features; Features to aid Effective Analysis; Extremely Large Datasets

7 Trace Generation: Basics. Automatic trace instrumentation - no user code is required by default. Any Charm++ version built without the CMK_OPTIMIZE flag supports tracing. All Charm++ entry methods and messaging events are traced.

8 Trace Generation: Steps. Link your application with the link-time options: -tracemode projections, -tracemode summary. Run your application normally. At the end of the run, you will see .log, .sum and .sts files generated in the same directory as your application binary.

9 Visualization: Basic Steps. Run the script found in charm/tools/projections/bin as follows: projections [<application>.sts]. Or launch the Java binary projections.jar from the command line, passing in the optional <application>.sts filename as an argument.

10 Visualization: Main Window

11 Visualization: Overview

12 Visualization: Usage Profile

13 Visualization: Time Profile

14 Visualization: Timeline

15 Visualization: Task Histogram

16 Visualization: Communication

17 Visualization: Tabulated Call Info

18 Projections Basic demo.

19 Outline: Advanced Features. General Introduction to Projections; Projections Basics; Advanced Features (Partial Tracing, Tracing User Events, Tracing AMPI Functions); Features to aid Effective Analysis; Extremely Large Datasets; Tips and Notes

20 Partial Trace Generation. The following API calls are provided by the tracing framework: void traceBegin() and void traceEnd(). These calls turn tracing on/off for the processor on which the call was made. int traceIsOn() queries the tracing framework status. The +traceoff runtime option causes tracing (over all processors) to be turned off when the application is started.

21 Partial Tracing Watch-its. traceBegin() and traceEnd() calls apply only to the processor on which they are invoked. This offers flexibility but is vulnerable to programmer error. Typically used in a collective manner (e.g., NAMD does this at specific load balancing operations). Partial trace calls are invoked in the context of an entry method. One should be prepared to drop initial performance data just after turning tracing on and just before turning tracing off. Appropriate use of the +traceoff runtime option is essential.

22 Partial Tracing Example
  // In the case when trace is off at the beginning,
  // only turn trace on from after the first LB to the firstLdbStep after
  // the second LB.
  //
  // off on Alg7 refine refine... on
  #if CHARM_VERSION >=
  if (traceAvailable()) {
    static int specialTracing = 0;
    if (ldbCycleNum == 1 && traceIsOn() == 0) specialTracing = 1;
    if (specialTracing) {
      if (ldbCycleNum == 4) traceBegin();
      if (ldbCycleNum == 6) traceEnd();
    }
  }
  #endif
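The decision logic in the example above can be isolated from the runtime so that it is easy to reason about and test. The following is a minimal sketch under that assumption: PartialTracePolicy, TraceAction and onLoadBalanceCycle are hypothetical names introduced here, not part of Charm++ or NAMD, and the cycle numbers 4 and 6 are simply copied from the example.

```cpp
#include <cassert>

// Hypothetical helper, not a Charm++ or NAMD type: it only decides what the
// caller should do; the caller would then invoke the real tracing calls.
enum class TraceAction { None, Begin, End };

struct PartialTracePolicy {
  bool specialTracing = false;  // armed only if tracing was off at cycle 1
  int beginCycle = 4;           // cycle at which to start tracing
  int endCycle = 6;             // cycle at which to stop tracing

  // Call once per load-balancing cycle on each processor.
  TraceAction onLoadBalanceCycle(int ldbCycleNum, bool tracingOn) {
    assert(beginCycle < endCycle);
    // Arm the policy only when the run started with tracing turned off.
    if (ldbCycleNum == 1 && !tracingOn) specialTracing = true;
    if (!specialTracing) return TraceAction::None;
    if (ldbCycleNum == beginCycle) return TraceAction::Begin;
    if (ldbCycleNum == endCycle) return TraceAction::End;
    return TraceAction::None;
  }
};
```

In this sketch a run that starts with tracing already on never arms the policy, which is one way to see why the +traceoff runtime option matters for partial tracing.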

23 Tracing User Events. The following APIs are provided for user event registration and tracing: int traceRegisterUserEvent(char *eventdesc, int EventNum=-1): acquire or specify an event ID to be associated with an event name. void traceUserEvent(int eventnum): use a valid event ID to record an event. void traceUserBracketEvent(int eventnum, double starttime, double endtime): use a valid event ID to record an event interval.
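The register-then-record pattern described above can be tried out without a Charm++ build. The sketch below uses a stand-in MockTracer whose method names mirror the three calls on the slide; everything inside it is an assumption made for illustration (the fake clock, the first auto-assigned ID of 100, and the choice to count and drop events with unregistered IDs), not a description of how the real runtime behaves.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for the tracing runtime so the sketch compiles without Charm++.
// Method names mirror the calls on the slide; the bodies are assumptions.
struct MockTracer {
  struct Record { int eventNum; double start; double end; };

  std::map<int, std::string> registered;  // event ID -> description
  std::vector<Record> records;            // accepted events, in call order
  int dropped = 0;                        // events that used unregistered IDs
  double now = 0.0;                       // fake clock for point events
  int nextAutoId = 100;                   // arbitrary first auto-assigned ID

  // eventNum == -1 asks the tracer to pick an ID; otherwise use the given one.
  int traceRegisterUserEvent(const char* eventDesc, int eventNum = -1) {
    if (eventNum == -1) {
      while (registered.count(nextAutoId) != 0) ++nextAutoId;
      eventNum = nextAutoId;
    }
    registered[eventNum] = eventDesc;
    return eventNum;
  }

  // Record a point event at the current (fake) time.
  void traceUserEvent(int eventNum) {
    traceUserBracketEvent(eventNum, now, now);
  }

  // Record an interval event; unregistered IDs are counted and ignored.
  void traceUserBracketEvent(int eventNum, double startTime, double endTime) {
    assert(endTime >= startTime);
    if (registered.count(eventNum) == 0) { ++dropped; return; }
    records.push_back({eventNum, startTime, endTime});
  }
};
```

A typical use is to register each event once during startup, keep the returned ID, and pass that ID to every later record call.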

24 AMPI Function Tracing API Works like User Events in Charm++ Applications. Why? MPI Function abstraction is invisible to the Charm++ runtime. REGISTER_FUNCTION(<namestring>) to register <namestring> as a string-id to be traced. TRACEFUNC(<funcall>,<namestring>) to record a call to function <funcall> to be associated with the event registered as <namestring>.

25 Advanced Features Demo

26 Outline: Effective Analysis. General Introduction to Projections; Projections Basics; Advanced Features; Features to aid Effective Analysis (How Tracing Works, Memory Footprint Control, Data Volume Control, Visualization Controls); Extremely Large Datasets; Tips and Notes

27 Log Event Tracing. Each Charm++ event and any registered user events are recorded into a pre-assigned memory buffer on each processor. The default buffer size is 10,000 trace log entries. When a buffer is full, a special flush event is logged and the buffer is flushed to disk. Flushing is done independently of other processors; there is no synchronization on a flush.
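The buffering behaviour described above can be modelled in a few lines. This is a toy sketch, not Charm++ code: LogBuffer and LogEntry are hypothetical names, "disk" is just a second vector, and placing the flush marker as the first entry of the fresh buffer is an assumption about ordering that the slide does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one processor's event log buffer (hypothetical, not Charm++).
struct LogEntry { int eventId; bool isFlushMarker; };

class LogBuffer {
 public:
  explicit LogBuffer(std::size_t capacity) : capacity_(capacity) {
    assert(capacity >= 2);  // room for the flush marker plus one event
    buffer_.reserve(capacity);
  }

  // Record one event; flush locally when the buffer becomes full.
  void record(int eventId) {
    buffer_.push_back({eventId, false});
    if (buffer_.size() == capacity_) flush();
  }

  // Write the buffer to "disk", then log a marker so the flush is visible.
  // No other processor is consulted: there is no synchronization.
  void flush() {
    disk_.insert(disk_.end(), buffer_.begin(), buffer_.end());
    buffer_.clear();
    ++flushes_;
    buffer_.push_back({-1, true});
  }

  std::size_t flushCount() const { return flushes_; }
  std::size_t buffered() const { return buffer_.size(); }
  const std::vector<LogEntry>& onDisk() const { return disk_; }

 private:
  std::size_t capacity_;
  std::size_t flushes_ = 0;
  std::vector<LogEntry> buffer_;
  std::vector<LogEntry> disk_;
};
```

With a small capacity the flush count grows quickly, which illustrates the trade-off between memory usage and flush frequency on a real run.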

28 Summary Tracing. Invoked by the same event set as in log tracing. User events are, however, ignored. The memory buffer is organized into k bins, each initially representing a time interval of 1 ms. Each event contributes data into the appropriate time bin. When the buffer is filled, the time represented by each bin is doubled and the data is packed into the first k/2 bins. Event contribution continues at the (k/2+1)th bin.
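The bin-doubling scheme described above can be sketched as follows. SummaryBins is a hypothetical class written for illustration, assuming an even bin count and that neighbouring bins are merged pairwise on compaction; times are in milliseconds and the accumulated quantity is left abstract.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a summary-mode buffer (hypothetical, not Charm++ code).
class SummaryBins {
 public:
  SummaryBins(std::size_t binCount, double binSizeMs)
      : bins_(binCount, 0.0), binSizeMs_(binSizeMs) {
    assert(binCount >= 2 && binCount % 2 == 0 && binSizeMs > 0.0);
  }

  // Add `amount` to the bin covering time `tMs`, compacting as needed.
  void add(double tMs, double amount) {
    assert(tMs >= 0.0);
    std::size_t idx = static_cast<std::size_t>(tMs / binSizeMs_);
    while (idx >= bins_.size()) {
      compact();
      idx = static_cast<std::size_t>(tMs / binSizeMs_);
    }
    bins_[idx] += amount;
  }

  double binSizeMs() const { return binSizeMs_; }
  int compactions() const { return compactions_; }
  const std::vector<double>& bins() const { return bins_; }

 private:
  // Merge neighbouring bins into the first half and double the bin width.
  void compact() {
    const std::size_t half = bins_.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      bins_[i] = bins_[2 * i] + bins_[2 * i + 1];
    for (std::size_t i = half; i < bins_.size(); ++i) bins_[i] = 0.0;
    binSizeMs_ *= 2.0;
    ++compactions_;
  }

  std::vector<double> bins_;
  double binSizeMs_;
  int compactions_ = 0;
};
```

With 4 bins of 1 ms, a contribution at 4.5 ms triggers one compaction: the first two bins then hold the merged data, each bin spans 2 ms, and the new contribution lands in the third bin, which is the (k/2+1)th.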

29 Memory Footprint Control. Tracing memory footprint. Event log tracemode (-tracemode projections): default 10,000 event entries, controlled by runtime flag +logsize <size>. Summary tracemode (-tracemode summary): default 10,000 time bins of 1 ms, controlled by runtime flags +bincount <#bins> and +binsize <seconds>.
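When choosing summary-mode settings, it can help to estimate how coarse the bins will end up for a given run length. The sketch below assumes only what the slides state, namely that a full buffer doubles the bin width; coverageFor and SummaryCoverage are hypothetical names, and the defaults mirror the 10,000 bins of 1 ms quoted above (expressed here in milliseconds, whereas the +binsize flag takes seconds).

```cpp
#include <cassert>

// Hypothetical planning helper, not part of Charm++ or Projections.
struct SummaryCoverage {
  int compactions;        // how many times the bin width doubled
  double finalBinSizeMs;  // bin width at the end of the run
};

// Estimate bin width after a run of `runLengthMs`, assuming each
// compaction doubles the time covered by the fixed number of bins.
SummaryCoverage coverageFor(double runLengthMs, long binCount = 10000,
                            double binSizeMs = 1.0) {
  assert(runLengthMs >= 0.0 && binCount > 0 && binSizeMs > 0.0);
  SummaryCoverage c{0, binSizeMs};
  while (binCount * c.finalBinSizeMs < runLengthMs) {
    c.finalBinSizeMs *= 2.0;
    ++c.compactions;
  }
  return c;
}
```

Under these assumptions the default settings cover 10 s before the first compaction, and a 100 s run ends with 16 ms bins after four compactions (10 s, 20 s, 40 s, 80 s, then 160 s of coverage).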

30 Memory Footprint Control (2). Trade-offs to consider. Log buffers: flush overhead vs. memory usage; frequency of flushes vs. size of flushes. Summary buffers: frequency of compaction + data granularity vs. memory usage.

31 Controlling Data Volume [Code-time] User API for partial tracing. [Link-time] Generating only summary data. [Run-time] Writing compressed output (runtime flag): +gz-trace [Post-run] Deleting subset of generated logs. [Visualization] Parameter range control, analysis Memory.

32 Visualization Parameter Control

33 Specific Visualization Tool Issues Memory usage constraints Timeline - dependent on the event-density of the selected time range of the application. Typical workable range is 10ms to 10s for between processors. Processor based tools (Overview, Communications, Usage Profile) - limit to 2000 processors or less. Interval based tools (Graph, Time Profile, Communication vs Time, Animation) - limit to 1500 time intervals. Histogram tools limit to 1000 bins or less.

34 Analysis Techniques Zoom in from wide-ranged low-detail views to detailed look at problem spots. Make use of range histories. Control data volume. Use effective colors. Make effective use of specific tool features (see manual).

35 Analysis Techniques (2) Load Imbalance: Overview, Usage Profile. Where's my work going?: Time Profile. How is my communication behaving?: Communication, Communication vs Time. Are there critical paths? I need details!: Timeline.

36 Putting it all together

37 Outline: Extremely Large Datasets. General Introduction to Projections; Projections Basics; How Tracing Works; Features to aid Effective Analysis; Extremely Large Datasets; Tips and Notes

38 Considerations for Extremely Large Datasets. Large number of files: beware the filesystem. Huge amounts of data: control!

39 Demo NAMD on 8192 processors

40 Outline: Miscellany. General Introduction to Projections; Projections Basics; How Tracing Works; Features to aid Effective Analysis; Extremely Large Datasets; Tips and Notes

41 Conveniently Placing Logs Specifying a user-defined output location (runtime option): +traceroot <desired log directory> It is important to note that <desired log directory> must be available on a machine's compute nodes.

42 Perturbation Issues Perturbation of application. Tracing overhead Timer overheads (rdtsc, machine wallclock). Acquisition of performance data and storage. Observed in the case of NAMD with timesteps below 10ms with many compute objects in the microsecond range.

43 Look Closely! Do not be fooled! Projections does not handle some visualization artifacts well: Fine grain details can sometimes look like one big solid block on timeline. It is hard to mouse-over items that represent fine-grained events. Other times, tiny slivers of activity become too small to be drawn.

44 Answering my questions (again!) Load Imbalance: Overview, Usage Profile. Where's my work going?: Time Profile. How is my communication behaving?: Communication, Communication vs Time. Are there critical paths? I need details!: Timeline.

45 Frequently Asked Questions. Q: I tried user events and the Projections visualization crashes! A: Did you register the events before using them? Q: There are giant stretched event(s) in my run! A: Did you set a large enough log buffer size? Q: The Projections visualization crashes! A: Are your logs corrupted? This can happen with bad I/O (experienced on NFS).
