The Use of Process Clustering in Distributed-System Event Displays

David J. Taylor

The IBM contact for this paper is Patrick Finnigan, Client Server Enabling Tools, Application Development Technology Centre, IBM Canada Ltd., Mail Stop 3P, 1150 Eglinton Avenue East, North York, Ontario M3C 1W3.

Abstract

When debugging a distributed application, a display showing the events causing interactions between processes can be very useful. If the number of processes is large, displaying all of them may be impossible or undesirable. In such cases, several processes may be collapsed into a cluster, with all interactions internal to the cluster omitted from the display. This paper describes the fundamental theoretical constraints on such clustering and the means for effectively displaying clusters. It also describes the particular implementation of clustering provided in a prototype debugger that allows a hierarchical cluster structure to be built and conveniently manipulated.

1. Introduction

Debugging a distributed or parallel application involves all of the problems encountered in debugging a sequential program, plus problems specific to the distributed/parallel environment. A useful facility in debugging such applications is a display showing the interactions between processes. A previous paper [9] described the value of such displays in detail and their implementation in a prototype debugger for the Hermes language [2]. That paper also provided a brief description of process clustering as a means for removing currently unwanted detail from a display and the techniques adopted for effective display of clusters. Clustering as a means for eliminating unwanted detail, while preserving the underlying partial-order relationship between events, does not appear to be used in the distributed-debugging work being performed by other groups. Thus, many issues related to it remain to be explored.
This paper reviews some of the material from that previous paper, but concentrates on two other aspects of clustering: the theoretical constraints underlying the display of process clusters and the provision of a user interface that makes it easy for a user to work with a large and complex collection of processes and clusters. The remainder of the paper is organized as follows. Section 2 describes the theoretical foundations for clustering, including two interpretations of what should be considered a legitimate display of events from a cluster. Section 3 describes an interface that has been developed to make the manipulation of clusters simple and intuitive for a user. This section includes a description of the notion of a debugging focus within a hierarchical structure of clusters. Finally, Section 4 presents some conclusions and suggestions for further work, including extension of the process-clustering concept to events. The prototype software which is described in Section 3 was originally developed for the Hermes environment. Although it has now been retargeted to several other environments (Concert/C [3], OSF/DCE [6], SR [1], and the µsystem [4]), the description in the paper is oriented to the Hermes environment. That environment was used for initial development of the hierarchical-clustering facilities because the large number of processes in a Hermes application presents significant opportunities for clustering. In spite of this orientation, very little in the paper or the prototype is specifically dependent on Hermes.

2. Clustering theory

Before describing the theoretical aspects of process clustering, a brief description of event displays in general is required. An event display consists of a number of vertical trace lines, each representing the activity of a process (or thread) or the activity within a cluster. (For target environments other than Hermes, a trace line could also represent something else. For example, in the µsystem, monitors are represented by trace lines, so that a monitor entry by a thread is shown as an interaction between two trace lines.) Events are represented by symbols on the trace lines. In the prototype, a variety of symbols is used to help distinguish types of events (for example, call versus return), but for the artificial examples in this paper, all events are simply represented as open circles. If two processes interact, then a line is drawn connecting the appropriate pair of events. If the interaction is synchronous, the connecting line is horizontal. If the interaction is asynchronous, the connecting line slopes down from the initiating event to the terminating event.

One of the most important things that such an event display provides for the user is an indication of the partial-order relationship between events. Given the ordering within processes and the interactions between processes, the precedence relationship between any pair of events can be determined. This partial-order relationship is also fundamental to the construction of event displays. A basic constraint is that an event must always be displayed at a higher position than any successor event. When clustering is performed, it is therefore critical that the partial-order relationship not be distorted.

The essential idea of clustering processes in a display is simple. A set of processes is identified for which activities internal to the set are currently irrelevant.
That set of processes is replaced in the display by a cluster, which ideally has (almost) the same visual appearance as an individual process, and the only events inside the cluster that continue to be displayed are those that interact with processes outside the cluster. Figure 1 provides a trivial example of reducing a display containing four processes to a display containing one process and one cluster. The unclustered display on the left of the figure shows P1 calling P2, which in turn calls P3, and P3 calls P4 twice before returning to P2. The clustered display on the right of the figure shows P1 calling some process in C1 and then some process in C1 returning to P1. If P2 represents a server process intended for use by user processes and P3 and P4 represent subordinate processes not intended for use by user processes, then the cluster C1 also represents the view an ordinary user is likely to want, since it shows a server being invoked and then returning, without showing the internal activity of that server.

[Figure 1. A simple example of clustering. Left: unclustered display (P1, P2, P3, P4). Right: clustered display (P1, C1).]

In this simple example, it is clear that the clustered representation is a legitimate summary of the underlying set of events. In more complex situations, however, it is not obvious whether the clustered display should be considered legitimate. At least two different criteria have been proposed, both attempting to make concrete the vague constraint that the clustered display should not mislead the user. In some sense, the difference between the two criteria arises from a difference in viewpoint. The first criterion is based on the notion that the two events displayed on C1 are not two of the events on P2 in the right half of the diagram, but are rather new events that are effectively on the dashed line drawn between P1 and P2. The second criterion is based on the notion that the two events displayed on C1 are simply two of the events on P2.
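
As a rough illustration of the reduction in Figure 1, the following Python sketch keeps only the interactions that cross a cluster boundary. The tuple layout, the function name cluster_view, and the explicit return from P2 to P1 are assumptions made for this example; they are not taken from the prototype.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (source process, destination process, kind).
Interaction = Tuple[str, str, str]


def cluster_view(interactions: List[Interaction],
                 cluster_of: Dict[str, str]) -> List[Interaction]:
    """Return the interactions that remain visible after clustering.

    A process listed in cluster_of is drawn on its cluster's trace line.
    An interaction whose two ends fall on the same trace line is internal
    to a cluster and is therefore omitted.
    """
    visible = []
    for src, dst, kind in interactions:
        src_line = cluster_of.get(src, src)
        dst_line = cluster_of.get(dst, dst)
        if src_line != dst_line:
            visible.append((src_line, dst_line, kind))
    return visible


# The unclustered half of Figure 1, as described in the text.
figure1 = [
    ("P1", "P2", "call"),
    ("P2", "P3", "call"),
    ("P3", "P4", "call"), ("P4", "P3", "return"),
    ("P3", "P4", "call"), ("P4", "P3", "return"),
    ("P3", "P2", "return"),
    ("P2", "P1", "return"),  # assumed: implied by the clustered display
]

visible = cluster_view(figure1, {"P2": "C1", "P3": "C1", "P4": "C1"})
print(visible)
```

Only the call from P1 into C1 and the return from C1 to P1 survive, which matches the clustered half of the figure. This filter says nothing about how the surviving events should be arranged on trace lines; that is the subject of the two criteria discussed next.
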
The first criterion was proposed by Henry Cheung in his Ph.D. thesis [5]. Underlying the criterion is the notion that each cluster has a set of interface points through which it communicates with processes and other clusters. Figure 2 shows an example of a cluster that contains four processes and has two interface points: one (I1) used by P1 and P3 and the other (I2) used by P2. In a representation such as that in Figure 1, a cluster is then represented by as many vertical lines as it has interface points. (In Figure 1, C1 is assumed to have only one interface point.) At each of these interface points, we would like to have sequential behaviour, so that a single linear representation is not misleading.

[Figure 2. A cluster with two interface points]

The precise constraint used is that for each interface point it must be possible to obtain a consistent interface cut. As mentioned above, in this case it is assumed that the events displayed for the cluster are new events on the cut line rather than selected events from the processes being hidden inside the cluster. An interface cut is a total order on those events and is consistent if it does not introduce any new precedence relationships between existing events. Figure 3 (which is adapted from [5]) shows a simple example of consistent and inconsistent interface cuts for an interface point. In the figure, processes P2 and P3 are assumed to form a cluster, communicating with P1 through a single interface point. The interface cuts C1 and C2 are possible representations of the behaviour of the cluster.

[Figure 3. Consistent and inconsistent interface cuts. Left: cut C1 (consistent), with interface events x1 and y1. Right: cut C2 (inconsistent), with interface events x2 and y2.]

For such a simple example, in particular because there are only two interface events, it is easy to verify the consistency or inconsistency informally. In the first case, the precedence relation between x1 and y1 can only be used in one way: b precedes x1 precedes y1 precedes d. However, b precedes d was already implied, going through event c, so no new precedences are created.
In the second case, the precedence relation between x2 and y2 allows us to deduce that a precedes c and this precedence relationship did not exist prior to the introduction of the interface cut. Thus, C1 is a consistent interface cut and C2 is not.

Henry Cheung's work provides mechanisms for testing whether an interface cut is consistent. It provides mechanisms for generating a consistent interface cut only in certain special circumstances, such as the case in which, on one side of the interface, all communication across the interface involves only a single process. It appears that, in the general case, generating a consistent interface cut is not feasible because it could require time exponential in the number of interface events. Thus, there is a problem with the first criterion in trying to construct an appropriate interface trace from a given set of data.

There may also be a more fundamental problem. The basic constraint is that the display should not mislead the user. Of course, this is too vaguely stated to be used as a formal constraint, but one can ask what a typical user is likely to deduce from a display containing P1 and C1. A user is likely to view x1 and y1 as being, essentially, events b and a, rather than as new events only loosely related to b and a. If the user makes this identification, then the user will also deduce that b precedes a, which is false.

For the two reasons just mentioned, an alternative criterion was sought when clustering was added to the prototype debugger. This criterion is based on the notion that the displayed events are, effectively, events from the various processes, some of which are displayed as part of their own process and others of which are displayed as part of a cluster into which their process has been collapsed. In this case, the constraint is simply that for any pair of adjacent events displayed for a cluster, the upper event must be a predecessor of the lower event, in the original partial order. For this criterion, neither C1 nor C2, in Figure 3, is acceptable. Because a and b are concurrent, they cannot be displayed on a single trace line.

This latter criterion was used in the prototype software, both because its implementation is straightforward and because it appears less likely to mislead the user. The possibility that displays created according to the first criterion will be misleading is probably increased by the debugger feature that allows a user to obtain more information about a displayed event. For obvious reasons, this information includes the identity of the process in which the event occurred and thus is likely to cause the user to identify the displayed event with the original event rather than imagining it to be a new interface event.

There is still the problem of what action to take if concurrency exists among the events to be displayed for a cluster. As described previously [9], the approach adopted is to dynamically create enough trace lines to allow all mutually concurrent events of a cluster to be displayed on separate trace lines. Sections of these trace lines are then joined to each other by linking arrows to provide a visual indication of the precedence relationship between the events not displayed on the same line. The fundamental difference between this display and a display based on interface points is that there is no permanent association between trace lines and pairs of communicating processes. In a display based on interface points, events representing communication between Pi and Pj will always be shown on the trace line representing the relevant interface point.
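
The two criteria can be contrasted on the Figure 3 example with a short Python sketch. Several things here are assumptions rather than facts from the paper: the event layout is one reading of the figure that agrees with the text (a and b inside the cluster, c before d on P1, a message from b to c and a message from a to d), each interface event is modelled as lying on one such message, and the greedy placement of events on trace lines is a simple stand-in for the prototype's heuristic, which is not described in detail.

```python
from itertools import combinations

# Assumed reading of Figure 3: direct precedence edges between real events.
SUCC = {"b": ["c"], "a": ["d"], "c": ["d"]}


def precedes(succ, first, second):
    """True if `second` is reachable from `first` along precedence edges."""
    seen, stack = set(), [first]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt == second:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_consistent_cut(succ, cut):
    """First criterion (interface cuts).

    `cut` lists one (sender, receiver) pair per interface event, from top
    to bottom. Ordering interface event i above interface event j implies
    sender_i precedes receiver_j, so the cut is consistent only if every
    such relationship already holds in the original partial order.
    """
    return all(precedes(succ, send_i, recv_j)
               for (send_i, _), (_, recv_j) in combinations(cut, 2))


def assign_trace_lines(succ, events):
    """Second criterion (adjacent events on a line must be ordered).

    `events` must be listed in an order compatible with the partial order.
    Each event goes on the first line whose last event precedes it; if
    there is none, a new trace line is created. This greedy rule is not
    guaranteed to use the fewest possible lines.
    """
    lines = []
    for ev in events:
        for line in lines:
            if precedes(succ, line[-1], ev):
                line.append(ev)
                break
        else:
            lines.append([ev])
    return lines


cut_c1 = [("b", "c"), ("a", "d")]  # x1 above y1
cut_c2 = [("a", "d"), ("b", "c")]  # y2 above x2
print(is_consistent_cut(SUCC, cut_c1))
print(is_consistent_cut(SUCC, cut_c2))
print(assign_trace_lines(SUCC, ["a", "b"]))
```

Under these assumptions the first cut is accepted and the second is rejected, because the second would imply that a precedes c. The second criterion accepts neither single-line display: a and b are concurrent, so they are placed on two separate trace lines.
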
The approach used in the prototype may cause such events to be placed on two or more trace lines, although a heuristic attempts to avoid arbitrary wandering from one trace line to another.

The approach adopted allows any set of processes to be clustered, although in the worst case the displayed cluster may have as many trace lines as there are processes in the cluster. An approach based on interface points and consistent interface cuts also allows any set of processes to be clustered, but the set of interface points must be chosen correctly. If a user selected an inappropriate set of interface points, no display could be drawn. An advantage of the approach not using interface points is that any user specification of clustering is acceptable. The user cannot create a situation in which no legitimate display can be drawn.

3. Establishing and manipulating clusters

When clustering was added to the debugger prototype, the initial objective was to provide mechanisms for efficiently generating displays containing clusters. In particular, the issues addressed were primarily those that are direct consequences of the theoretical discussion in the preceding section. To obtain a useful tool, it is also necessary to have convenient means for specifying what processes should be placed in a given cluster. It is also desirable to have a hierarchical structure, in which a cluster may contain other clusters as well as processes. Before describing the facilities eventually adopted, a brief description will be given of two earlier interfaces and their perceived inadequacies.

The first interface developed did not allow hierarchical clustering. The operations provided allowed the creation of a cluster containing a specified set of processes, the addition of processes to an existing cluster, the merging of two existing clusters into a single cluster, and the destruction of a cluster.
Combined with facilities for writing the current cluster specification into a file and reading a cluster specification from a file, these provided an interface that was usable but quite inconvenient. In particular, the lack of hierarchical clustering, the inability to determine explicitly the set of processes in a cluster, and the lack of a facility for removing processes from a cluster were all significant difficulties.

The second interface developed was part of a project to build clusters automatically, based both on the events occurring at execution and a static analysis of Hermes source code [7]. Besides the automatic creation of a set of clusters, the key added feature was hierarchical clustering. In addition to operations similar to those described above, one could alter the display by moving up and down the hierarchy. A cluster could be opened, causing the cluster to be replaced on the display by its component processes and clusters, and a cluster could be closed, reversing the effect of open. Even ignoring the automatic creation of clusters, this facility was considerably more powerful, but it suffered from the serious problem that a complex hierarchical structure existed that could only be explored incrementally, by opening and closing clusters.

Given this experience, it became clear that an interface based on a graphical display of the cluster structure was needed to provide a convenient means for the user to understand and manipulate process clusters. The cluster structure cannot reasonably be displayed as part of the normal event display, so an additional window is used. In this second window, the cluster structure is shown using the obvious tree representation and the user can manipulate the cluster structure by manipulating the tree.

More specifically, the clustering window displays a tree whose leaves are processes and whose internal nodes are clusters. The processes and clusters directly contained in a cluster are shown as the children of that cluster. The root is a cluster containing all the processes. Each node is represented by a rectangle containing the name of the process or cluster. Initially, the tree simply consists of a root with all processes being children of the root. A complicated structure can be created in a single step by using the automatic-clustering facility or by reading in a previously saved structure. Such structures can then be modified if desired or a cluster structure can be built directly by performing manipulations on the initial, trivial structure. All these manipulations are carried out by selecting nodes of the tree, rather than processes in the event-display window. However, if the appropriate option is enabled, changes in the tree structure cause the event display to be redrawn to reflect the changed cluster structure.
If the debugger is being used with a running application, rather than post-mortem, and a new process is created, the new process will appear simultaneously in the main display and the cluster-hierarchy display. Processes created after a cluster hierarchy is built are inserted as children of the root.

The operations to modify the cluster hierarchy were intentionally kept simple. There are essentially only three operations: create a cluster, delete a cluster, and move processes/clusters into a cluster. To create a cluster, a name must be supplied and an existing cluster must be identified as the parent of the new cluster. To delete a cluster, it is only necessary to identify the cluster. To move processes or clusters (or both) into a cluster, it is necessary to identify the cluster and then the entities to be moved into it. What the user might imagine as creating a cluster thus involves two steps. First, an empty cluster is created, at the appropriate place in the hierarchy. Then, the desired items are moved into the cluster in a second step. This may require slightly more work, but it is simple and the procedure for adding items to a cluster is exactly the same as the procedure for placing the initial items in a cluster. The only restriction on moving nodes in the hierarchy is that a node may not be moved to become a child of itself or one of its own descendants, since that would disconnect the tree. Clusters may be deleted when they are nonempty. Any children of a deleted cluster become children of that cluster's parent.

Other than operations that replace the complete cluster hierarchy, there is only one additional operation for modifying the hierarchy. As a result of other changes, useless clusters may occur in the tree. A cluster is useless if it has no children or exactly one child. Such clusters can be deleted individually, but, for convenience, an operation is provided that deletes all such clusters.
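
The hierarchy operations described above can be sketched in Python as follows. The class name ClusterTree, its method names, and the use of the string "root" for the root cluster are hypothetical choices for this illustration; the sketch models only the tree manipulations, not the prototype's windows or redrawing.

```python
class ClusterTree:
    """Tree whose leaves are processes and whose internal nodes are clusters."""

    ROOT = "root"

    def __init__(self, processes):
        self.parent = {self.ROOT: None}
        self.children = {self.ROOT: []}  # only clusters have an entry here
        for proc in processes:
            self._attach(proc, self.ROOT)

    def _attach(self, node, cluster):
        self.parent[node] = cluster
        self.children[cluster].append(node)

    def _detach(self, node):
        self.children[self.parent[node]].remove(node)

    def create_cluster(self, name, parent):
        """Create an empty cluster as a child of an existing cluster."""
        if name in self.parent or parent not in self.children:
            raise ValueError("duplicate name or parent is not a cluster")
        self.children[name] = []
        self._attach(name, parent)

    def move(self, nodes, cluster):
        """Move processes and/or clusters into an existing cluster."""
        if cluster not in self.children:
            raise ValueError("target is not a cluster")
        for node in nodes:
            ancestor = cluster
            while ancestor is not None:  # walk from the target up to the root
                if ancestor == node:
                    raise ValueError("cannot move a node below itself")
                ancestor = self.parent[ancestor]
        for node in nodes:
            self._detach(node)
            self._attach(node, cluster)

    def delete_cluster(self, cluster):
        """Delete a cluster; its children become children of its parent."""
        if cluster == self.ROOT or cluster not in self.children:
            raise ValueError("not a deletable cluster")
        for child in self.children[cluster]:
            self._attach(child, self.parent[cluster])
        self._detach(cluster)
        del self.children[cluster]
        del self.parent[cluster]

    def delete_useless(self):
        """Delete every cluster with no children or exactly one child."""
        changed = True
        while changed:  # a deletion can make the parent useless in turn
            changed = False
            for cluster in list(self.children):
                if cluster != self.ROOT and len(self.children[cluster]) <= 1:
                    self.delete_cluster(cluster)
                    changed = True


tree = ClusterTree(["P1", "P2", "P3", "P4"])
tree.create_cluster("C1", "root")          # step 1: create an empty cluster
tree.move(["P2", "P3", "P4"], "C1")        # step 2: move items into it
tree.create_cluster("Empty", "root")
tree.delete_useless()                      # removes "Empty", keeps "C1"
print(tree.children)
```

The two-step creation of C1 mirrors the description in the text: adding the initial members uses exactly the same move operation as adding members later.
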
Given a hierarchical cluster structure, it is necessary to determine a set of clusters and processes from the hierarchy that are to be used in the current event display. The original interface simply allowed all processes to be displayed or all clusters plus all processes not in clusters. Even for a single-level cluster structure, this is not sufficiently flexible. It is clearly inappropriate for a hierarchical structure.

The concept of a debugging focus captures the idea needed here. A debugging focus is the set of processes and clusters the user is currently interested in and, hence, the set that should be displayed. Formally, a debugging focus is a cut across the cluster hierarchy, that is, a set of processes and clusters such that for each leaf in the tree, there is exactly one element of the set on the path from the root to that leaf. Although a singleton set containing only the root is formally a legitimate debugging focus, it is excluded from practical consideration since the resulting event display would consist of a single trace line and no events. Figure 4 shows an example of a simple cluster hierarchy with a debugging focus. The nodes drawn with a double circle (3, 6, 7, 8, 9, 10, 11) are in the current debugging focus.

[Figure 4. Example of cluster hierarchy and debugging focus (nodes 1 to 13)]

As with the manipulations of the hierarchy, simplicity was a major consideration in designing a method for the user to select a debugging focus. The interface simply allows the user to specify that a selected node should be placed in the debugging focus. Then, a minimal set of changes is made to create a debugging focus containing the desired node. If descendants of the selected node are currently in the debugging focus, then all descendants are removed from the focus as the selected node is added. If an ancestor of the selected node is currently in the debugging focus, the actions are more complex to describe, although the intuition is simple. The algorithm is best described as an iteration that eventually brings the selected node into the focus. First, the ancestor currently in focus is removed from the focus and all of its children are placed in the focus. If one of those children is the desired node, the algorithm terminates. Otherwise, the steps are repeated, beginning with the modified focus, until eventually the desired node is placed in the focus. The required number of iterations is simply the length of the path between the ancestor currently in focus and the selected node. Since the root node is not allowed to be in the focus, if it is selected all its children are placed in the focus.
This is equivalent to selecting the root node and placing it in the focus, then immediately selecting one of its children to be placed in the focus.

If the current cluster hierarchy and debugging focus are as shown in Figure 4 and the user selects node 2 to be placed in the debugging focus, the result will be as shown in Figure 5. Node 2 has been placed in the debugging focus, requiring the removal of nodes 6, 7, 10, and 11. If the user then selects node 12 to be placed in the debugging focus, the result will be as shown in Figure 6. Node 2 has been removed from the debugging focus and nodes 5, 6, and 13 have been added, as well as the requested node 12. As explained above, this was accomplished in two steps. First, node 2 was removed from the focus and replaced by nodes 5, 6, and 7. Then, node 7 was removed from the focus and replaced by nodes 12 and 13.

[Figure 5. Debugging focus after selecting node 2]

These focus-changing facilities are intended to be more convenient than operations like "open a cluster" and "close a cluster". In particular, the transition from Figure 5 to Figure 6 would require two "open a cluster" operations rather than a single selection. In deeper trees, the number of "open a cluster" operations equivalent to a single focus-node selection could be much larger. Preliminary experience within our research group indicates that the facilities are easy to use, but experience with a broader community of users is needed to determine whether the interface is suitable for general use.
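
The focus-selection rule can be sketched in Python and checked against the transitions from Figure 4 to Figure 5 to Figure 6. The tree shape used here is an assumption inferred from the node lists in the text (1 has children 2, 3, 4; 2 has 5, 6, 7; 4 has 8, 9; 5 has 10, 11; 7 has 12, 13), since the figures themselves are not reproduced. The function names are hypothetical, and the sketch assumes the current focus is already a valid cut.

```python
CHILDREN = {1: [2, 3, 4], 2: [5, 6, 7], 4: [8, 9], 5: [10, 11], 7: [12, 13]}
PARENT = {child: par for par, kids in CHILDREN.items() for child in kids}
ROOT = 1
LEAVES = [n for n in PARENT if n not in CHILDREN]


def descendants(node):
    """All nodes strictly below `node` in the hierarchy."""
    found, stack = set(), list(CHILDREN.get(node, []))
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(CHILDREN.get(current, []))
    return found


def is_focus(nodes):
    """True if every root-to-leaf path contains exactly one element of `nodes`."""
    for leaf in LEAVES:
        count, current = 0, leaf
        while current is not None:
            count += current in nodes
            current = PARENT.get(current)
        if count != 1:
            return False
    return True


def select(focus, node):
    """Return the new focus after the user selects `node`."""
    focus = set(focus)
    if node == ROOT:                 # the root itself may not be in the focus
        return set(CHILDREN[ROOT])
    if node in focus:
        return focus
    below = descendants(node) & focus
    if below:                        # collapse: descendants leave, node enters
        return (focus - below) | {node}
    path = [node]                    # otherwise some ancestor is in the focus
    while path[-1] not in focus:
        path.append(PARENT[path[-1]])
    for ancestor in path[:0:-1]:     # from the focused ancestor down to node's parent
        focus.remove(ancestor)
        focus.update(CHILDREN[ancestor])
    return focus


fig4 = {3, 6, 7, 8, 9, 10, 11}
fig5 = select(fig4, 2)
fig6 = select(fig5, 12)
print(sorted(fig5), sorted(fig6))
```

Selecting node 12 from the Figure 5 focus runs the loop twice, once for node 2 and once for node 7, which corresponds to the two steps described in the text.
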

[Figure 6. Debugging focus after selecting node 12]

One danger with the facility as implemented is that selecting a node near the root of the tree can wipe out an intricate set of focus selections. If the selection was accidental, the user might need to perform significant work in order to restore the previous focus. In Figure 4, selecting the root node would wipe out all existing focus information, replacing the focus with nodes 2, 3, and 4. The tree is so small that it is not very hard to restore the previous focus (just select nodes 8 and 10, for example), but selecting the root of a much larger tree might have serious consequences. If this becomes a problem in practice, presumably an undo facility would be the most appropriate solution, rather than adding restrictions on the way the user can change the focus.

4. Conclusions and further work

The clustering facilities described in this paper have been implemented in our debugger prototype. Although the facilities have not yet been used extensively, they appear to be useful and convenient. In particular, they are clearly an improvement over the two previous interfaces provided for clustering.

Additional improvements are clearly needed. If many processes exist, it is difficult for the user to locate the tree node in one window corresponding to a trace line in the other window, and vice versa. This problem is exacerbated by the Hermes phenomenon that many processes exist with identical names. At present, the only certain way to establish the correspondence for such processes is to try hiding one in a cluster, observing which trace line disappears from the event display. A possible solution is to allow selections for cluster-hierarchy modification and focus changing to be made in the trace window as well as the cluster window, but careful design will be needed to avoid a complicated and confusing interface.
A more minor problem is that the user presently has complete control over the order of trace lines in the event display, but no direct control over the order of nodes in the cluster tree. A simple facility to rearrange the presentation of the tree should be provided. Unfortunately, the obvious possibility of making the tree follow the order of traces in the event display is not feasible. All the children of a cluster must be adjacent in the tree, but no such constraint exists in the event display and it would be unreasonable to add such a constraint.

At present, there is also a software-engineering problem with the clustering implementation. The debugger is split into three processes: the debug-session process, the checkpoint process, and the disk-server process. As discussed previously [9], the debug-session and checkpoint processes are intended to be independent of the target environment, with all information specific to the target system embedded in the disk-server process. Some Hermes-specific information about clustering currently resides in the debug-session process and needs to be moved to the disk-server process to complete the implementation to our normal standards for the prototype.

In spite of the above difficulties, the prototype is currently quite useful. Its only significant use has been in the Hermes environment, but once the last difficulty described above has been rectified, we intend to experiment with it in the other available target environments. We expect that it will be useful in those other environments, but it is likely that experience will indicate the need for additional features or modification of existing features.

A longer-range activity is the extension of clustering from processes to events. In the same way that several processes can be grouped to form a cluster, several events could be grouped to form an abstract event.
A fundamental difficulty is that whereas any set of processes can be clustered without causing problems, although poor choices may lead to displays that aren't very useful, an arbitrary set of events cannot be allowed as an abstract event. Thus, there are initial problems in determining appropriate restrictions on the sets of events that can be allowed for abstraction. Many other problems also exist, such as efficiently determining the precedence relation between an abstract event and a simple event or between two abstract events. In addition, a meaningful display of abstract events is much harder to design than a display involving clustered processes. Some preliminary theoretical work on event abstraction has been completed [5, 8], but more work is clearly required before event abstraction can be added to the existing prototype software.

Acknowledgments

The initial implementation of hierarchical clustering was performed by Thomas Kunz, a Ph.D. student visiting the Shoshin project. The graphical cluster interface was implemented and, to a significant extent, designed by Paulo Ferreira during two co-op work terms with the Shoshin project. The work described in this paper was supported by the Natural Sciences and Engineering Research Council of Canada under grant OGP0003078 and a CRD grant, and by the Information Technology Research Centre.

About the author

David Taylor is an Associate Professor of Computer Science at the University of Waterloo, where he has been a faculty member since 1977. His research interests include distributed-systems software and software fault tolerance. During the 1991-1992 academic year, he spent a sabbatical at the Centre for Advanced Studies, IBM Canada Ltd. Laboratory. He can be reached at the address Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1. His e-mail address is dtaylor@boomer.uwaterloo.ca.

References

1. G. R. Andrews, et al., An overview of the SR language and implementation, ACM Transactions on Programming Languages and Systems, 10(1) pp. 51-86 (January 1988).
2. R. E. Strom, et al., Hermes: A Language for Distributed Computing, Prentice-Hall, Englewood Cliffs, New Jersey (1991).
3. S. Yemini, et al., CONCERT: A high-level language approach to heterogeneous distributed systems, Proceedings of the 9th International Conference on Distributed Computing Systems, pp. 162-171 (June 5-9, 1989).
4. P. A. Buhr and R. A. Stroobosscher, The µsystem: Providing light-weight concurrency on shared-memory multiprocessor computers running UNIX, Software Practice and Experience, 20(9) pp. 929-963 (September 1990).
5. H. W. H. Cheung, Process and Event Abstraction for Debugging Distributed Programs, Ph.D. Thesis, University of Waterloo, Ontario, Canada (1989). Also available as CCNG Technical Report T-189.
6. Open Software Foundation, Introduction to OSF/DCE, Prentice-Hall, Englewood Cliffs, New Jersey (1993).
7. T. Kunz and D. J. Taylor, Distributed debugging using a reverse-engineering tool, Proceedings of the 3rd Reverse Engineering Forum, (September 15-17, 1992).
8. J. Summers, Precedence-Preserving Abstraction for Distributed Debugging, M.Math. Thesis, University of Waterloo, Ontario, Canada (1992).
9. D. J. Taylor, A prototype debugger for Hermes, Proceedings of the 1992 CAS Conference, Volume I, pp. 29-42 (November 9-12, 1992).