Agent Roles in Snapshot Assembly

Delbert Hart
Dept. of Computer Science, Washington University in St. Louis, St. Louis, MO 63130
hart@cs.wustl.edu

Eileen Kraemer
Dept. of Computer Science, University of Georgia, Athens, GA 30602
eileen@cs.uga.edu

Abstract

The ability to understand running distributed computations depends on effective monitoring techniques. Monitoring distributed systems entails two primary tasks: collecting data from the application processes and integrating it into comprehensive global views. This paper focuses on the snapshot assembly task of taking process checkpoints and forming global snapshots. The assembly task can be performed in many ways, each having its own set of advantages and disadvantages. We look at some of the different approaches and their associated costs and benefits. Then the roles that agents can play in the assembly process are examined in the context of the PathFinder visualization system.

Keywords: distributed monitoring, consistent snapshots, agents

1 Introduction

Monitoring is an essential function in tools for understanding distributed computations. Debuggers, interactive steering systems, and visualization tools all rely on some form of monitoring to provide the information upon which to base their representation of the execution of an application. The extent to which users can rely on these representations to be accurate depends on the guarantees made by the underlying monitoring system. However, providing such guarantees is non-trivial in distributed systems.

The monitoring of a distributed computation may be viewed as the creation of a sequence of global snapshots. Each global snapshot is a set of checkpoints (local snapshots representing the state of a single process), with one checkpoint from each process in the computation. A snapshot should be consistent, representing a possible state of the computation from when the data was collected.
The lack of a global clock and uncertainty in message delivery times complicate the task of assembling global snapshots from the streams of local snapshots produced at each process. Different degrees of consistency exist, and the particular type of consistency that is sought affects both the ordering information that must be collected and the criteria to be applied in the assembly of global snapshots from checkpoints.

Although important, consistency is not the only criterion by which a monitoring system is judged. Consistency concerns must be balanced against consideration of the lag in presentation, the scalability of the system, and the perturbation induced by monitoring. Latency, or lag, refers to the elapsed time between the existence of a state in the program's execution and the presentation of that state to the viewer. The scalability of the monitoring system is a measure of how the performance of the monitoring software changes as a function of the number of processes in the computation, the amount of data, and the frequency of data collection. Perturbation refers to the degree to which the underlying computation is slowed down or otherwise affected by external forces, i.e., the monitoring software.

In this paper we examine some of the different ways checkpoints can be assembled into
snapshots. In addition, we show how agents can be used to support the assembly task, in the context of the PathFinder[1] exploratory visualization system. An obvious role that agents can take is to instantiate general assembly algorithms. Clearly, these agent-based assembly protocols will be slower than standard compiled protocols. However, the use of agents allows us to examine a variety of algorithms, "tweak" their parameters interactively, and compare the trade-offs of different approaches before committing to further development. In addition, the use of agents to implement assembly algorithms permits the user to easily switch assembly algorithms at runtime. Instead of simply following general assembly algorithms, agents can be designed to take advantage of application-specific information to make the assembly process more efficient. Agents can also be used to support non-agent assembly solutions.

The remainder of the paper is organized as follows: Section 2 describes the PathFinder monitoring system. Section 3 looks at different types of snapshot assembly. The roles agents can play are considered in Section 4. Finally, the paper is summarized in Section 5.

2 PathFinder

The purpose of the PathFinder system is to support exploratory visualization[2]. Exploratory visualization is rooted in the realization that it is not feasible to collect and present all of the data in large, long-lived distributed computations, nor is it typically desirable to do so. Rather, a user exploring the execution of a distributed computation through visualization and interaction needs only a subset of the data available. In the interest of good performance and clarity, only this subset should be collected and presented. As the user explores the computation, the particular subset of data that is "interesting" evolves; thus the user is provided with the ability to navigate through the computation, changing what is collected and how it is presented.
Figure 1: PathFinder architecture overview.

Our approach to exploratory visualization is based on viewing the interaction between the user and the computation in terms of streams of information. The user is presented with a stream of globally consistent snapshots representing the computation and can send a stream of steering commands.

The PathFinder architecture consists of Interaction Managers (IMs), a Stream Manager (SM), and a User Interface (UI), as shown in Figure 1. The IMs collect data from, and allow steering of values in, the application processes. The UI presents the collected data to the user and receives user commands to change how the data is collected, the way in which the data is presented, or the computation itself (by steering its variables). The SM serves as an intermediary between the UI and the IMs, ensuring that the information passed from one side of the system to the other is properly correlated, e.g., collating data into snapshots and distributing steering commands to the appropriate IMs.

PathFinder's architecture uses an attribute-event model of the computation. Processes possess attributes that are available for monitoring and/or steering. The Interaction Manager serves as a framework for accessing an application process's attributes and learning of its events. It is implemented as a library of routines installed at the process and provides an interface between the application and the monitoring system. Each IM contains the database of locally available attributes. Events from the application are received by an IM when specified conditions exist, e.g., the execution passes through a particular point in the code. In the current implementation, software annotations indicate the occurrence of events and the availability of process variables for monitoring, steering, or both.
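To make the attribute-event model concrete, the following is a minimal sketch of the annotation style described above. The names here (InteractionManager, publish_attribute, signal_event) are illustrative stand-ins, not PathFinder's actual library API.

```python
# Hypothetical sketch of an IM-style annotation library; names are
# invented for illustration and do not reflect PathFinder's real API.

class InteractionManager:
    """Minimal stand-in for the per-process IM library."""

    def __init__(self):
        self.attributes = {}   # locally available attributes
        self.events = []       # events observed so far

    def publish_attribute(self, name, getter, steerable=False):
        # Annotation: register a process variable for monitoring
        # (and optionally steering).
        self.attributes[name] = {"get": getter, "steerable": steerable}

    def signal_event(self, kind, **info):
        # Annotation placed where execution passes a point of interest.
        self.events.append({"kind": kind, **info})


# Application code annotated for monitoring:
im = InteractionManager()
counter = {"value": 0}
im.publish_attribute("loop_counter", lambda: counter["value"], steerable=True)

for _ in range(3):
    counter["value"] += 1
    im.signal_event("loop_iteration", count=counter["value"])

print(len(im.events))                           # 3
print(im.attributes["loop_counter"]["get"]())   # 3
```

In this sketch the annotations are ordinary function calls, mirroring the paper's description of the IM as a library of routines installed at the process.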
PathFinder is modular: the functionality of the monitoring and steering system is separated into layers. A layer consists of a module installed at the SM and companion modules installed at the IMs that work together to perform a specific function. Layers provide services such as the collection of ordering information for snapshot construction, monitoring, steering, migration, and rollback. This separation of functionality into loosely coupled modules permits PathFinder to be configured in a "plug-and-play" fashion. The set of layers installed determines both the capabilities of the system and the costs in terms of consistency, perturbation, lag, and scalability. One layer that is available for use is an agent layer that was designed to provide monitoring and steering functionality. A layer that provides information for creating and ordering snapshots is referred to as an assembly layer.

3 Snapshot Assembly

To monitor an application, a tool can generate a sequence of global snapshots of the application. The assembly task is to maintain guarantees about the accuracy of the individual snapshots and their sequencing. Performing the assembly task efficiently can be challenging in distributed computations.

A distributed computation is a set of processes cooperating to perform a task or service. This suggests that it would be useful to view the state of the distributed system as a unified whole, a single set of attributes. The distributed nature of the computation results in the attributes being partitioned, by the process they reside in, into a set of checkpoints. Hence, a global snapshot is a set of checkpoints such that there is exactly one checkpoint from each process. In general, processes do not take checkpoints at the same instant. Consequently, any monitoring system for distributed programs must make decisions about how the attributes of the processes should be aggregated for presentation to the tool it serves.
Clearly, attributes in the same checkpoint should be presented together. It is less clear how to correlate attributes from separate checkpoints. There is no global clock available for the processes to reference, yet some consistency criterion must be used to decide how to assemble a set of local snapshots into a global snapshot. An established consistency criterion for global snapshots is the causality relation. The causality relation is a partial ordering of the events of a computation, given by Lamport's happened-before relation[3]. Two events are concurrent if they are not orderable by the causality relation. Similarly, checkpoints can be ordered by associating each checkpoint with the event that immediately preceded it.

The choice of a method for aggregation affects the way in which the user of the tool views the computation. Inconsistent aggregations can mislead the viewer, and even logically consistent aggregations can obscure ordering information or fail to emphasize interesting aspects of the computation[4]. Obtaining consistency is not automatic or free, though. The cost of obtaining a consistent view needs to be weighed against other monitoring and steering considerations. To illustrate the different ways in which assembly can be performed, four general categories of assembly algorithms and the consistency guarantees they provide are presented.

Physical assembly is based on hardware clocks. A global snapshot is constructed by choosing the latest checkpoint from each process that is before a chosen time. Although physical assembly is adequate for many applications, problems can arise. If the physical clocks are not tightly synchronized and the elapsed time between local snapshots is small, then global snapshots may be created that violate causality, e.g., a receive appears in a global snapshot before the corresponding send has been presented. Such inconsistencies in global snapshots can mislead a viewer or cause errors in analysis tools.
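The core of physical assembly can be sketched in a few lines: for each process, take the latest checkpoint whose physical-clock timestamp is at or before the chosen cut time. The stream representation and field names below are illustrative, not taken from any particular system.

```python
# Sketch of physical assembly: per process, pick the latest checkpoint
# timestamped at or before the chosen cut time t.

def physical_snapshot(streams, t):
    """streams: {pid: [(timestamp, state), ...]}, each list sorted by time.
    Returns one state per process, or None if some process has no
    checkpoint at or before t yet."""
    snapshot = {}
    for pid, checkpoints in streams.items():
        eligible = [(ts, st) for ts, st in checkpoints if ts <= t]
        if not eligible:
            return None
        snapshot[pid] = max(eligible)[1]  # latest eligible state
    return snapshot

streams = {
    "P0": [(1.0, "a0"), (2.5, "a1"), (4.0, "a2")],
    "P1": [(0.5, "b0"), (3.0, "b1")],
}
print(physical_snapshot(streams, 3.2))  # {'P0': 'a1', 'P1': 'b1'}
```

Nothing in this sketch checks causality: if "b1" records the receipt of a message sent after "a1" was taken, the cut at t = 3.2 could still show the receive without its send, which is exactly the failure mode described above.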
To prevent possible causality violations, a causal assembly algorithm can be used. Several reasonable approaches exist for achieving causal assembly. A straightforward one is to keep logical clocks, also known as Lamport clocks[3]. A snapshot is constructed as in physical assembly; the only difference is that logical clocks are used instead of physical ones. Causal assembly ensures that the global snapshots created reflect states of the computation that were possible.

The method of implementing causal assembly affects how much flexibility exists in choosing global states for presentation. Limited causal assembly refers to a technique, such as logical clocks, in which some sequences of global snapshots that are correct are not possible to obtain. A technique that provides the ability to reach all possible sequences is called full causal assembly. One way of providing full causal assembly is to use vector clocks to track the causality relation among the processes. Strong consistency[5] refers to snapshots that are consistent and do not have any messages in transit.

PathFinder uses transactional assembly to create global snapshots that are strongly consistent. Transactional assembly views the computation as consisting of a set of (possibly nested) logical actions. The user observes these logical actions as occurring atomically, at whatever level of granularity is appropriate. We refer to the logical actions as transactions. In PathFinder, transactions are recognized independently at each local process through code annotations indicating the beginning and end of the process's participation in the transaction. Each process also records information about the other processes it has communicated with during the transaction. Local portions of the transactions are assembled into full multi-process transactions through transitive analysis of the communication events to determine the members of the transaction. Global snapshots are constructed based on the transaction membership and ordering information. We have developed several algorithms that can be used to recognize and order transactions[6].
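The consistency test behind vector-clock-based full causal assembly can be sketched directly: a set of checkpoints, one per process, forms a consistent cut if and only if no checkpoint reflects more of process j's history than process j's own checkpoint does, i.e., V_i[j] <= V_j[j] for all i, j. The dictionary encoding below is illustrative.

```python
# Sketch of the consistent-cut test used in full causal assembly with
# vector clocks: checkpoint i's clock must not exceed process j's own
# entry for any j.

def consistent_cut(vclocks):
    """vclocks: {pid: vector clock, itself a {pid: event_count} dict}."""
    return all(
        vi.get(j, 0) <= vclocks[j].get(j, 0)
        for vi in vclocks.values()
        for j in vclocks
    )

# P1's checkpoint has seen P0's second event (e.g. via a message), but
# P0's own checkpoint only covers its first event: the receive would be
# shown without its send, so the cut is inconsistent.
bad = {"P0": {"P0": 1}, "P1": {"P0": 2, "P1": 3}}
good = {"P0": {"P0": 2}, "P1": {"P0": 2, "P1": 3}}
print(consistent_cut(bad))   # False
print(consistent_cut(good))  # True
```

Because every checkpoint must carry one counter per process, the clock size grows linearly with the number of processes, which is the scalability limit noted below[7].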
Some of these algorithms have been encoded into assembly layers for PathFinder.

The assembly of checkpoints into global snapshots is an essential task of any monitoring system. The choice of how it is done and what guarantees are made depends on the application, the task that the user wishes to perform, and the performance characteristics of the environment in which the monitoring is performed. Physical assembly has low perturbation and lag, but can have consistency problems. Limited causal assembly ensures that the snapshots created are ones that could have happened, but results in additional perturbation. Full causal assembly allows a choice of any possible consistent state for presentation, but at the expense of maintaining a vector clock or other interprocess dependence information. In general, full causal assembly does not have any means of scaling to very large computations, as the size of the vector clock must be equal to the number of processes[7]. Transactional assembly requires more perturbation still and can cause some additional lag in the presentation, but it provides additional consistency guarantees. It also has a better ability to scale than full causal assembly because of the user's ability to choose the level of temporal detail at which to view the computation.

4 Agent Roles

As a distributed computation runs, the best assembly strategy may change. One way to cope with this changing environment is to utilize agents in the assembly process. For this paper, it suffices to consider an agent as an encoding of code and data. More comprehensive views on agents can be found in[8, 9]. For an agent to be effective there must be an environment for it to operate within. In PathFinder, agents operate within an environment known as a milieu. The milieu provides basic services to agents, allowing them to execute, interact with, and create agents locally, and to migrate to other milieux.
As seen in Figure 2, the milieu's interface to the outside world is through two queues, an incoming queue and an outgoing queue. The queues contain agents that are either arriving at or departing from the milieu. An agent interacts only with the milieu and with other local agents.

Figure 2: Attributes, events, and interprocess communication from the IM are available to the milieu through the agent module.

The application process is represented through agents created and/or simulated by the agent module. When an event occurs in the application, the agent module creates an agent to represent that event (an event agent) and adds it to the milieu's incoming queue. The application's attributes are accessed as though they were data elements of these event agents. The agent module in PathFinder currently generates three types of event agents: transaction, message send, and message receive.

As described in the previous section, the snapshot assembly task is to order the checkpoints generated by the distributed processes and integrate them into global snapshots. In the IMs this task is handled by the assembly layer. Through the agent module and milieu, agents have access to the same attributes and events that an assembly layer does. This enables the agents to implement general assembly algorithms. As an example (Figure 3), consider how agents could implement the selective transaction assembly algorithm[6]. The selective algorithm determines the members of the transaction at the application processes and then forwards the membership information to the SM, which then orders the received checkpoints.

Figure 3: Selective algorithm implemented using agents: (a) an agent observes who the process communicates with during the transaction through arriving event agents; (b) the agent waits for information about other processes in the transaction to be delivered to it; (c) the agent then migrates to another milieu, (d) where it delivers its accumulated information to another agent in the transaction.

Agents are not limited to implementing
standard assembly algorithms. In fact, one of their strengths is their ability to make use of application-specific information to order checkpoints more efficiently. For example, consider a distributed application that primarily consists of a loop and contains a counter of the number of times it has passed through the loop. Agents could read this attribute and use it as a logical clock to order checkpoints. The perturbation, scaling, lag, and consistency in such a case are excellent, since the application is already doing the work necessary to decide on an ordering. Having an attribute that can be used as a logical clock is a very simple case. In addition to published attributes, agents can utilize data from a wide variety of sources, such as message-passing events, the history of the process, monitoring-specific data, and information received from outside the process.

The ordering of snapshots can also be encapsulated by agents. Figure 4 illustrates how an agent could generate logical clock values to produce a particular ordering for the user. Computing these custom assembly tasks could also be done at the SM. The primary advantage of using agents to compute them is that it reduces the load on the SM. When monitoring large computations, the SM can easily become a bottleneck, since all of the information that the user will see passes through it. Using agents reduces the amount of information that needs to be sent to the SM, and it also reduces the amount of computation done at the SM. In general, agents serve as a convenient means of extracting and analyzing the ordering information from the application.

Agents can also be used to aid a non-agent assembly layer. For example, some assembly algorithms rely on an initial synchronization to function properly. This presents an obstacle to starting one of these algorithms after the computation has begun. Agents can be used to provide the initial synchronization needed, without modifying the original algorithm.
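The loop-counter idea above amounts to bucketing checkpoints by the counter's value and presenting the buckets in counter order; the application itself supplies the clock, so the monitoring system adds almost no ordering work of its own. The following minimal sketch assumes each checkpoint arrives tagged with the process id and the counter value it was taken at; the representation is illustrative.

```python
# Sketch of using an application-published loop counter as a logical
# clock: checkpoints are bucketed by counter value and the buckets are
# presented in counter order.

def assemble_by_counter(checkpoints):
    """checkpoints: [(pid, counter_value, state), ...] in arrival order.
    Returns a list of snapshots, one per counter value, in order."""
    snapshots = {}
    for pid, tick, state in checkpoints:
        # One checkpoint per process per counter value.
        snapshots.setdefault(tick, {})[pid] = state
    return [snapshots[t] for t in sorted(snapshots)]

# Checkpoints may arrive out of order at the monitor:
arrived = [
    ("P1", 2, "s1@2"), ("P0", 1, "s0@1"),
    ("P0", 2, "s0@2"), ("P1", 1, "s1@1"),
]
for snap in assemble_by_counter(arrived):
    print(sorted(snap.items()))
# [('P0', 's0@1'), ('P1', 's1@1')]
# [('P0', 's0@2'), ('P1', 's1@2')]
```

This is only meaningful when the counter actually reflects the computation's progress at every process; an agent with access to application-specific knowledge is what makes that judgment possible.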
In general, agents can be used as a means of steering snapshot assembly modules. Assembling checkpoints into global snapshots should not always be done with agents, as oftentimes there will be efficient non-agent solutions. However, agents can play valuable roles in the assembly task. They provide an avenue to prototype general assembly algorithms and compare them to each other. Their easy installation in and removal from a running computation allow users to try out different algorithms until they find the one that best meets their needs. Another useful role for agents is the realization of custom assembly algorithms based on application-specific information. Taking advantage of ordering knowledge embedded in the application can provide solutions that perform better than generic assembly algorithms. Finally, agents can be used to aid non-agent assembly layers.

Figure 4: A loop counter x is used as the basis for a logical clock. In order to show the transaction in case b as happening at the same time as the else block (across different processes), the agent would label both blocks with the same timestamp.

5 Summary

Many different approaches to snapshot assembly may be employed, each with different costs and benefits. The choice of assembly method depends on the needs of the user and the application being monitored. Within the assembly task, agents can implement general assembly algorithms, perform snapshot assembly using application-specific information, or support non-agent assembly components. Such agent-based assembly can be useful in rapid prototyping of new assembly algorithms and for rapid deployment of application-specific assembly algorithms.

This paper is based upon work supported in part by the National Science Foundation under Grant No. CDA-9619831. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

[1] Delbert Hart and Eileen Kraemer. Consistency considerations in the interactive steering of computations. International Journal of Parallel and Distributed Networks and Systems, to appear, 1999.

[2] Delbert Hart, Eileen Kraemer, and Gruia-Catalin Roman. Using snapshot streams to support visual exploration. Technical Report WUCS-97-46, Washington University in St. Louis, 1997.

[3] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.

[4] Eileen Kraemer and John T. Stasko. Creating an accurate portrayal of concurrent executions. IEEE Concurrency, 6(1):36-46, January-March 1998.

[5] Jean-Michel Helary, Robert H. B. Netzer, and Michel Raynal. Consistency issues in distributed checkpoints. IEEE Transactions on Software Engineering, 25(2):274-280, March/April 1999.

[6] Delbert Hart, Eileen Kraemer, and Gruia-Catalin Roman. Query-based visualization of distributed computations. In Proceedings of the 11th International Parallel Processing Symposium, Geneva, Switzerland, April 1997.

[7] B. Charron-Bost. Concerning the size of logical clocks in distributed systems. Information Processing Letters, 39:11-16, July 1991.

[8] Jeffrey Bradshaw, editor. Software Agents. MIT Press, 1997.

[9] W. Brenner, R. Zarnekow, and H. Wittig. Intelligent Software Agents. Springer-Verlag, 1998.