
Token Finding Strategies

Delbert Hart
Washington University in St. Louis
St. Louis, MO, 63130 USA
+1 314 935 7536
hart@cs.wustl.edu

Eileen Kraemer
University of Georgia
Athens, GA, 30606 USA
+1 706 542 5799
eileen@cs.uga.edu

ABSTRACT
Monitoring distributed computations provides a practical way of learning about and coming to understand a class of applications that are becoming increasingly common. The benefits of distribution arise from processes cooperating to achieve a common goal. Coordination among the processes is a fundamental ingredient for the cooperation to occur both correctly and efficiently. One common form of coordination is token passing, where a token enables the process holding it to perform some action(s). To better understand this form of coordination, it is useful to be able to monitor the process that holds the token as the token moves from one process to another. Unless the token is being continuously monitored, though, the initial step is to find the location of the token to enable subsequent monitoring of it. We describe a classification scheme for token finding strategies. Token finding algorithms make use of information about the environment and application to make the search more efficient. The information we examine in detail is the interconnection of the logical processes. We describe the Pathfinder system, which uses agents to implement token finding strategies. The flexibility afforded by agents allows them to make use of environmental information at run time.

Keywords
Distributed Systems, Monitoring

1 INTRODUCTION
Traditionally, monitoring of distributed systems has centered on the state of, and occurrences at, processes. This is unfortunate, since messages are the distinguishing feature of distributed systems. In a message passing system, messages are the only way that processes communicate and synchronize their activities to cooperatively achieve their common goal. A common form of this synchronization is token passing.
A token is a certificate that entitles the bearer to have access to a particular shared resource. Although the token problem is very specific, it is also very important. The challenges of finding a token in a distributed system are the same ones that need to be addressed in a number of other situations. Consideration of the token finding problem looks at those properties that are central and pose the difficulties. Some examples of similar problems are:

- logical mobility: delivering a message to an agent, where the agent can be viewed as a token
- resource discovery: finding what resources are available in a distributed system, where the resources are the tokens
- monitoring an n-body computation: finding a specific element in the data space

In general, the task of finding an entity in a distributed system whose location is not known and which may move can be abstracted to the problem of finding a token.

Instead of simply looking at individual algorithms, this paper provides a classification scheme for token finding strategies. We then look at how algorithms from the different categories operate in different environments. General purpose solutions are less efficient than solutions that are adequate in a more restricted environment. Token finding algorithms are no exception to this rule.

In presenting the strategies for locating a token we examine how some commonly available information can be used to guide the choice of token finding algorithm. In particular, it is especially useful to factor in information about the network. The connectivity of processes is something that all distributed computations possess, and it can have a large impact on the performance of different algorithms.

To implement these strategies we use Pathfinder, a system for Exploratory Visualization. Exploratory Visualization is an approach for monitoring and steering

distributed applications. Exploratory Visualization addresses the size and complexity of distributed systems by engaging the user as an active partner who guides the data collection and visual representation.

The next section presents a classification for token finding strategies. Section 3 examines how information about the network can be utilized, followed by Section 4, which describes the Pathfinder system and gives an example of using the information. A summary is given in Section 5.

2 STRATEGIES
This section provides definitions used in the rest of the paper. To have concrete examples, a brief description of five different strategies is given. These algorithms serve as examples for the discussion of a classification scheme and of how knowledge about the environment can be factored into the choice of an algorithm.

Definitions
A distributed computation $C$ consists of a set of processes $P$ communicating only through message passing, with directional communication channels between pairs of processes. A communication channel originates from a source node and terminates at a destination node. The topology that we consider throughout the paper is the one at the level of abstraction of the monitoring, e.g., the logical topology rather than the physical topology for finding a token in a user's application program.

Each process $P_i$ is a sequence of events $e_{i,1} \ldots e_{i,n_i}$. Each event $e_{i,x}$ is a local event, a send event, or a receive event. A local event represents a state transition within a single process. All send/receive events are members of a message (an event pair $\langle e_{i,x}, e_{j,y} \rangle$) belonging to $M$, the set of messages. Events are partially ordered by $\rightarrow$, the happened-before relation [4]. $e_{i,x} \rightarrow e_{j,y}$ holds if and only if one of the following is true: 1) $i = j$ and $x < y$, or 2) $\langle e_{i,x}, e_{j,y} \rangle \in M$, or 3) there exists an event $e_{k,z}$ such that $e_{i,x} \rightarrow e_{k,z}$ and $e_{k,z} \rightarrow e_{j,y}$. An execution is described by a pair $\langle C, M \rangle$.
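The three rules defining the happened-before relation can be illustrated with a small sketch. It is only an illustration under stated assumptions: events are encoded as `(i, x)` tuples, the function name `happened_before` and its arguments are hypothetical, and the closure is computed naively, which is adequate only for small examples.

```python
from itertools import product

def happened_before(num_events, messages):
    """Return the set of ordered pairs (a, b) such that a -> b.

    num_events: dict mapping process id i to n_i, the number of events of P_i
                (hypothetical encoding; events are named (i, x), 1 <= x <= n_i).
    messages:   set of ((i, x), (j, y)) pairs, i.e. the message set M.
    """
    events = [(i, x) for i, n in num_events.items() for x in range(1, n + 1)]
    # Rule 1: events of the same process are ordered by their index.
    hb = {(a, b) for a, b in product(events, events)
          if a[0] == b[0] and a[1] < b[1]}
    # Rule 2: the send event of a message precedes its receive event.
    hb |= set(messages)
    # Rule 3: transitive closure, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for a, k in list(hb):
            for k2, b in list(hb):
                if k == k2 and (a, b) not in hb:
                    hb.add((a, b))
                    changed = True
    return hb

# Two processes with two events each; the second event of P_1 is the send
# of a message whose receive is the first event of P_2.
hb = happened_before({1: 2, 2: 2}, {((1, 2), (2, 1))})
```

In this example the first event of $P_1$ precedes the second event of $P_2$ only through rule 3, while no event of $P_2$ precedes any event of $P_1$, which is what makes the order partial.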
A token is a logical entity that represents permission for a process to perform some action. A computation has a fixed set of tokens $T$. A history of the locations of each token $t_i$ can be delineated by a sequence of messages $t_{i,1} \ldots t_{i,n_i}$. The send event $e_{j,y}$ of the message $t_{i,x} = \langle e_{j,y}, e_{k,z} \rangle$ represents the departure of token $t_i$ from process $P_j$. Likewise, the receive event $e_{k,z}$ represents the arrival of $t_i$ at process $P_k$. For an adjacent pair of messages $t_{i,x}$ and $t_{i,x+1}$, the receive of $t_{i,x}$ occurred earlier in the same process as the send of $t_{i,x+1}$. The process at which the token is currently located is referred to as the token holder. Token finding algorithms are typically initiated by a process, called the coordinator, that exists outside the target computation.

A monitored execution is an execution in which monitoring events are interleaved with the computation's events (Figure 1). The monitoring system allows the presence of a token at a process to be detected. Tokens are not detectable when they are in a communication channel. To avoid the case of a token arriving at a process where monitoring is active and leaving before it is detected, at least one monitoring event exists between a receive event and a subsequent send event in the same process.

Figure 1: An unmonitored process (left) is a sequence of application events (white circles). The monitored process (right) contains monitoring events (black ovals) that are interleaved with the original application events.

Monitoring events do not directly affect the computation's state or the movement of the token. However, monitoring may indirectly affect the computation by competing for resources such as CPU, memory, and communication bandwidth. The effect that monitoring has on the local process is called local perturbation. Message overhead refers to the number and size of messages generated by the monitoring process.
Global perturbation is the cost of the local perturbations, the message overhead, and the repercussions that delays might have on the timing of the application. Other performance characteristics of monitoring techniques are coverage and lag. Coverage refers to how many processes have their monitoring active. Lag is a measure of the delay between a message send and its receipt. The goal of monitoring tools is to make available strategies that meet the user's needs and minimize the cost of monitoring, i.e., the global perturbation.

Characteristics of Token Finding Algorithms
To provide concrete examples of the different characteristics of token finding algorithms, Table 1 illustrates five different strategies. The properties that are examined

in this section are growth, propagation, residue, and guarantees. After looking at these basic properties, the next section presents how information about the environment, specifically network properties, can be used to guide the choice of algorithm.

Most of the properties indicate how the monitoring coverage changes over time. The growth property indicates how quickly the coverage can increase, while how long a node remains part of the coverage and how far the coverage extends are given by the residue and propagation properties, respectively.

Growth
After starting at an initial set of processes, the strategies differ in how their coverage grows until the token is found. Specifically, the growth property looks at how monitoring spreads directly from one process in the computation to another, i.e., without the coordinator. Some algorithms, such as wait and broadcast, do not grow because there is no interaction between processes. The search strategy, on the other hand, does not grow because it moves the activated monitoring from one process to another rather than adding to it.

Constant growth means that the strategy expands by the same amount regardless of the size of the computation or the number of links from a node. The entrapment algorithm is an example of an algorithm that may grow by a constant amount. It adds one process at a time to the coverage until the token is found.

Proportional growth strategies scale the amount they expand to take into account the size of the computation or the local number of links at the process. Activating monitoring at all neighbors of a process is one way that some algorithms grow, e.g., the flood strategy.

There is a trade-off between the rate of growth of a strategy and the resource usage of the algorithm. A high growth rate will reduce the amount of lag between the request for finding the token and its location, but a computation may not be able to tolerate the resource demands of a high growth rate.
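The difference between proportional and constant growth can be made concrete with a small simulation. This is a simplified sketch rather than any of the paper's algorithms: it treats links as bidirectional, ignores the token and the residue, and the function names `flood_coverage` and `constant_coverage` are hypothetical.

```python
def flood_coverage(neighbors, start):
    """Coverage size after each round when every newly covered node activates
    monitoring at all of its neighbors (proportional growth)."""
    covered, frontier, sizes = {start}, [start], [1]
    while frontier:
        nxt = []
        for node in frontier:
            for n in neighbors[node]:
                if n not in covered:
                    covered.add(n)
                    nxt.append(n)
        frontier = nxt
        if frontier:                 # record only rounds that added nodes
            sizes.append(len(covered))
    return sizes

def constant_coverage(neighbors, start):
    """Coverage size after each round when exactly one reachable process is
    added per round (constant growth)."""
    order, seen = [start], {start}
    for node in order:               # 'order' grows during iteration (BFS)
        for n in neighbors[node]:
            if n not in seen:
                seen.add(n)
                order.append(n)
    return list(range(1, len(order) + 1))

# A seven-node binary tree rooted at node 0.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6],
        3: [1], 4: [1], 5: [2], 6: [2]}
flood_sizes = flood_coverage(tree, 0)
constant_sizes = constant_coverage(tree, 0)
```

On this tree the flood reaches full coverage in two rounds after the start, whereas constant growth needs six. The flood therefore has less lag, but it activates several processes in the same round, which is the resource demand the trade-off refers to.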
Another consideration is that strategies with growth rates above a low constant, e.g., one, are difficult to halt before they have checked the whole network. Even if the token is found almost immediately, there is no way to communicate this information quickly enough to halt the growth of coverage.

Residue
Residue refers to how long artifacts from the strategy remain at a process. After initially determining that the token is not present at a process, most strategies will continue to watch the process for a time to ensure that the token does not arrive from one of the channels. The rationale for deciding when to remove the residual artifacts varies, though. One common reason for deactivating monitoring is that it has been determined that the token cannot reach the process undetected. One way this could be known is if, for all channels arriving at the process, the sources are being monitored and the channels have been flushed.

Flushing is the mechanism through which a process knows that a token is not on the way to it. The way that channels are flushed depends upon the communication characteristics of the network. The broadcast strategy does an implicit flush of the channels. It knows that monitoring will be active at all processes by a certain time, and it then simply waits until all messages sent by that time would have been delivered. This assumes that there is a maximum network delay, which is reasonable in many environments.

Explicitly flushing a channel consists of sending a marker or message to indicate that the token is no longer in the channel. Once markers have been received on all channels, the process may deactivate its monitoring. This technique, though, assumes that the channels are at least FIFO. Explicit flushing has less lag, at the cost of higher message overhead, and it typically involves a more complicated scheme both to clean up and to keep message traffic at a reasonable level.
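Explicit flushing can be sketched as follows. The sketch assumes FIFO channels, modeled here as queues, and the class `MonitoredProcess`, the constants `MARKER` and `TOKEN`, and the channel names are all hypothetical. It shows only the rule that monitoring may be deactivated once a marker has arrived on every incoming channel.

```python
from collections import deque

MARKER, TOKEN = "marker", "token"    # hypothetical message kinds

class MonitoredProcess:
    def __init__(self, incoming):
        self.pending = set(incoming)  # incoming channels not yet flushed
        self.monitoring = True
        self.token_seen = False

    def receive(self, channel, message):
        if not self.monitoring:
            return
        if message == TOKEN:
            self.token_seen = True
        elif message == MARKER:
            # FIFO delivery: anything sent on this channel before the marker
            # has already arrived, so the token is no longer in the channel.
            self.pending.discard(channel)
            if not self.pending:
                self.monitoring = False

# Channel "b" carries the token ahead of its marker; channel "a" carries an
# ordinary application message ahead of its marker.
channels = {"a": deque(["app", MARKER]), "b": deque([TOKEN, MARKER])}
p = MonitoredProcess(channels)
for name, queue in channels.items():
    while queue:
        p.receive(name, queue.popleft())
```

If the channels were not FIFO, the token could be delivered after the marker sent behind it, and the process would already have stopped watching. This is why the technique needs at least FIFO delivery.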
Another trade-off to consider is between having long lasting residuals and the growth and propagation of the coverage. The slower growing and far propagating strategies often will have long lasting residuals. On the other hand, a fast growing strategy may have a short residual (or none at all). One relevant factor in this decision is how much overhead the residual imposes on the process, that is, how expensive it is to watch the incoming messages and/or periodically check the computation's state for the token. The other is how sensitive the computation is to message traffic.

Propagation & Guarantees
Related to the rate of growth of a strategy is how far the coverage extends from its point of origin. Some algorithms stop propagating after reaching some finite measure, such as a time limit or a hop count. Other algorithms will continue until the token is found, no matter how long that takes. There may be performance or time bounds on finding the token, e.g., if the token is not found by the time the computation enters the next phase, then stop looking for it.

The guarantees attribute describes whether or not a strategy, if run to completion, will locate the token. As mentioned in the growth section, it may be difficult to prematurely terminate some strategies due to their growth rate.

3 ENVIRONMENTAL INFORMATION
Information about the environment or the application can make the search process more efficient. All distributed computations operate within a network. In this section we look at how some commonly available information can influence the choice of strategy. The

(network) For simplicity, the examples will assume the network is fixed and FIFO. The circles represent nodes of the network and the arrows directional channels. The coordinator is shown as a square.

(wait) A simple strategy is to activate monitoring at a single node and wait for the token to arrive.

(search) Instead of waiting for the token to arrive, the search strategy moves from node to node looking for the token. The search ends after a predetermined number of nodes have been visited.

(entrapment) An entrapment strategy seeks to increasingly limit the token's possible locations until it is found. In this example the entrapment strategy has partitioned the network into two parts, watching the nodes in the middle to ensure that the token does not move into the left half while it is searching the right half.

(broadcast) The coordinator directly activates monitoring in all of the nodes in the broadcast strategy. After a period of time the monitoring automatically deactivates.

(flood) The flood strategy begins at one node and spreads on all links until the entire computation is looking for the token.

Table 1: Example strategies.

Column groups: Growth (None/One, Constant, Proport.); Residue (expire, explicit); Propagation (finite, infinite); Guarantees.
wait:        x x n/a n/a no
search:      x n/a n/a x no
entrapment:  x x x yes
broadcast:   x x n/a n/a yes
flood:       x x x yes

Table 2: Token-finding characteristics.

Topology          Stability   Communication
Fully Connected   Fixed       Total
General           Growth      Causal
Ring              Loss        FIFO
Tree              Both        Non-FIFO

Table 3: Network characteristics.

end of the section describes some additional sources of information that can be used.

Network Information
The network characteristics can serve as an important source of information about how the token may be propagated and thus about the token's behavior. The communication channels also have an impact on the guarantees that can be made about the token being found and the cost associated with finding it.

The physical characteristics of the network are important in choosing an appropriate strategy for finding a token. The three aspects of the network that we look at here are topology, stability, and communication model. The topology of the network describes how the nodes are connected to one another. The stability of the network is an indication of the lifetime of the links between nodes. The communication model describes the guarantees made about the order of message delivery. In general, the communication channels are considered lossless, but this classification does include lossy message delivery.

Topology
A Fully Connected network is one in which all nodes can directly communicate with all of the other nodes. This makes it very difficult to trap the token at any particular node, but it also provides flexibility for token finding algorithms.

General networks do not have any significant properties associated with how the nodes are connected to one another. Sometimes a general network can be logically abstracted to a network with a more constrained topology.

A Ring network is one where whenever the token moves it is forced to move closer to a particular node.
This makes it much simpler to flush the token out of the whole network. A listener can be set up at the focal node to catch the token.

The Tree classification is a network where a node can communicate with either "upstream" or "downstream" nodes. Upstream is the direction of the majority of nodes. Trees are especially well suited to entrapment type strategies, since a parent node needs to be traversed for a token traveling between siblings.

Most of the strategies operate better in a topology with fewer and more structured interconnections. The exception to this is the broadcast algorithm, whose cost and performance are relatively independent of the topology. The topology of a network also provides information about which nodes are bottlenecks and thus may be less amenable to an intrusive form of monitoring due to resource constraints.

Stability
The most common assumption for distributed algorithms is that the network topology is fixed for the duration of the algorithm's operation. This is a reasonable assumption in most cases. For some applications, though, e.g., those involving physical mobility, the stability of the network may not be guaranteed. For others, an algorithm that does not assume a fixed network may be used in order to simplify other aspects of the network. For example, if only a small portion of the network is typically used, then the strategy might use the reduced network, with a less stringent stability assumption.

Growth occurs when a new channel is created to and/or from an existing node in the network. If both the source and the destination were already in the network, then no special considerations need to be made. If the source was not in the network, then the coordinator must contact the node directly if it is to be included in the search. If the destination was not in the network, then it is possible that the source could forward the monitoring activation to the new node if it still has the information

locally.

The loss of links while the algorithm is running can pose challenges for finding the token. The difficulty is that if an algorithm planned to reach a node via a particular path, that path may be invalidated, disrupting the search strategy. If the network is guaranteed to remain connected, then an algorithm could simply restart upon the detection of a failure. If the network may become partitioned, then loss of connectivity is permanent and only the new smaller network needs to be considered. Messages that are on a communication channel when it is removed are lost.

The most complicated case is if the network may gain or lose links while the strategy is being executed. It contains the difficulties of both growth and loss. In addition, the fact that network partitions may not be permanent needs to be considered. Decisions need to be made about when the unreachable portion of the network is expected to be available and what the best way to be ready for reconnection is. Operating in an arbitrary, dynamic network to find a token can be very difficult. A lossy communication channel can be modeled by links that are able to be added and deleted.

Since the broadcast and wait strategies do not directly use the channels (they communicate only with the coordinator), these algorithms are the least sensitive to how volatile the channels are. The search strategy also does not rely on any particular channel assumptions. The entrapment and flood algorithms do rely on certain characteristics of channels. For entrapment, new links may open up a potential route for the token to bypass the nodes that have activated monitoring. Also problematic are lossy channels, which imply that a token may become lost.

Communication
Support for the more powerful communication models, total and causal, may simplify the design of the algorithm to locate the tokens. Totally ordered communication ensures that every process receives messages in the same order.
A strategy that relies on totally ordered communication would broadcast to all nodes a message to begin monitoring; then, after it knows that all of the processes have received the message, it would broadcast a message to terminate monitoring. The total ordering property would flush out all messages on all channels between the two broadcasts.

Causal communication guarantees that the communication space is metric, i.e., a message sent through a third party is not faster than a message sent directly.

FIFO is the most common communication property. It ensures that messages on the same channel are received in the order in which they are sent.

Non-FIFO communication channels do not make any guarantees about the order in which messages are received. As in an environment with lossy channels, if the channels are Non-FIFO then it is difficult for strategies to make any guarantees about finding the token.

Other Information
Besides application specific information, which is too specific to be of use across different applications, other sources of information that can be profited from are performance data, token characteristics, and the use of precomputation.

Performance data such as the lag between nodes, the loads at the nodes, and the token speed relative to the lag are all pertinent. If the load at a node or the lag on channels is too high, then the strategy may decide that there are more profitable places to search for the token.

An example of a pertinent token characteristic is the length of time that a token remains at a process. The speed at which the tokens travel relative to the locating strategy will affect how effective the strategy is. If the token is moving slower than the finding algorithm, then it is possible for the algorithm to follow the token and catch up with it. If the token speed is greater than or about equal to the algorithm speed, then it is necessary for the algorithm to predict where the token will be and catch it when it arrives at a new destination.
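The effect of relative speed can be seen in a toy model, offered only as an illustration. It assumes that a searcher follows exactly the path taken by the token, that speeds are measured in hops per time step, and that the token starts `gap` hops ahead. The function `steps_to_catch` and its parameters are hypothetical.

```python
def steps_to_catch(gap, token_speed, agent_speed, max_steps=10_000):
    """Number of time steps until a searcher following the token's path has
    closed a lead of `gap` hops, or None if that does not happen within
    max_steps (the searcher is not faster than the token)."""
    for step in range(1, max_steps + 1):
        gap += token_speed - agent_speed   # the lead changes by the speed difference
        if gap <= 0:
            return step
    return None

faster = steps_to_catch(gap=6, token_speed=1, agent_speed=3)
equal = steps_to_catch(gap=6, token_speed=2, agent_speed=2, max_steps=100)
```

With the faster searcher the lead shrinks by two hops per step and closes after three steps. With equal speeds the lead never shrinks, which is the case where following fails and the strategy has to anticipate the token's destination instead.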
Instead of relying only upon information found in the computation, the monitoring system may construct information that can later be used in the search process. To aid an entrapment strategy, the coordinator may compute the most efficient way to partition the network. Individual processes may record which channel a token left on to aid a search strategy. Or a historical database of the token's movement patterns could be used to anticipate where it is likely to be found.

This section has looked at how information about a distributed computation, the underlying network in particular, can affect the choice of token finding strategy. The next section describes the Pathfinder system, which uses an agent framework to carry out monitoring strategies. The use of agents allows information to be easily incorporated into the strategies at run time.

4 PATHFINDER
In this section we present an overview of Pathfinder, a system for Exploratory Visualization, and then describe the agent model and Pathfinder's implementation of it. An example of how information about the network can be utilized by agents to find the token more efficiently is presented. More information about Pathfinder [2] and the agent model [3] is available.

Overview
Exploratory Visualization is an approach for monitoring and steering distributed applications. Exploratory Visualization addresses the size and complexity of distributed systems by engaging the user as an active partner who guides the data collection and visual representation. The key insight is that it is both unnecessary and inefficient to collect all possible data that an application has to provide. Only the data that supports the user's current interests should be collected. This selective monitoring, in conjunction with navigation tools to modify the perspective on the computation, provides a dynamic and interactive paradigm for monitoring.

Pathfinder has three operational components (Figure 2): Interaction Managers (IMs) provide an interface to the application processes; a Stream Manager (SM) coordinates the activities at the IMs and creates the snapshot stream that is sent to the third component; and the User Interface (UI) provides the user with visualization and steering capabilities. The UI is written in Java and currently supports visualizations that utilize Java3D or Pavane [5].

Monitoring has diverse needs that depend on the application being monitored and on the user. It may be possible to configure the monitoring software for a particular application, but it is difficult to anticipate all of the demands that a user will place on the system. This implies that the monitoring software must be easily configurable. Pathfinder has a modular design that allows the system to be configured to meet specific requirements. A module has access to an attribute database, a collection of application variables that have been published for monitoring and steering. The Interaction Manager also makes event occurrences in the application available to the modules. Currently both events and attributes are created through source code annotations. The Interaction Manager wraps the communication library used by the process in order to make information about messages available. The current implementation uses the PVM communication library.
Agent Model
The purpose of the agent model is to have a simple, yet powerful and flexible framework for specifying agents for monitoring and steering. Agents have full access to each other locally, and the only communication between locations is through agent migration. Custom agents are used to represent the application.

The model has two kinds of entities: agents and milieux. An agent is an encoding of data and code. A milieu is a location at which agents operate and interact (Figure 3). It provides services that allow an agent to execute inside of it, to interact with other agents, and to move to other milieux. A milieu accepts and transmits agents via an incoming queue and an outgoing queue, respectively. The method of transport between one milieu's outgoing queue and another milieu's incoming queue is FIFO and lossless.

Two fields that all agents have are id and dest. The id field contains an identifier that is guaranteed to be unique across the whole system. The id field of an agent is immutable. When an agent is in an outgoing queue or in transit between milieux, the dest field indicates the milieu to which the agent is going.

The code contained in an agent is in the form of a set of handlers. These methods on the agent define how it reacts to events in the milieu. When an agent arrives at a milieu, i.e., it is removed from the incoming queue, it is placed in the Agent Repository, where it remains while it is in the milieu. A distinguished handler of the agent is the Arrival handler which, if present, is automatically executed by the milieu when the agent is removed from the incoming queue.

Internally a milieu is driven by events. Events are signaled either by an agent or by the arrival of an agent via the incoming queue. Agents can register to react to an event by indicating which of their handler(s) should be triggered. The milieu uses the Event Registry to schedule agent handlers in response to the signaled event.
The handler is executed atomically and receives the agent that signaled the event as an argument.

From within a milieu, agents are able to interact with other local agents and with the milieu. An agent can:

- access any other agent in the milieu through the Agent Repository. The fields or handlers of any agent, including itself, can be read from or written to.
- signal events. An agent may insert or remove entries of the Event Registry pertaining to any agent.
- create a new agent. The new agent's id will be automatically generated by the milieu.
- induce an agent to move to another milieu.

Agent Implementation
Pathfinder utilizes agents by installing milieux in each of the Interaction Managers and in the Stream Manager. Each milieu is an embedded Perl [6] interpreter. The agents are in the form of small pieces of Perl code. The milieu is loosely coupled to the process it is installed in. The interface between the Perl portions of the milieu and the C coded portions consists of bindings of the incoming and outgoing queues, Perl modules that provide interfaces to resources present at the process, and Event agents that are generated by the agent module and inserted into the incoming queue of the milieu. SWIG [1] was used to create the extension modules that provide access to the resources of the process, e.g., the attribute database, messages, and generated snapshots.
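The agent model can be illustrated with a compact sketch. It is written in Python for readability and is not Pathfinder's Perl implementation. The class and method names (`Agent`, `Milieu`, `create`, `register`, `signal`, `move`, `accept_next`, `transport`) and the handler signature `(milieu, agent, signaler)` are assumptions made for this example. Only the queues, the Agent Repository, the Event Registry, the id and dest fields, and the Arrival handler come from the model described above.

```python
from collections import deque
from itertools import count

class Agent:
    def __init__(self, agent_id, handlers=None, **fields):
        self.id = agent_id                  # unique; treated as immutable
        self.dest = None                    # target milieu while in transit
        self.handlers = dict(handlers or {})
        self.fields = fields

class Milieu:
    _ids = count(1)                         # system-wide id source (assumption)

    def __init__(self, name):
        self.name = name
        self.incoming, self.outgoing = deque(), deque()
        self.repository = {}                # Agent Repository: id -> agent
        self.registry = {}                  # Event Registry: event -> [(id, handler)]

    def create(self, handlers=None, **fields):
        agent = Agent(next(Milieu._ids), handlers, **fields)
        self.repository[agent.id] = agent
        return agent

    def register(self, event, agent, handler_name):
        self.registry.setdefault(event, []).append((agent.id, handler_name))

    def signal(self, event, signaler):
        # Run each registered handler of an agent still present in the milieu.
        for agent_id, name in list(self.registry.get(event, [])):
            agent = self.repository.get(agent_id)
            if agent is not None:
                agent.handlers[name](self, agent, signaler)

    def move(self, agent, dest):
        agent.dest = dest
        del self.repository[agent.id]
        self.outgoing.append(agent)

    def accept_next(self):
        agent = self.incoming.popleft()
        self.repository[agent.id] = agent
        arrival = agent.handlers.get("Arrival")
        if arrival:                         # executed automatically on arrival
            arrival(self, agent, agent)

def transport(src, milieux):
    """FIFO, lossless delivery of everything in src's outgoing queue."""
    while src.outgoing:
        agent = src.outgoing.popleft()
        milieux[agent.dest].incoming.append(agent)

log = []
sm, im = Milieu("SM"), Milieu("IM")
probe = sm.create({"Arrival": lambda m, me, by: log.append((m.name, me.id))})
sm.move(probe, "IM")
transport(sm, {"IM": im})
im.accept_next()
```

In the usage lines, an agent created at the Stream Manager's milieu migrates to an Interaction Manager's milieu, where its Arrival handler runs as soon as it is taken from the incoming queue. Migration is the only interaction between the two milieux.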

Figure 2: Pathfinder's architecture (components shown: Interaction Managers, Stream Manager, User Interface, connected by the Steering Stream and the Snapshot Stream). The User Interface sends a sequence of commands to the Stream Manager, which coordinates actions at the Interaction Managers. The SM also collates the local snapshots from the IMs into a sequence of snapshots that the UI uses for visualizations.

Figure 3: An agent module can be one of many modules installed in the Interaction Manager to provide monitoring and steering related services (elements shown: Application, Application Module, Interaction Manager, Attribute Database, Agent Module with Incoming Queue, Outgoing Queue, Schedule, Agent Repository, and Event Registry, Communication Library Module, Communication Library). The agent module provides the bridge between the application and the milieu.

The Interaction Manager shares the thread of execution with the application. Control is transferred to the milieu when the IM generates an Event agent. The application regains control after all of the agents have responded to the event represented by the Event agent. Typical events at the IM include message send, message receive, and local snapshot ready to send. The Event agents include information relevant to the event they represent, e.g., the local snapshot event contains a pointer to the snapshot that is ready to be sent.

Agents are introduced into the monitoring system by the user choosing an agent to send to the Stream Manager. The choice corresponds to the monitoring strategy that the user wants to employ. Once the agent arrives at the SM's milieu, it begins creating agents that will go to the Interaction Managers and implement the monitoring strategy.

Monitoring Example
Consider the logical network shown in Figure 4. The topology is that of a ring of star clusters, which would be classified as a general graph. To monitor this network it is useful to view it as two token finding problems, one in a ring and one in a tree.

The strategy is to use agents to implement a modified entrapment algorithm. Since the number of clusters is small, it is not too expensive to broadcast agents to each of the centers of the clusters. Once at the center of the cluster, a simple entrapment algorithm can be employed.

5 SUMMARY
The problem of finding a token in a distributed system is an important one, with ramifications for other areas of current interest, e.g., agent systems. This paper reviewed several major properties of token finding strategies and discussed how information about the environment, and the network topology in particular, could influence the choice of algorithm.

ACKNOWLEDGEMENTS
This paper is based upon work supported in part by Boeing and the National Science Foundation Grant No. CDA-9619831.

REFERENCES
[1] D. Beazley. SWIG: An easy to use tool for integrating scripting languages with C and C++. In Proceedings of the 4th USENIX Tcl/Tk Workshop, 1996.
[2] D. Hart, E. Kraemer, and G.-C. Roman. Consistency considerations in the interactive steering of computations. International Journal of Parallel and Distributed Systems and Networks, to appear, 1999.
[3] D. Hart, G.-C. Roman, and E. Kraemer. An agent infrastructure for exploring long-lived distributed computations. Technical Report WUCS-99-11, Washington University in St. Louis, 1999.
[4] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.
[5] G.-C. Roman, K. C. Cox, D. Wilcox, and J. Y. Plun. Pavane: a system for declarative visualization of concurrent computations. Journal of Visual Languages and Computing, 3(2):161-193, June 1992.
[6] R. Schwartz and L. Wall. Programming Perl. O'Reilly and Associates, 1994.

Figure 4: The nodes in this network are organized as a set of clusters connected in a ring. Each of the clusters has a single process that is connected to all of the other processes of the cluster.
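The ring-of-clusters topology of Figure 4 can be sketched to show why placing agents at the cluster centers is attractive. The sketch rests on assumptions the caption does not state: links are treated as bidirectional, the ring is assumed to connect the cluster centers to one another, and the function `ring_of_stars` and the node naming `(cluster, index)` are hypothetical.

```python
def ring_of_stars(num_clusters, leaves_per_cluster):
    """Return (neighbors, centers) for a ring of star clusters.

    Nodes are named (cluster, index); index 0 is the cluster's center, which
    is linked to every other process of its cluster and to the centers of
    the adjacent clusters on the ring (an assumption of this sketch).
    """
    neighbors, centers = {}, []
    for c in range(num_clusters):
        center = (c, 0)
        centers.append(center)
        leaves = [(c, i) for i in range(1, leaves_per_cluster + 1)]
        ring = {((c + 1) % num_clusters, 0), ((c - 1) % num_clusters, 0)}
        ring.discard(center)                # no self-link in a one-cluster ring
        neighbors[center] = leaves + sorted(ring)
        for leaf in leaves:
            neighbors[leaf] = [center]      # a leaf talks only to its center
    return neighbors, centers

neighbors, centers = ring_of_stars(num_clusters=4, leaves_per_cluster=3)
center_set = set(centers)
# Every link has a center as one of its endpoints.
every_hop_touches_center = all(
    a in center_set or b in center_set
    for a in neighbors for b in neighbors[a]
)
```

Under these assumptions every hop of the token either leaves or arrives at a center, so agents at the four centers observe all token movement. Within each cluster the remaining problem is the tree-like case, where entrapment is well suited.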