Information and Monitoring Systems for Grids: Divide or Unify?


Tobias Groothuyse
Vrije Universiteit Amsterdam

Abstract

In the past couple of years several information and monitoring systems for grids have been developed. Some of them are unified systems, while others address only one of the two tasks. Little research has been done to find out which approach is best. This paper analyses and compares currently designed systems, argues why unification of both systems is desirable, and describes a global architecture for a unified system.

1. Introduction

Before we can start talking about information and monitoring systems, let us first define what a grid is. In the past, several definitions have been used, but they gradually came to cover areas that were not actual grids. Ian Foster therefore defined a three-point checklist [8] in which a grid is "a system that coordinates resources that are not subject to centralized control, using standard, open, general-purpose protocols and interfaces, to deliver non-trivial qualities of service". A grid combines the resources (memory, CPU, network) of a collection of clusters in order to compute on various problems. Grids are used for commercial and non-commercial purposes and support scientific work ranging from biology to computer science to medicine.

1.1 Relevance of information and monitoring systems

Information and monitoring systems for grids both provide meta-information about the grid they run on. Monitoring systems generally provide highly dynamic information, whereas information systems provide more semi-static information. Both are vital for keeping a grid running in good shape. The most important tasks that use the information collected by these systems are the following.

Scheduling. Jobs waiting to be run have to be scheduled somewhere on the grid. To make an informed decision about where to schedule a job, information on the resource usage of the various nodes is needed.

Discovery. In order to let the users of a grid know that a new node is available, information has to be exchanged. An information system takes care of this.

Performance issues. To be able to tune the performance of the grid as a whole, performance analysis has to be performed; performance prediction is also being done.

Fault detection and diagnosis. Fault detection and diagnosis is obviously vital for keeping a grid system running smoothly.

1.2 Divide or unify?

A lot of information and monitoring systems have been built in the past couple of years. Some of them were unified and some were divided. The reasons for designing a divided or a unified system were usually explained in a few sentences, but practically no research has been published on whether unification or division is the better option. In this paper I give an overview of both types of systems and discuss the various systems that have been developed over the years. I also try to describe the perfect system and whether it is possible to design such a system. After that I present my view on whether information and monitoring systems should be unified or divided.

2. Information and monitoring systems

2.1 Problem overview

Monitoring requires that information be acquired quickly, because it becomes outdated very quickly; monitoring information is therefore highly dynamic. In contrast, information systems generally provide semi-static information, which should be cached to minimize resource (i.e. network) consumption. The fact that one kind of system needs caching and the other does not, and that one handles highly dynamic and the other semi-static data, might lead to the conclusion that the systems should be separated. But that is not necessarily the case.

In 2000 Zoltán Balaton et al. [10] compared a number of grid (application) monitoring systems, defining a number of properties and testing which systems comply with them. Their comparison was strictly focused on monitoring systems. In this paper information systems are compared as well, providing a broader view on the subject. To be able to answer the question whether the systems should be unified or divided, let us first determine a set of desirable properties for information and monitoring systems and see which properties are common.

Common properties

For both monitoring and information systems the following properties are desirable.

Platform independence. A grid almost always contains nodes running different operating systems, so platform independence is a must for any system running on it.

Scalability with respect to the number of nodes and requests. Because a grid is such a large-scale system, scalability is one of the biggest problems to be solved. A system has to scale to any number of nodes without the response time being affected too much. The system also has to scale with the number of requests it receives. This second property does not necessarily have the same impact on the architecture as the first one.

Ability to survive node/link crashes. In a grid system a node crash should be considered the rule rather than the exception, and the same holds for network links. A good system should therefore be designed such that the disappearance of a node, whether by a crash of the node itself or of the network connecting it, has no impact on the overall system, no matter which node crashed.

No restrictions on the data type of information. A good information/monitoring system should not put any restrictions on the data type of the information used in the system. This enables users of the system to create their own data types, giving a much richer and more extensible system.

Security. Security is of course also a very important property. The information accessed through an information/monitoring (IM) system is not something that most people would like to share with just anyone. Proper authentication and authorization should therefore be present.

Minimal resource consumption. A system should always try to minimize its resource consumption, because an IM system is a secondary system and resources should be saved for the actual computational work being performed.

Lightweight user interface. A lightweight and easy-to-use user interface should be the rule for any good system.

Ability to query compound information. For example: "Give me the node with the lowest CPU usage which is running on UNIX with at least 1 GB of RAM." This query uses data from both information and monitoring systems. If it is possible to request these kinds of information, only little client-side processing is needed. It also reduces the number of queries, because the client would otherwise have to request all UNIX nodes, all nodes with enough RAM, and then the CPU usage for those nodes, as the sketch below illustrates.
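To make the benefit of compound queries concrete, here is a minimal Python sketch. It is not taken from any of the reviewed systems; the record fields and function name are purely illustrative assumptions.

```python
# Hypothetical node records combining semi-static information (os, ram_gb)
# with dynamic monitoring data (cpu_usage). Field names are illustrative.
nodes = [
    {"name": "node01", "os": "UNIX", "ram_gb": 2.0, "cpu_usage": 0.35},
    {"name": "node02", "os": "UNIX", "ram_gb": 0.5, "cpu_usage": 0.10},
    {"name": "node03", "os": "Windows", "ram_gb": 4.0, "cpu_usage": 0.05},
    {"name": "node04", "os": "UNIX", "ram_gb": 8.0, "cpu_usage": 0.20},
]

def compound_query(records):
    """Answer 'the UNIX node with at least 1 GB of RAM and the lowest CPU
    usage' in one pass, as a server supporting compound queries could."""
    candidates = [r for r in records if r["os"] == "UNIX" and r["ram_gb"] >= 1.0]
    return min(candidates, key=lambda r: r["cpu_usage"], default=None)

# Without compound queries the client would have to fetch all UNIX nodes,
# all nodes with enough RAM and the CPU usage of every node separately,
# then intersect and sort the result sets itself.
print(compound_query(nodes))  # -> node04
```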

Information system properties

For information systems two additional properties come up.

Ability to cache information. An information system's main function is providing semi-static information to its clients. Because this information is semi-static, caching it at points close to the client gives a number of benefits. First, the response time for a typical query will be shorter, since the result is cached near the client. Second, caching uses less network traffic, because results do not have to be brought in from the individual nodes on every query. Third, the resource consumption on the server goes down, because it receives fewer queries.

Ability to create a hierarchy of information. By this I mean that the architecture has a number of different information levels. A possible way to do this is to create a tree in which leaf nodes pass information to their parents. Parents can combine information from their children, possibly abstracting it to create new information. It is then possible to have both a global and a specific view on the information within the system. Besides a tree shape there can be all sorts of constructions that achieve the same result.

Monitoring system properties

Four additional properties arise when inspecting monitoring systems.

Low latency for queries. Because monitoring information is highly dynamic, it is vital that the response time for queries is very low. Otherwise the query is useless, because the received information is already outdated when it arrives at the client who requested it.

Stability of query response time. Not only should the latency of queries be very low, it should also be stable: on successive queries the response time should not vary much. A stable response time lets the client depend on information being delivered within a certain timespan.

Timestamp + TTL. For monitoring information it is essential that every piece of information carries a timestamp telling the client at which instant the information was taken, and a TTL (time-to-live) telling the client up to which moment in time the information remains accurate. In a system with no restrictions on data types this can easily be implemented.

Non-intrusive measuring. Monitoring information expires rapidly, which means that measurements have to be taken frequently. The measurements themselves, however, should not degrade system performance. A monitoring system therefore has to control its intrusiveness towards the node.

2.2 System survey

In this section I describe the most important IM systems that have been developed or are being developed at this moment. The selected systems have been chosen because they are currently being used in grid environments and/or because they have a different, interesting architecture, giving a rich collection of systems and design choices to inspect. For each system I describe to which degree it complies with the desirable properties of section 2.1.

GMA

Short for Grid Monitoring Architecture [4]. As the name says, GMA is not a system itself but rather an architecture for such systems. I briefly explain GMA here because various IM systems use this architecture. GMA consists of three components: consumer, producer and directory service (also called register). Producers and consumers can register themselves with the directory service. A producer registers itself and gives the directory service metadata on the kind of information it will produce. Based on this metadata a consumer can query the directory service to find producers in which it is interested. A consumer can then register itself with the producer, and the producer will send new data to all registered consumers as soon as it has been produced (push model). A pull model, in which the consumer queries the producer and the producer sends back the result, is also available. An important observation about this architecture is that no actual data is stored at the directory service and that the actual data transfer happens directly between a producer and a consumer. A minimal sketch of this interaction follows.
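The following Python sketch illustrates the producer/consumer/directory-service interaction described above. It is not based on any concrete GMA implementation; the class and method names are invented, and real systems add metadata schemas, security and network transport.

```python
class DirectoryService:
    """Stores only metadata about producers, never the monitoring data itself."""
    def __init__(self):
        self._producers = []            # (metadata, producer) pairs

    def register(self, metadata, producer):
        self._producers.append((metadata, producer))

    def lookup(self, predicate):
        """Return producers whose registered metadata matches the predicate."""
        return [p for meta, p in self._producers if predicate(meta)]


class Producer:
    def __init__(self):
        self._subscribers = []          # consumers registered for push delivery
        self._latest = None

    def subscribe(self, consumer):      # push model
        self._subscribers.append(consumer)

    def query(self):                    # pull model: consumer asks directly
        return self._latest

    def publish(self, measurement):
        self._latest = measurement
        for consumer in self._subscribers:
            consumer.deliver(measurement)   # data flows producer -> consumer directly


class Consumer:
    def deliver(self, measurement):
        print("received:", measurement)


# Usage: producer registers metadata, consumer discovers it via the directory.
directory = DirectoryService()
cpu_producer = Producer()
directory.register({"metric": "cpu_usage", "host": "node01"}, cpu_producer)

consumer = Consumer()
for producer in directory.lookup(lambda m: m["metric"] == "cpu_usage"):
    producer.subscribe(consumer)

cpu_producer.publish({"host": "node01", "cpu_usage": 0.42})   # pushed to consumer
print("pulled:", cpu_producer.query())                        # pull model
```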

R-GMA

Short for Relational GMA [1]. R-GMA is a very direct implementation of GMA, with the difference that the directory service is called the register. Scalability of R-GMA with respect to the number of nodes is good. Every producer registers itself with a register. The register can have multiple instances running, which synchronize frequently, so the registers cannot be flooded just because there are many nodes in the grid. Scalability with respect to the number of queries is slightly more complicated. Because the actual data transfer occurs directly between consumer and producer, a producer might easily get flooded if a lot of consumers are interested in its information. R-GMA has a solution for this problem: it is possible to create a hierarchy of information, or to duplicate certain information within the grid, in the following manner. A new node starts running a program that contains a combined consumer/producer. The consumer consumes information from the producer that was previously being flooded and republishes it using its own producer. In this way nodes that are flooded with requests can be relieved by inserting a dedicated node that takes care of redistributing their information on a higher-speed link. Note that this has to be done manually; there is no process taking care of it. The solution also has a drawback: in a larger system the response time will be longer than in a relatively smaller system, because of the intermediate nodes.

The network consumption of R-GMA also depends on whether intermediate nodes are used. If they are not used, the network consumption is proportional to the number of queries, which may generate a lot of traffic if many consumers are interested in information from all producers. The use of intermediate nodes reduces network traffic, because an intermediate node holds information on multiple nodes. The response time for queries is low because information is transferred directly between a producer and a consumer, and it is stable for the same reason. The "relational" in R-GMA comes from the fact that a relational (RDBMS) data model is used for the storage and retrieval of information: SQL is used to publish information (producers) and to select information (consumers). Because of this data model, compound queries and information with timestamps are supported. The republishing mechanism is sketched below.
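A minimal sketch of the combined consumer/producer (republisher) idea, assuming the toy push-model classes from the GMA sketch above rather than R-GMA's actual SQL-based API; all names are hypothetical.

```python
class Producer:
    """Minimal push-model producer (same idea as in the GMA sketch above)."""
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, consumer):
        self._subscribers.append(consumer)

    def publish(self, tuple_):
        for consumer in self._subscribers:
            consumer.deliver(tuple_)


class Republisher(Producer):
    """Combined consumer/producer: consumes from a busy upstream producer and
    republishes the same tuples to its own subscribers, so that many consumers
    no longer hit the original node directly."""
    def __init__(self, name, upstream):
        super().__init__(name)
        upstream.subscribe(self)        # acts as a consumer towards upstream

    def deliver(self, tuple_):
        self.publish(tuple_)            # acts as a producer towards downstream


class PrintingConsumer:
    def __init__(self, name):
        self.name = name

    def deliver(self, tuple_):
        print(self.name, "got", tuple_)


busy_producer = Producer("node01")
relay = Republisher("relay", busy_producer)        # inserted manually in R-GMA
for i in range(3):                                 # many interested consumers
    relay.subscribe(PrintingConsumer("consumer%d" % i))

busy_producer.publish({"host": "node01", "cpu_usage": 0.42})
# node01 now serves one subscriber (the relay) instead of three consumers,
# at the cost of one extra hop in the response path.
```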

MDS

Short for Monitoring and Discovery Service [5]. MDS is used within the Globus project. Three versions have been developed; the latest version has support for compound queries because the old LDAP-based [12] system has been abandoned. MDS does not use the GMA. It is built out of the following components: information providers (IPs) and aggregate directories (ADs). The first provides information on the resources of one or more nodes; the second combines information from various IPs in order to create a VO-specific view on the nodes. Because there are multiple ADs, there is no single point of failure, making the architecture crash-resistant. Clients can query IPs directly or query ADs in order to acquire the information they need. Information in an AD is cached, but unfortunately the default caching time is so low that information received by the AD is already outdated, effectively rendering the caching mechanism useless. This means that a query to an AD can result in queries to all IPs under that AD, which in turn can lead to querying all nodes under those IPs. This is the reason that MDS is not really scalable, neither with respect to the number of nodes nor with respect to the number of queries. Network traffic is high because of the effective lack of caching: each query to an AD results in a lot of queries down the tree towards the individual nodes. With proper caching settings there would be a lot less traffic, since the architecture has a nice tree shape. The caching issue is illustrated below.
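The caching problem described for MDS can be made concrete with a small sketch: a cache entry is only useful if its time-to-live exceeds the interval at which clients ask for it, and with a TTL close to zero every lookup falls through to the information providers. The class below is an illustration with invented names, not Globus code.

```python
import time

class TTLCache:
    """Cache semi-static answers for `ttl` seconds; expired entries fall
    through to the (expensive) fetch function, e.g. a query fanned out to
    all information providers below an aggregate directory."""
    def __init__(self, ttl, fetch):
        self.ttl = ttl
        self.fetch = fetch
        self._store = {}                      # key -> (value, expiry time)

    def get(self, key):
        value, expires = self._store.get(key, (None, 0.0))
        if time.time() < expires:
            return value                      # served from cache
        value = self.fetch(key)               # hits the providers again
        self._store[key] = (value, time.time() + self.ttl)
        return value


def expensive_fetch(key):
    print("fanning out query for", key, "to all information providers")
    return {"key": key, "value": 42}

# With a near-zero TTL (as the text argues happens with the MDS defaults),
# every request triggers the fan-out and the cache is effectively useless.
useless = TTLCache(ttl=0.0, fetch=expensive_fetch)
useful = TTLCache(ttl=300.0, fetch=expensive_fetch)
for _ in range(3):
    useless.get("site-A/os")   # fetches every time
    useful.get("site-A/os")    # fetches only once
```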

Mercury Monitor

The Mercury Monitor [3] is part of the GridLab project and its architecture is based on the GMA. The architecture has three levels, with each level acting as a consumer for the lower level and as a producer for the higher level. From top to bottom these levels are the MS (monitoring service), the MM (main monitor) and the LM (local monitor). The local monitor runs on each node and collects data about the node and the processes running on it. An LM sends its information to an MM. There can be more than one MM per site for the sake of scalability. Clients access the information by querying the MS, which is a client of the MM(s). The LMs and MMs create a hierarchy of information. A big difference with respect to the other IM systems is that the Mercury Monitor has an extra component called an actuator. Actuators implement controls that can be used to alter monitored entities or the monitoring system itself. With these kinds of controls it becomes possible to adapt resources and applications automatically or by manual intervention.

Scalability with respect to the number of nodes is good because of the ability to have multiple MMs running. The MS, though, could become a serious bottleneck when the number of queries rises. Nothing is said about the ability to run multiple MS instances, so the MS should be set up on a server with a lot of resources in order to handle many requests. Given a sufficiently capable server, the architecture scales to a lot of requests, but this approach gives the system a single point of failure: if the MS crashes, the entire system is non-functional. Network usage for the Mercury Monitor consists of the queries posed to the MS combined with the LMs providing the MMs with information. This implies that data may be sent (LM to MM) that is never used by any client, introducing unnecessary network traffic; this traffic, however, stays within a cluster. The latency of query responses is low, but still three times as high as a typical R-GMA response: in the Mercury Monitor the result first goes from the LM to the MM, then the MS gets the result from the MM, after which it is transferred to the client. The response time will be stable, since the transfer times between LM, MM and MS will normally be of the same magnitude on each query. The Mercury Monitor also allows user-defined data types. A sketch of the three-level flow and the actuator follows.
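The following is a minimal Python sketch of the three-level flow and the actuator idea described above; the class names and the control interface are invented, and the real Mercury Monitor is considerably richer.

```python
class LocalMonitor:
    """Runs on each node, collects metrics and forwards them to a main monitor."""
    def __init__(self, node, main_monitor):
        self.node = node
        self.main_monitor = main_monitor

    def measure(self, cpu_usage):
        self.main_monitor.store(self.node, {"cpu_usage": cpu_usage})


class MainMonitor:
    """One or more per site; consumer towards the local monitors,
    producer towards the monitoring service."""
    def __init__(self):
        self._metrics = {}

    def store(self, node, metrics):
        self._metrics[node] = metrics

    def latest(self):
        return dict(self._metrics)


class MonitoringService:
    """Single entry point queried by clients; pulls from the main monitors."""
    def __init__(self, main_monitors):
        self.main_monitors = main_monitors

    def query(self):
        result = {}
        for mm in self.main_monitors:
            result.update(mm.latest())
        return result


class Actuator:
    """Control channel flowing in the opposite direction: lets a client alter
    the monitored entities or the monitoring system itself."""
    def __init__(self, local_monitors):
        self.local_monitors = local_monitors

    def set_sampling_interval(self, seconds):
        for lm in self.local_monitors:
            print("telling", lm.node, "to sample every", seconds, "seconds")


mm = MainMonitor()
lms = [LocalMonitor("node%02d" % i, mm) for i in range(3)]
for i, lm in enumerate(lms):
    lm.measure(cpu_usage=0.1 * i)

ms = MonitoringService([mm])
print(ms.query())                       # metrics travel LM -> MM -> MS -> client
Actuator(lms).set_sampling_interval(5)  # controls travel back down
```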
Zoltán Balaton et al.

This system [6], [13] does not seem to have a name yet, and its architecture is not comparable to the ones we have seen so far. It consists of three components: the Resource Classifier (RC), the Resource Selector (RS) and the Advertisement Distribution System (ADS). The RC periodically produces an advertisement about the resources under its control. The ADS is responsible for distributing those advertisements to all RSs. Finally, the RSs (positioned on the client side) receive every advertisement within the system and filter those relevant for their client. Filtered advertisements are saved locally on the client side in a user-definable storage mechanism. There is no support for compound queries on the server side. The idea behind this system is that the distribution of the advertisements can be done via USENET: RCs post to a newsgroup and RSs read the newsgroup in order to retrieve the advertisements. USENET is indeed highly scalable and crash-resistant and can process a lot of information per day. There are, however, also some major drawbacks. If this system were implemented on the actual public USENET network available at this moment, information propagation could take up to several hours, making the system only useful for static information. On a private USENET network this delay would surely be a lot shorter. Scalability with respect to the number of nodes and requests will be good; USENET has proven to be able to support a lot of posters as well as readers. Using USENET also implies that the system allows user-definable data types.

The network utilization of this approach is, however, not really optimal. Every advertisement from every RC is spread to all USENET servers, without any knowledge of whether it will be useful for any RS. And unless server-side search (XPAT) can be used effectively within this system, all readers must download all advertisements before they can locally decide which ones are relevant. This is obviously a waste of network bandwidth.

DGC (GRelC)

Short for Dynamic Grid Catalog information service [16]. DGC is based on the Grid Relational Catalog [14]. Before describing the architecture of DGC, let us first look at the GRelC library on which this system is built. The GRelC library is a Grid-DBMS, which effectively means that it is a DBMS spread out over a grid environment. It supports data source relocation, replication and fragmentation in order to optimize performance. GRelC makes use of XML and GridFTP for data transfers. The latest version supports data compression, making the data transfers significantly faster [15]. The use of a relational database makes it possible to use any data type, including timestamps. Compound queries are also supported because of the SQL-like querying.

The DGC architecture consists of three components. The information providers (IPs) run on every node and collect information about it. Information produced by an IP is then sent to an LDGC (Local DGC relational information service) and a GDGC (Global DGC information service) using a GRelC client library. The LDGC, which also runs on every node, stores this information in its database. The GDGC, of which there may be more than one running in each VO, stores the information in its database as well. All the GDGCs combined thus have a global overview of the system, whilst an LDGC has a local view; an individual GDGC has an overview of a number of LDGCs, but not all of them. Both LDGCs and GDGCs can be queried by clients in order to retrieve information. The GDGCs are connected in a graph. When a query arrives at a GDGC, it is forwarded to its relevant neighbours in such a manner that all of them receive the query and there is no flooding; individual GDGCs respond directly to the client if they have relevant results. Because of this forwarding of queries there is no way of telling how long it will take before a query returns its results, or whether the query is even finished. The latency for a typical query will therefore not be low. With a good distribution of GDGCs this architecture should scale with the number of nodes; with a lot of queries, however, the query forwarding may cause the system to struggle. Network consumption consists of the IP-to-GDGC traffic, the query forwarding and the returning of the result sets. The flood-free forwarding idea is sketched below.
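The forwarding behaviour attributed to the GDGC graph can be illustrated with a simple flood-free propagation over a peer graph. This is a generic sketch with invented names and an in-process "seen" set standing in for per-node duplicate suppression; it is not the actual GRelC/DGC protocol.

```python
class GDGCNode:
    """Global catalog node holding a partial view; forwards queries to its
    neighbours so that every node sees the query exactly once."""
    def __init__(self, name, local_rows):
        self.name = name
        self.local_rows = local_rows
        self.neighbours = []

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def handle_query(self, predicate, query_id, seen, client):
        if query_id + self.name in seen:      # already processed: prevents flooding
            return
        seen.add(query_id + self.name)
        hits = [row for row in self.local_rows if predicate(row)]
        if hits:
            client.receive(self.name, hits)   # each GDGC answers the client directly
        for peer in self.neighbours:
            peer.handle_query(predicate, query_id, seen, client)


class Client:
    def receive(self, sender, rows):
        print("partial result from", sender, ":", rows)
        # The client never knows how many partial results are still in flight,
        # which is why the latency of a complete answer is hard to bound.


a = GDGCNode("gdgc-A", [{"node": "node01", "os": "UNIX"}])
b = GDGCNode("gdgc-B", [{"node": "node02", "os": "Windows"}])
c = GDGCNode("gdgc-C", [{"node": "node03", "os": "UNIX"}])
a.connect(b); b.connect(c); c.connect(a)      # a cycle: naive flooding would loop

client = Client()
a.handle_query(lambda row: row["os"] == "UNIX", query_id="q1", seen=set(), client=client)
```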

MCS

Short for Metadata Catalog Service [2]. The MCS was originally designed to provide metadata for files within a grid environment; according to its authors it should be easily modifiable to support metadata for any kind of grid resource. Data types are restricted to textual meta-information on files. The architecture of MCS is simple: it uses one master server to which all clients can send queries and add information. Because this single master server is not duplicated or partitioned, the approach is obviously not scalable, neither with the number of nodes nor with the number of clients. Network consumption, however, is very efficient: all traffic is necessary and there is no overhead of useless information being sent. Query response times will be short and stable, because all information is always available at the master server.

UPDF

Short for Unified Peer-to-Peer Database Framework [18]. UPDF is an information system for grids based on peer-to-peer (p2p) technology. P2p has proven to be an extremely scalable architecture; p2p networks exist with millions of nodes sharing multiple terabytes of information. The UPDF architecture is completely decentralized and consists of three components. An originator is a node that queries the system. A query is submitted by sending it to an agent node. Agent nodes process the query against their own local database and then forward the query to their neighbours. Several approaches are available for sending back result sets; the originator tells the nodes which approach to use by stating it in the query. Results can be sent back along the reversed path of the query, but they can also be sent back directly. Results are always sent to the agent node, which unifies them and sends them back to the originator. It is also possible to retrieve only meta-information on the results in a similar manner; the originator can then decide which information it actually wants to receive, and the transfer happens directly between the originator and the individual nodes.

Scalability with respect to the number of nodes is good; p2p has proven that. Scalability with respect to the number of queries is also good: queries are forwarded by the nodes themselves, but the actual information transfer can happen directly between a node and the originator. UPDF supports two anti-flooding mechanisms. The first uses a static timeout on every query; queries are discarded once the timeout is reached. The second is a dynamic timeout whose value is decreased on every hop. Both anti-flooding mechanisms introduce a serious flaw. If there are a lot of nodes, the query forwarding will take more time than the timeout value allows for. This means that clients will only receive information from nodes close to them; in other words, they will have incomplete information and will not be able to make an informed decision. In traditional p2p (file-sharing) networks this is not a problem, because a good result will probably also be available in the neighbourhood. For a good information and monitoring system, however, this is unacceptable. The sketch below shows the cut-off effect.
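The hop-budget cut-off and its consequence, incomplete answers in a large network, can be shown in a few lines. The structure below is a generic peer-to-peer forwarding sketch with invented names, not the actual UPDF implementation.

```python
class PeerNode:
    """Peer holding a local database; forwards queries to neighbours while
    decrementing a hop budget (the 'dynamic timeout' idea)."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.neighbours = []

    def query(self, predicate, hops_left, results, visited):
        if self.name in visited or hops_left < 0:
            return                      # budget exhausted: this peer never answers
        visited.add(self.name)
        results.extend(r for r in self.rows if predicate(r))
        for peer in self.neighbours:
            peer.query(predicate, hops_left - 1, results, visited)


# A simple chain of peers: node0 - node1 - node2 - node3 - node4
peers = [PeerNode("node%d" % i, [{"host": "node%d" % i, "free_ram_gb": i}])
         for i in range(5)]
for left, right in zip(peers, peers[1:]):
    left.neighbours.append(right)
    right.neighbours.append(left)

results = []
peers[0].query(lambda r: r["free_ram_gb"] >= 1,
               hops_left=2, results=results, visited=set())
print(results)   # only node1 and node2 answer; node3 and node4 are cut off,
                 # so the originator decides on an incomplete view of the grid
```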

Due to the architecture, the response time for queries will not be very low: in a large network with a large query timeout value, responses from nodes far away will take long. The stability of the response time is also bad, because it is unclear whether there are still nodes that will respond to the query. Because there is no hierarchy of information, there can only be local caching. The system is very crash-resistant due to the p2p nature of the architecture. The query language of UPDF is XQuery, an SQL-like language for querying XML-structured data. Any data type can be encoded in XML, and it is possible to add a timestamp to all data elements.

2.3 Application monitoring systems

The following two systems are not full-blown monitoring systems; they specialize in monitoring applications in a grid environment. I have included them here because they are useful in cooperation with an IM system: both can be used as an information provider for a monitoring system, expanding it to also support application monitoring.

Netlogger

Short for Networked Application Logger [7]. Netlogger differs from the systems described above in the sense that it is designed specifically for monitoring applications and networks. Programmers use the Netlogger API to insert breakpoints in a program, at which point an event is logged to a logfile that can be used for analysis later on. The logfile can be located at a central host so that all events are logged in a central location. Netlogger itself is not a real monitoring system, because regular users cannot query the information in the logfile. In combination with, for example, R-GMA, however, Netlogger could become a producer of application-specific information, which could then be used by all clients.

Autopilot

The Autopilot [17] system is based on the Pablo performance toolkit [19]. Autopilot is an application monitoring system that is complemented by Virtue [20]. It can be used to monitor applications across a grid (and on parallel computing systems). Instead of writing the information to a logfile or to memory, the information is fed into the Virtue system, which uses it to allow users to adapt local and global resource management policies to the changing nature of a grid environment. Autopilot works by inserting breakpoints in the program, just like Netlogger. Autopilot can also retrieve operating system information on, for example, context switches, interrupts and paging activity. The breakpoint-style instrumentation that both systems rely on is sketched below.
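To make the breakpoint-style instrumentation concrete, here is a small sketch of an application logging timestamped events to a collector. It mimics the idea only; it is not the NetLogger or Autopilot API, and all names are invented.

```python
import json
import time

class EventLog:
    """Collects timestamped application events; in a real deployment the
    events would be shipped to a central host or handed to an IM producer."""
    def __init__(self):
        self.events = []

    def log(self, event, **fields):
        record = {"timestamp": time.time(), "event": event}
        record.update(fields)
        self.events.append(record)


def transfer_file(log, name, size_mb):
    # "Breakpoints" inserted by the programmer around the interesting section.
    log.log("transfer.start", file=name, size_mb=size_mb)
    time.sleep(0.01)                      # stand-in for the real work
    log.log("transfer.end", file=name)


log = EventLog()
transfer_file(log, "results.dat", size_mb=120)
for record in log.events:
    print(json.dumps(record))
# Fed into a producer of an IM system, these records would make
# application-level monitoring data available to all grid clients.
```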

3. Designing a good system

After reviewing the various information and/or monitoring systems it is still not easy to point out which system is the best. At this point I think three systems have real potential: R-GMA, the Mercury Monitor and DGC. R-GMA is designed to be a unified system, the Mercury Monitor is designed to be a strict monitoring system and DGC is probably only suitable as an information system. So which is better, more useful, more economic: running just R-GMA, or running, for example, the Mercury Monitor and DGC side by side? In the latter case it would of course be a good idea to create a unified user interface, which makes it a lot easier for the end users, who then do not have to deal with two different interfaces.

3.1 Arguments for dividing / unifying

At this point we can identify a number of arguments for dividing or unifying information and monitoring systems. Let us summarize them.

Arguments for dividing:
- Information and monitoring systems are different and have different, possibly conflicting properties.
- The two systems will have a less complex design, because each has to support only a specific type of information.
- Monitoring systems produce information that you might want to react on in order to adapt resource control; information systems do not. By dividing the systems it is easier to identify which information has to be reacted on.
- A push model is useful for monitoring systems, but not for information systems.

Arguments for unifying:
- Running one instead of two secondary systems on a grid imposes less overhead.
- The overall grid architecture gets simpler, because there is one service less.
- Information from information and monitoring systems only differs in its degree of staticness.
- A unified user interface.

3.2 The perfect system in a perfect world

None of the reviewed systems is perfect, and of course that might be a bit much to ask. But what would a perfect system look like? Which properties should it have? A perfect system would of course be a unified system. Since it is perfect, it should have no trouble supporting both the needs of a good information system and the needs of a good monitoring system. It should be able to monitor the resources of all nodes as well as the applications running on them, and it should comply with all the desirable properties stated in section 2.1. This system would be scalable to any number of nodes or requests, yet its query response time would be low and stable, and its network consumption would be low and consist of 100% useful traffic. In the real world the triangle of low query response time, scalability and low network consumption always seems to conflict; I do not think it is feasible to have these three properties coexist at the same time. A real-world design will therefore probably focus on two of the three properties, letting the third one get just a bit worse than we would all like to see.

3.3 Unify

I think most people would agree with me that if both systems can be unified in a proper manner, then they should be. There is no reason to keep two separate systems running if one can do the job as well. In a perfect world the systems would be unified, and it is always a good idea to strive for perfection. This statement would be worthless if there were absolutely no way to implement such a system, but I think it can be done. Currently designed systems all have good, but different, properties. If they were combined, a good unified system that supports the needs of monitoring as well as information systems could be designed. Even though section 3.1 lists as many arguments for dividing as for unification, they are not of equal importance. The arguments for unifying both systems have a much larger impact. A unified user interface makes the retrieval of information a lot easier for the users.

The overall resource consumption will go down, because there is one system less running on the grid. These benefits are much greater than the added complexity of the unified system and the other arguments for dividing.

3.4 A global unified system design

Combining some of the properties and ideas from R-GMA and the Mercury Monitor with what I would like to call automated information relocation, it seems possible to create a proper unified information and monitoring system. Let us take the R-GMA architecture as a base for this new architecture and add features to it in order to make it a complete system.

Dynamic load balancing. By dynamically balancing the query load posed on the system, the system becomes much more scalable, because bottlenecks can be relieved when they occur, without manual intervention. It seems possible to create a unified information and monitoring system that adapts dynamically to the query load. The biggest problem with R-GMA was that producer nodes can easily be flooded if a lot of consumers are interested in their information. I picture the automated information relocation (AIR) subsystem as follows. Each producer has a user-definable number of slots, and each slot can be used by one consumer to connect to. If all slots are full, in other words if the producer cannot handle any more consumers, the producer contacts one of the registers saying that it is getting too busy. The register then instructs a dedicated duplication node to start serving the information for that producer. Any request to the register for information on the busy producer is redirected to the duplication node in a transparent manner. The duplication node registers itself with the previously full producer. In this way the consumers are not even aware that they are getting their information from something other than the producer itself. With enough, tactically positioned, duplication nodes this system will be able to adapt automatically, at runtime, to the changing query load, making sure that producer nodes cannot get flooded. Scalability with respect to the number of queries should then be fine. The response time for queries will go up a little, because a duplication node is now sometimes in the loop. A sketch of this mechanism is given below.
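As a rough illustration of how the proposed AIR mechanism might behave, the following sketch uses in-process objects standing in for producers, the register and duplication nodes. All names are invented, and details such as transparently redirecting already-connected consumers are glossed over.

```python
class Producer:
    """Producer with a fixed number of consumer slots."""
    def __init__(self, name, slots, register):
        self.name = name
        self.slots = slots
        self.register = register
        self.subscribers = []
        register.producers[name] = self

    def attach(self, consumer):
        if len(self.subscribers) >= self.slots:
            # All slots taken: ask the register to interpose a duplication node.
            return self.register.relieve(self).attach(consumer)
        self.subscribers.append(consumer)
        return self

    def publish(self, data):
        for consumer in self.subscribers:
            consumer.deliver(data)


class DuplicationNode(Producer):
    """Well-connected node that re-serves a busy producer's information."""
    def __init__(self, name, register, origin):
        super().__init__(name, slots=1000, register=register)
        origin.subscribers.append(self)     # subscribes directly at the origin

    def deliver(self, data):
        self.publish(data)


class Register:
    """Tracks producers and transparently redirects lookups for busy ones."""
    def __init__(self):
        self.producers = {}
        self._redirects = {}

    def lookup(self, name):
        return self._redirects.get(name, self.producers[name])

    def relieve(self, producer):
        duplicate = self._redirects.get(producer.name)
        if duplicate is None:
            duplicate = DuplicationNode("dup-" + producer.name, self, producer)
            self._redirects[producer.name] = duplicate
        return duplicate


class PrintingConsumer:
    def __init__(self, name):
        self.name = name

    def deliver(self, data):
        print(self.name, "got", data)


register = Register()
node01 = Producer("node01", slots=2, register=register)
for i in range(4):                          # more consumers than slots
    register.lookup("node01").attach(PrintingConsumer("consumer%d" % i))

node01.publish({"host": "node01", "cpu_usage": 0.9})
# consumer0 and consumer1 are served by node01 itself; consumer2 and consumer3
# receive the same data one hop later via the duplication node.
```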

Caching. To reduce network traffic for more static information it is also essential to support caching. Higher-level producers should cache information that is considered semi-static, so that pull requests from consumers can be answered immediately.

Resource control. The actuator component from the Mercury Monitor architecture should also be added to this system, because it really adds extra value. By being able to send control events back to the monitored entities or to the monitoring system itself, we add another automated balancing/adaptation mechanism, which helps the system adapt to the ever-changing environment of the grid.

Application monitoring. A good architecture should be able to support the monitoring of applications just as well as the monitoring of resources. Netlogger has proven to be a good system for monitoring applications and could be used in cooperation with the R-GMA architecture: a Netlogger producer would publish information on the various processes running on a node, making that information available to the grid users.

Hierarchy of information. Up to this point we have an architecture that is scalable to a large number of nodes because of register duplication (already present in R-GMA). The system is now also scalable to any number of queries, in a dynamic manner, and it supports both dynamic and semi-static information. With combined consumer/producer components within the system, a hierarchy of information can be established, providing possibly abstracted and/or combined information to consumers, as sketched below.
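As a final illustration, here is a combined consumer/producer that builds one level of such a hierarchy by abstracting per-node data into a site-level summary. This is again a hypothetical sketch with invented names, not code from any of the reviewed systems.

```python
class SiteSummarizer:
    """Combined consumer/producer: consumes per-node measurements and
    republishes an abstracted, site-level view to its own consumers."""
    def __init__(self, site):
        self.site = site
        self._latest = {}                 # node -> last measurement
        self._subscribers = []

    def subscribe(self, consumer):
        self._subscribers.append(consumer)

    def deliver(self, measurement):       # consumer role: called per node update
        self._latest[measurement["host"]] = measurement
        self._publish_summary()

    def _publish_summary(self):           # producer role: abstracted information
        loads = [m["cpu_usage"] for m in self._latest.values()]
        summary = {
            "site": self.site,
            "nodes": len(loads),
            "avg_cpu_usage": sum(loads) / len(loads),
        }
        for consumer in self._subscribers:
            consumer.deliver(summary)


class PrintingConsumer:
    def deliver(self, summary):
        print(summary)


site = SiteSummarizer("site-A")
site.subscribe(PrintingConsumer())
site.deliver({"host": "node01", "cpu_usage": 0.2})
site.deliver({"host": "node02", "cpu_usage": 0.6})
# A client interested only in the global picture queries the summarizer,
# while a client that needs detail can still query the individual nodes.
```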

4. Conclusion

Information and monitoring systems are a vital part of any grid environment. Because of that it is interesting to find out whether they should be unified, and whether that is even possible. In this paper we have looked at information and monitoring systems and identified a number of desirable properties for each kind of system. Currently designed systems were reviewed, revealing weaknesses as well as good design choices. After reviewing the arguments for dividing and unifying, we came to the conclusion that unification is better, the biggest reason being that it can be done in such a way that the end result is a good system supporting the needs of both information and monitoring systems. Future research should go into a more thorough and concrete design of the proposed unified system.

5. Disclaimer

This paper has been written with purely theoretical knowledge of the various systems. I have had no hands-on experience with these systems, nor have I been able to benchmark them in order to get accurate test results.

6. Bibliography

[1] Rob Byrom, Brian Coghlan, Andrew Cooke, Roney Cordenonsi, Linda Cornwall, Ari Datta, Abdeslem Djaoui, Laurence Field, Steve Fisher, Steve Hicks, Stuart Kenny, James Magowan, Werner Nutt, David O'Callaghan, Manfred Oevers, Norbert Podhorszki, John Ryan, Manish Soni, Paul Taylor, Antony Wilson and Xiaomei Zhu. R-GMA: A Relational Grid Information and Monitoring System. Proceedings of the 2nd Grid Workshop, Cracow, December. rgma.pdf

[2] Gurmeet Singh, Shishir Bharathi, Ann Chervenak, Ewa Deelman, Carl Kesselman, Mary Manohar, Sonal Patil and Laura Pearlman. A Metadata Catalog Service for Data Intensive Applications. Proceedings of Supercomputing 2003 (SC2003), November 2003.

[3] Zoltán Balaton and Gábor Gombás. Resource and Job Monitoring in the Grid. Proceedings of the Euro-Par 2003 International Conference, Klagenfurt. europar-2003-monitor.pdf

[4] B. Tierney, R. Aydt, D. Gunter, W. Smith, M. Swany, V. Taylor and R. Wolski. A Grid Monitoring Architecture. GGF Performance Working Group. www-didc.lbl.gov/ggf-PERF/GMA-WG/papers/GWD-GP-16-2.pdf

[5] Karl Czajkowski, Steven Fitzgerald, Ian Foster and Carl Kesselman. Grid Information Services for Distributed Resource Sharing. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.

[6] Zoltán Balaton, Gábor Gombás and Zsolt Németh. A Novel Architecture for Grid Information Systems. Proceedings of the Second IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID2002), Berlin, May 2002. Los Alamitos, IEEE Computer Society Press, 2002.

[7] Brian Tierney and Dan Gunter. NetLogger: A Toolkit for Distributed System Performance Tuning and Debugging. Integrated Network Management 2003. www-didc.lbl.gov/papers/netlogger.overview.pdf

[8] Ian Foster. What is the Grid? A Three Point Checklist. GRIDtoday, July 20, 2002. www-fp.mcs.anl.gov/~foster/articles/whatisthegrid.pdf

[9] Ian Foster, Carl Kesselman and Steven Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 2001.

[10] Zoltán Balaton, Peter Kacsuk, Norbert Podhorszki and Ferenc Vajda. Comparison of Representative Grid Monitoring Tools. Laboratory of Parallel and Distributed Systems (SZTAKI), LPDS-2/2000, 2000.

[11] Xuehai Zhang, Jeffrey L. Freschl and Jennifer M. Schopf. A Performance Study of Monitoring and Information Services for Distributed Systems. Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'03), p. 270.

[12] Gregor von Laszewski and Ian Foster. Usage of LDAP in Globus. www-unix.globus.org/ftppub/globus/papers/ldap_in_globus.pdf

[13] Zoltán Balaton, Gábor Gombás and Zsolt Németh. Information System Architecture for Brokering in Large Scale Grids. Parallel and Distributed Systems: Cluster and Grid Computing (Proceedings of DAPSYS 2002, Linz), Kluwer, 2002.

[14] Giovanni Aloisio, Massimo Cafaro, Sandro Fiore and Maria Mirto. The GRelC Project: Towards GRID-DBMS. Proceedings of Parallel and Distributed Computing and Networks (PDCN), IASTED, Innsbruck, Austria, February 17-19.

[15] Giovanni Aloisio, Massimo Cafaro, Sandro Fiore and Maria Mirto. Early Experiences with the GRelC Library. Journal of Digital Information Management, Volume 2, Number 2, June.

[16] Giovanni Aloisio, Euro Blasi, Massimo Cafaro and Italo Epicoco. Dynamic Grid Catalog Information Service. Proceedings of the First European Across Grids Conference, Lecture Notes in Computer Science, Springer-Verlag, Santiago de Compostela, Spain.

[17] Randy L. Ribler, Jeffrey S. Vetter, Huseyin Simitci and Daniel A. Reed. Autopilot: Adaptive Control of Distributed Applications. Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing (HPDC), 1998. AutopilotAdaptiveControlofDistributedApplications.ps.gz

[18] W. Hoschek. A Unified Peer-to-Peer Database Framework for XQueries over Dynamic Distributed Content and its Application for Scalable Service Discovery. PhD Thesis, Technical University of Vienna, March. hoschek.pdf

[19] Luiz DeRose, Ying Zhang and Daniel A. Reed. SvPablo: A Multi-Language Performance Analysis System. 10th International Conference on Computer Performance Evaluation, Modelling Techniques and Tools (Performance Tools '98), Palma de Mallorca, Spain, September 1998.

[20] E. Schaffer et al. Virtue: Immersive Performance Visualisation of Parallel and Distributed Applications. IEEE Computer, December 1999.