MONITORING OF GRID RESOURCES

Similar documents
GridMonitor: Integration of Large Scale Facility Fabric Monitoring with Meta Data Service in Grid Environment

Design of Distributed Data Mining Applications on the KNOWLEDGE GRID

igrid: a Relational Information Service A novel resource & service discovery approach

Information and monitoring

Grid Computing Systems: A Survey and Taxonomy

GlobalWatch: A Distributed Service Grid Monitoring Platform with High Flexibility and Usability*

Customized way of Resource Discovery in a Campus Grid

A Resource Discovery Algorithm in Mobile Grid Computing Based on IP-Paging Scheme

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

GridRM: An Extensible Resource Monitoring System

Introduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project

Performance Analysis of the Globus Toolkit Monitoring and Discovery Service, MDS2

Layered Architecture

Job-Oriented Monitoring of Clusters

Day 1 : August (Thursday) An overview of Globus Toolkit 2.4

An Evaluation of Alternative Designs for a Grid Information Service

A Distributed Media Service System Based on Globus Data-Management Technologies1

Replica Selection in the Globus Data Grid

Implementing Probes for J2EE Cluster Monitoring

GStat 2.0: Grid Information System Status Monitoring

The NorduGrid Architecture and Middleware for Scientific Applications

MONitoring Agents using a Large Integrated Services Architecture. Iosif Legrand California Institute of Technology

Adaptive Cluster Computing using JavaSpaces

THE VEGA PERSONAL GRID: A LIGHTWEIGHT GRID ARCHITECTURE

Globus Toolkit Firewall Requirements. Abstract

JCatascopia: Monitoring Elastically Adaptive Applications in the Cloud

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Architecture Proposal

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Knowledge Discovery Services and Tools on Grids

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Java Development and Grid Computing with the Globus Toolkit Version 3

Research on the Key Technologies of Geospatial Information Grid Service Workflow System

Grid Architectural Models

This document contains an overview of the APR Consulting port of the Ganglia Node Agent (gmond) to Microsoft Windows.

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

A Framework For Instrument Monitoring On The Grid

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

A Survey Paper on Grid Information Systems

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Introduction to Mobile Ad hoc Networks (MANETs)

Monitoring ARC services with GangliARC

Research and Design Application Platform of Service Grid Based on WSRF

An agent-based peer-to-peer grid computing architecture

Grid Middleware and Globus Toolkit Architecture

Towards a Framework for Monitoring and Analyzing QoS Metrics of Grid Services

CA464 Distributed Programming

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

A Simulation Model for Large Scale Distributed Systems

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Distributed Systems Principles and Paradigms

Monitoring the Earth System Grid with MDS4

Resource Allocation in computational Grids

By Ian Foster. Zhifeng Yun

ICENI: An Open Grid Service Architecture Implemented with Jini Nathalie Furmento, William Lee, Anthony Mayer, Steven Newhouse, and John Darlington

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content

THE API DEVELOPER EXPERIENCE ENABLING RAPID INTEGRATION

Web Services for Integrated Management: a Case Study

INTEGRATING THE PALANTIR GRID META-INFORMATION SYSTEM WITH GRMS

The GridWay. approach for job Submission and Management on Grids. Outline. Motivation. The GridWay Framework. Resource Selection

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing

Usage of LDAP in Globus

Grid Computing. Lectured by: Dr. Pham Tran Vu Faculty of Computer and Engineering HCMC University of Technology

Diagnosis via monitoring & tracing

CS603: Distributed Systems

A Resource Look up Strategy for Distributed Computing

Chapter 18 Distributed Systems and Web Services

On the Creation & Discovery of Topics in Distributed Publish/Subscribe systems

Managing Oracle Real Application Clusters. An Oracle White Paper January 2002

GRID monitoring with NetSaint

Distributed Systems. Bina Ramamurthy. 6/13/2005 B.Ramamurthy 1

Exploiting peer group concept for adaptive and highly available services

GMA-PSMH: A Semantic Metadata Publish-Harvest Protocol for Dynamic Metadata Management Under Grid Environment

Introduction to Grid Technology

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?


Grid Compute Resources and Job Management

Installing the Cisco Unified CallManager Customer Directory Plugin Release 4.3(1)

A AAAA Model to Support Science Gateways with Community Accounts

Grid Computing Fall 2005 Lecture 10 and 12: Globus V2. Gabrielle Allen

Grid Computing: Status and Perspectives. Alexander Reinefeld Florian Schintke. Outline MOTIVATION TWO TYPICAL APPLICATION DOMAINS

Digital it Signatures. Message Authentication Codes. Message Hash. Security. COMP755 Advanced OS 1

Towards a Unified Monitoring and Performance Analysis System for the Grid

Cloud Computing & Visualization

Introduction. Distributed Systems IT332

Grid Resource Discovery Based on Centralized and Hierarchical Architectures

System Requirements COSE. Helsinki 8th December 2004 Software Engineering Project UNIVERSITY OF HELSINKI Department of Computer Science

CFS Browser Compatibility

Cisco pxgrid: A New Architecture for Security Platform Integration

Agent-Enabling Transformation of E-Commerce Portals with Web Services

NOW and the Killer Network David E. Culler

Enterprise Manager: Scalable Oracle Management

ROCI 2: A Programming Platform for Distributed Robots based on Microsoft s.net Framework

Announcements. me your survey: See the Announcements page. Today. Reading. Take a break around 10:15am. Ack: Some figures are from Coulouris

Utilizing Databases in Grid Engine 6.0

Distributed Multitiered Application

Monitoring systems: Concepts and tools

KNSP: A Kweelt - Niagara based Quilt Processor Inside Cocoon over Apache

Transcription:

MONITORING OF GRID RESOURCES Nikhil Khandelwal School of Computer Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 e-mail:a8156178@ntu.edu.sg Lee Bu Sung School of Computer Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 e-mail: ebslee@ntu.edu.sg Abstract Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation.[1] Managing resources on the grid is extremely crucial for effective resource utilization. This paper presents an approach to resource monitoring and publishing using an information provider. 1 Introduction Grid Monitoring Service is the activity of measuring significant grid resources related parameters in order to analyze usage, behavior and performance of the grid. It is also used to detect and notify fault situations contract violations (SLA) and user-defined events. Therefore, it is an essential part of the grid management activity. The components of a grid monitoring service are: 1. Measurement Service: This service probes the resources for certain parameters (especially QoS related) 2. Discovery Service: This service finds out which resources are currently available and populates them on the grid. 3. Detection and Notification Service: This service deals with fault situations, SLA violations and userdefined events. 4. Data Analyzer: It prepares performance and usage reports and statistics 5. Presentation Service: It is the web-based graphic user interface used for presentation of the data. Before I explain the architecture and implementation of my project I will briefly describe the external components used to build my system: Monitoring and Discovery Service(MDS) and the Ganglia Monitoring Cluster Toolkit. 2.1 Monitoring and Discovery Service(MDS) Most of the development surrounding the information domain of Grid Computing has been related to Monitoring and Discovery Service (MDS) of the Globus Toolkit. MDS is designed to provide a standard mechanism for publishing and discovering resource status and configuration information. It provides a uniform, flexible interface to data collected by lower-level information providers. It has a decentralized structure that allows it to scale, and it can handle static or dynamic data. MDS includes a set of core GRIS information providers used to generate information such as platform type, host OS, system load, memory, file systems, etc. A user can also create an information provider to publish information to the MDS. However, the MDS does not on individual nodes. It only provides aggregate cluster monitoring data.(figure 1) Client 1 searches the GRIS directly Resource A GRIS GRIS register with GIIS GIIS requests info from GRIS services GIIS Cache contains info from A and B Client 1 Client 2 2.2 Ganglia Monitoring Cluster Toolkit Resource B GRIS Client 2 uses GIIS for searching collective information For node-level monitoring there is a widely accepted tool, the Ganglia Monitoring Cluster Toolkit. Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and grids. Using a daemon (gmond) Ganglia provides monitoring information on a single cluster. Since gmond s on every node, it provides monitoring data such as load_one, load_five, cpu_num, mem_total, etc. of every node[2]. Figure 1: Architecture of MDS

3 Design of the system The objectives of my design were to come up with a monitoring system, which effectively represents the data. It was also essential to ensure that there are no performance overheads caused due to the monitoring of resources. The architecture of my project can be described with the aid of Figure 3. As the figure shows clearly my project comprises of two sections: (1) An information provider which publishes information from the Ganglia to the MDS (2) An MDS Browser (presentation layer), which provides a web interface. Figure 2: Architecture of Ganglia Presentation Service My Project MDS Browser Data Analyzer Data Collection (historical) Detection & Notification Grid Information Service Measurement Service MDS Information Provider Ganglia Figure 3: Architecture overview of my system 3.1 Information Provider The essence of creating an information provider involves three basic steps: 1. Decide what kind of information needs to be published into MDS. Then decide how that information should be represented in the Directory Information Tree (DIT). This requires the definition of a schema, an OID assignment [3], and naming conventions for that information. I decided to use the GLUE-CE schema which aims to provide interoperability between grid project efforts [4]. Only those metrics were chosen to be published which were crucial for performance monitoring and scheduling. 2. Create a program that adheres to the input and output interface requirements of LDAP. The program must be callable by the fork() and exec() facilities of the GRIS back end, and must return LDAP Data Interchange Format (LDIF) data objects according to the defined schema. I decided to write and information provider using C. 3. Enable the program by adding an entry for it in the grid-info-resource-ldif.conf file. The grid-inforesource-ldif.conf file defines all active GRIS providers, so when the provider is created a reference to it needs to be added to this file. Fundamentally, an information provider is any computer program written such that it adheres to two specific

interfaces: the input and output interfaces of the GRIS back end (as shown in Figure 4). 3.1.1 Define Provider Schema For this purpose the widely accepted GLUE-CE schema was used. Since it contains all the necessary attributes required it just needed to be mapped to the corresponding Ganglia metric (as shown in Table 1). 3.1.2 Create a Provider Program The purpose of a user-created information provider is to get data from the Ganglia monitoring daemon (gmond) and put the data as an LDIF object. (see Figure 5) Figure 4: Interface of an information provider Information Provider Input The input for the information provider comes from the gmond. Gmond is a multi-thed daemon which s on each node of the cluster that needs to be monitored. Gmond has four main responsibilities: monitor changes in host state, multicast relevant changes, listen to the state of all other ganglia nodes via a multicast channel and answer requests for an XML description of the cluster state. This XML provides the input for the information provider and it can be used to publish it on the MDS. Common Name Hostname A Unique ID assigned to the host Processor Load, 1 Min Average Processor Load, 5 Min Average Processor Load, 15 Min Average SMP Load, 1 Min Average SMP Load, 5 Min Average SMP Load, 15 Min Average Number of CPUs Processor Clock Speed (MHz) The name of the Network Interface Network Adapter address The amount of RAM Free RAM (in KBytes) Attribute (OID) GlueHostName 1.3.6.1.4.1.8005.100.3.2.3.1 GlueHostUniqueID 1.3.6.1.4.1.8005.100.3.2.3.2 Last1Min 1.3.6.1.4.1.8005.100.3.2.10.1 Last5Min 1.3.6.1.4.1.8005.100.3.2.10.2 Last15Min 1.3.6.1.4.1.8005.100.3.2.10.3 Last1Min 1.3.6.1.4.1.8005.100.3.2.11.1 Last5Min 1.3.6.1.4.1.8005.100.3.2.11.2 Last15Min 1.3.6.1.4.1.8005.100.3.2.11.3 GlueHostArchitectureSMPSize 1.3.6.1.4.1.8005.100.3.2.4.2 GlueHostProcessorClockSpeed 1.3.6.1.4.1.8005.100.3.2.5.4 GlueHostNetworkAdapterName 1.3.6.1.4.1.8005.100.3.2.9.1 GlueHostNetworkAdapterAddress 1.3.6.1.4.1.8005.100.3.2.9.2 GlueHostMainMemoryRAMSize 1.3.6.1.4.1.8005.100.3.2.7.1 GlueHostMainMemoryRAMAvailable 1.3.6.1.4.1.8005.100.3.2.7.2 Table 1: GLUE-CE schema mapping for Ganglia metrics objectclass (OID) GlueHost 1.3.6.1.4.1.8005.100.3.1.3 GlueHost 1.3.6.1.4.1.8005.100.3.1.3 GlueHostArchitecture 1.3.6.1.4.1.8005.100.3.1.4 GlueHostProcessor 1.3.6.1.4.1.8005.100.3.1.5 GlueHostNetworkAdapter 1.3.6.1.4.1.8005.100.3.1.9 GlueHostNetworkAdapter 1.3.6.1.4.1.8005.100.3.1.9 GlueHostMainMemory 1.3.6.1.4.1.8005.100.3.1.7 GlueHostMainMemory 1.3.6.1.4.1.8005.100.3.1.7

Information Provider Output A user-created information provider must generate LDIF objects as its output. LDIF is a file format suitable for describing directory information or modifications made to directory information. LDIF is typically used to import and export directory information between LDAP-based directory servers, or to describe a set of changes that are to be applied to a directory. So in order to publish in on the MDS the XML data needs to be transformed to LDIF objects. 3.1.2 Enable the Provider The grid-info-resource-ldif.conf file lists active objects, which show how to invoke an information provider. The created objects need to be added to grid-info-resource-ldif.conf. The GRIS back end s grid-info-resource-ldif.conf to get the path name (path: and base: parameters) and the arguments (args: parameter) of the information provider. Then the GRIS back end forks and execs the information provider. 3.2 MDS Browser Presentation Layer In order to represent the hierarchical data a Java based directory browser was developed. Since a web interface is essential for resource monitoring an applet was programmed. Such a system is not only easy to visualize but also simple to browse and effectively represents the hierarchy of resources in a grid. GIIS (GLUE schema) information index - MDS LDAP query web interface First discovery phase Second discovery phase LDAP query monitoring server - ntuds gmetad GRIS (GLUE schema) information provider LDIF output write Ganglia monitoring agent (gmond) Ganglia monitoring agent (gmond) cluster head node sensor sensor /proc filesystem /proc filesystem cluster worker node cluster worker node Figure 5: Functional Overview of the system 4 Implementation In order to implement such a system, it required a study of not only the design of the Ganglia Cluster Toolkit, but also the MDS and LDAP specifications. 4.1 Information Provider I made use of the open source C Ganglia Library to implement the information provider. The first thing to do was to open a TCP connection to the gmond (default port: 8649) and get the XML data (see Figure 6). In order to parse this XML I made use of the Expat parser, a very powerful and fast stream parser [5]. The parsed data was then stored in a hash table, which

could be searched easily for entries. After the data had been stored in the internal data structure, it needed to be converted to LDIF. This LDIF was required to pertain to the GLUE-CE schema and once the information provider was enabled it could publish data on the MDS. Since MDS serves as a common framework for information services this data can be utilized by any of the processes for scheduling, fault tolerance etc. 4.2 MDS Browser The simple MDS browser was designed using the Netscape Directory SDK[7]. The idea was to query the MDS using the SDK and displaying it in a suitable format using Java Swing Classes. In order to achieve good network performance and fast response times the applet was carefully designed. It was made sure that the query for the LDAP data was executed only when a user clicked on one of the nodes of the tree. The whole tree data was not populated at the initial load time itself. Initially, there were some problems with network connectivity and applets. One of the problems faced during the design of the applet was that it needed network connectivity. Typically an applet cannot have access to the network due to security considerations imposed by the Java sandbox security framework. In order to overcome this problem I had to create a digital certificate for the applet. This was done using the jarsigner and the keytool, which are a part of the J2SDK. Gmond at port 8649 XML over TCP Parse the XML Expat Parser Information Provider Hash Table LDIF Formatter LDIF MDS Browser (Java Applet) Publish the information Figure 6: Implementaion Overview of the system MDS Enable the information provider 5 Future Works Next-generation information architecture would use Open Grid Services Architecture mechanisms. With the launch of Globus Toolkit 3 (GTK3) integrated monitoring & discovery architecture will be interesting to explore. Information services in the future will be mostly XML-based and the emphasis will shift from host-level concept to service-level concept. I intend to study the information model in greater detail with special emphasis on web services and standardization of protocols. Another possible extension could be the graphical representation of historical monitoring data. MapCenter, developed by DataGrid represents an interesting approach to implementation of such a system [7]. All said and done, one thing is certain- if grid technologies have to be applicable in today s business world, they will need to be fault tolerant and robust. I believe effective resource monitoring and fault reporting systems can go a long way to solve many of the problems that present grid applications suffer from. References [1] I. Foster, C. Kesselman and S. Tuecke. The Anatomy of the Grid. Intl J. Supercomputer Applications, 2001. [2] Brent N. Chun, David E. Culler, Matthew L. Massie. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. [3] MDS 2.2 Object IDs (http://www.globus.org/ mds/oid.html) [4] GLUE-CE Schema DataTAG WP4 (http://www.cnaf.infn.it/~sergio/datatag/glue/index.htm) [5] Expat Parser (http://expat.sourceforge.net/ ) [6] Netscape Directory SDK (http://developer.netscape.com/tech/directory/downloads.html) [7] MapCenter (http://ccwp7.in2p3.fr/mapcenter/ )