A Distributed Media Service System Based on Globus Data-Management Technologies1


Xiang Yu, Shoubao Yang, and Yu Hong
Dept. of Computer Science, University of Science and Technology of China, Hefei 230026, P.R. China
xyu4@mail.ustc.edu.cn, syang@ustc.edu.cn

Abstract. Emerging high-performance applications need to exploit diverse, geographically distributed resources, in particular in a Grid environment. The Globus Grid toolkit is a collection of software developed to address the related difficulties. In this paper we describe an application built with this toolkit, the Distributed Media Service System (DMSS for short). We present the features and physical infrastructure of the target system, the effectiveness of the toolkit approach, and our heuristic enhancements, and we conclude with future work.

1 Introduction

Grid technology is regarded as a critical element of future high-performance computing environments that will enable entirely new classes of applications. We built a media service system, DMSS, with the widely used Globus toolkit [1]. DMSS relies on the data-management facilities of Globus, which support file operations such as registration, publication, deletion, copying, inquiry, remote access, data transfer and security protection. It uses distributed resources effectively and so avoids the bottleneck that arises when many users connect to one server in a traditional centralized media service system.

The paper is organized as follows. Section 2 describes the physical architecture and the functionalities of DMSS. Section 3 introduces the main Globus services that we use. Section 4 explains the five modules of the DMSS system in detail. Section 5 gives a brief conclusion and outlines future work.

1 This paper is supported by the National Natural Science Foundation of China (Contract No. 60273041) and the National 863 High-Tech Program of China (Contract No. 2002AA104560).

2 Physical Structure and Functionalities of DMSS

2.1 Physical Structure

The DMSS system, a virtual organization (VO), consists of a server and a number of nodes. The server's main duties are to give users access to the DMSS system and to let them publish, access and maintain data files. The nodes are geographically distributed; each one stores data files or offers logical management through Replica catalogs. In figure 1, a host Pi (i = 1, 2, ...) outside the dotted line of an area does not belong to the set of effective member nodes of the system (that is, of the DMSS virtual organization). Take area 4 as an example: nodes N1 and N2 are member nodes of the system, while hosts P1, P2 and P3 are not. However, P1, P2 and P3 can be members of other VOs. If area 4 is itself a VO, then N1 and N2 are members of both the DMSS system and VO area 4, whereas P1, P2 and P3 are members of VO area 4 only.

[Figure 1 labels: Portal & Server; DMSS; Areas 1 to 4; member nodes N1, N2 with disks; non-member hosts P1, P2, P3.]

Fig. 1. Physical structure of DMSS system

In principle, any node can join or leave the system at any time, which makes the DMSS system scalable. A host joins the DMSS system as follows. The node first obtains a valid certificate issued under the credential of the DMSS system, which authenticates its legal identity. The node then registers with the DMSS server, declaring that it wants to become a member of the VO and offering its remaining disk space or a Replica catalog service.

2.2 Functionalities

The DMSS system makes full use of spare resources in the network in the following ways:

1. Replica catalogs logically manage the distributed media files. There are two kinds of nodes: a node that has its own Replica catalog, and a node that only stores files and has no Replica catalog, so that its files are managed logically by the Replica catalogs of other nodes.
2. When a user connects to the server, the server calls the access function and either returns the nearest download location of the requested file or redirects the user to the nearest node, where the user receives the media service directly.
3. The system evaluates the latency and checks whether it exceeds a limit. If so, the system searches for an effective node nearest to the user and stores a copy of the file there. The next time this user, or other users near him, access the same file, they get faster service.
4. The system is maintained periodically: redundant replicas are deleted to optimize disk utilization.

3 Basic Globus Services Used in DMSS

Our implementation uses several Globus services, specifically MDS, Replica, GSI, GridFtp, GASS and GRAM [1-6]. These are only basic services, so we combine them with our own access-selection algorithm and with a strategy of event recording and redundancy maintenance.

MDS (Metacomputing Directory Service) [2] uses LDAP (Lightweight Directory Access Protocol) [3] to construct its own schema, stores static and dynamic information about the entities inside the Grid, and establishes a uniform namespace. We use three MDS functions: 1) a unique global name for each node; 2) registration of legal member nodes; 3) retrieval of the host name, IP address and available disk space of effective nodes.

Replica Catalogs offer services such as registration, publication, deletion and copying, and facilitate publication, mutual information access between entities, and the distribution of information resources and information services [4]. In the DMSS system, Replica serves four purposes: catalog creation, logical file management, file publication and file access.

GridFtp provides secure and efficient data transfer [5]. In the DMSS system, GridFtp provides three services: file upload, file download and physical storage of replicas.

GSI (Globus Security Infrastructure) locates resources with the X.509 certificate information provided by the MDS service [6] and, together with the RSA encryption algorithm, achieves mutual authentication between users and resources.

GRAM (Globus Resource Allocation Manager) handles resource requests, executes remote applications, allocates resources, manages system actions, and computes and transmits resource-update information to MDS. In DMSS, GRAM plays two roles by running commands on remote nodes on behalf of the server: it obtains the latency with the ping command and deletes redundant copies with the rm command.

GASS (Global Access to Secondary Storage) services, which interoperate with other Globus modules, mainly address the remote I/O problem in a Grid system.
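The three MDS functions that DMSS relies on (a unique global name, registration of legal member nodes, and lookup of host name, IP address and available disk space) can be pictured with a small in-memory registry. The Python sketch below is only an illustration under stated assumptions: it is not the Globus MDS or LDAP interface, and the names `NodeInfo`, `NodeRegistry`, `free_disk_mb` and the `vo/host` naming scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    """Information DMSS reads about a member node (hypothetical fields)."""
    host_name: str
    ip_address: str
    free_disk_mb: int


class NodeRegistry:
    """Toy stand-in for the directory service; not the real MDS interface."""

    def __init__(self, vo_name: str):
        self.vo_name = vo_name
        self._nodes: dict[str, NodeInfo] = {}

    def global_name(self, host_name: str) -> str:
        # Function 1: a unique global name (the naming scheme is an assumption).
        return f"{self.vo_name}/{host_name}"

    def register(self, info: NodeInfo, has_valid_certificate: bool) -> str:
        # Function 2: only hosts holding a valid certificate become members.
        if not has_valid_certificate:
            raise PermissionError(f"{info.host_name} has no valid certificate")
        name = self.global_name(info.host_name)
        self._nodes[name] = info
        return name

    def nodes_with_space(self, needed_mb: int) -> list[str]:
        # Function 3: query node information, here filtered by free disk space.
        return sorted(
            name for name, info in self._nodes.items()
            if info.free_disk_mb >= needed_mb
        )


registry = NodeRegistry("dmss")
registry.register(NodeInfo("n1.area4", "10.0.4.1", 500), has_valid_certificate=True)
registry.register(NodeInfo("n2.area4", "10.0.4.2", 50), has_valid_certificate=True)
candidates = registry.nodes_with_space(100)
```

Here `candidates` holds only the node with enough spare disk, which is the kind of answer the Replica generation step would need before mapping a new logical file to a physical address.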

4 DMSS System Architecture: the Five Modules

We assume that each node has a legal identity and credential and is a member node of the DMSS system, which guarantees security for each node and its information access. As shown in figure 2, the whole system comprises five modules: Portal, Replica generation, Replica selection, Event recording, and Replica management.

[Figure 2 labels: Portal; Replica generation; Replica selection; Event recording; Replica management.]

Fig. 2. Architecture of DMSS system

4.1 Portal: Streaming Media Service Entry

The Portal provides users with a web interface. It either redirects users to the optimal node for media service or returns the optimal node for the requested files. Ordinary Grid users need a GUI (graphical user interface) in a Grid application, which we call the portal. We designed a user interface for the system, as shown in figure 3.

[Figure 3 labels: User; Portal entry; Generate selection parameters; Dynamically obtain file; Selection strategy; Call replica selection; Return selection result; Return address or redirect user to the selected node; Event recording.]

Fig. 3. Flowchart of Portal

JSP (Java Server Pages) and Java Servlets are used to connect the user interface to the back end. The critical problems are:

The media files are sorted into three categories in the DMSS system. For example, audio files in storage are categorized by catalog name (rc), set name (lc), where lc represents the singer's name, and file name (lf), where lf represents the song's name. When the user accesses a file, the JSP page supplies the parameters that describe the requested file and relays them to the Java Servlet, which calls the Replica selection module. The module consults the selection strategy and returns the address of the most effective node for the requested file.

4.2 Replica Generation Module

Globus provides file publication and registration, namely media file catalogs, media file sets, logical media files, and the mapping from the logical location to the physical location of media files. However, we must still query the MDS information service to obtain the optimal node for Replica generation, or else choose a node with adequate physical space for the additional files, as the target of the mapping from the new logical file to a specific physical address. Replica generation includes replica catalog generation, logical set generation and file publication, as shown in figure 4.

[Figure 4 (flowchart) labels: Administrator; Generation type? (1. replica catalog generation, 2. logical set generation, 3. file publication); Call MDS to obtain current node information; Select node; Generate new rc; Select rc; Generate new lc; Generate new logical file name; Call MDS to obtain physical node to map logical location to physical location; Record event; Event recording; Replica management.]

Fig. 4. Flowchart of Replica generation

1. Replica catalog generation: the system calls the MDS service for current node information and selects, with the specified heuristic, the most suitable node on which to generate the replica catalog.
2. Logical set generation: a logical set is generated once its parent replica catalog exists; new attributes are then added to the files in the set.
3. File publication: file addition involves two operations: 1) generating the logical file; 2) mapping the logical address of the file to its physical address (generating the physical address). However, the Replica generation module does not guarantee the atomicity of the two operations, so we devised an Event recording module, which records current events with four different states. An event is in the finished state only if all its operations are completed. The Event recording module addresses the atomicity problem, which is examined further in the following sections. Meanwhile, the Replica management module reads information from the Event recording module and completes the actual file addition. Thus, the Replica generation module only needs to select the physical node address for the file in question and to record its original storage address, its logical address and its physical address in the Event recording module.

4.3 Replica Selection Module

The Replica selection module provides access to media files, selects the optimal node for a media file, checks whether redundant replicas are needed, and reports the related information to the Event recording module. The Globus Replica catalog service simply accesses the target file; when there are multiple copies of the file in the system, it cannot select the most effective node that stores the file. The Replica selection module integrates MDS services with the access-selection heuristic to optimize its service.

[Figure 5 (flowchart) labels: Query file; Obtain UC of the file; Number of UC > 1?; Select best node that stores the file; Need copy?; Select destination node; Record file information; Record event; Event recording; Replica management; MDS.]

Fig. 5. Flowchart of Replica selection

As shown in figure 5, when the Replica selection module receives an access request, it first locates the file's logical address and then its physical address via the mapping. To speed up service, a file may have one or several copies in the system, so the DMSS system distinguishes two cases:

When the file is stored on a single node, one uc value is returned. The system directly returns the access result, and the Replica selection module decides whether to make a redundant copy.

When the file is stored on multiple nodes, multiple uc values are returned. The Replica selection module must then select the node that can offer the fastest service and, as in the first case, decide whether to make a redundant copy.
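The two cases above, combined with the latency limit of Section 2.2, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' access-selection heuristic: the function name `select_replica`, the `latency_ms` callable (standing in for the ping measurement obtained through GRAM) and the rule that a copy is only worthwhile on a node faster than the current best are all assumptions.

```python
from typing import Callable, Optional


def select_replica(
    locations: list[str],
    candidates: list[str],
    latency_ms: Callable[[str], float],
    limit_ms: float,
) -> tuple[str, Optional[str]]:
    """Return (node to serve the file from, node to copy it to or None)."""
    if not locations:
        raise LookupError("file has no registered physical location")

    # Case 1: a single location is returned as is.
    # Case 2: several locations, so pick the one with the lowest latency.
    best = locations[0] if len(locations) == 1 else min(locations, key=latency_ms)

    copy_target = None
    if latency_ms(best) > limit_ms:
        # At most one copy per node: skip nodes that already hold the file.
        free = [node for node in candidates if node not in locations]
        if free:
            nearest = min(free, key=latency_ms)
            # Assumption: only replicate if the new node is actually faster.
            if latency_ms(nearest) < latency_ms(best):
                copy_target = nearest
    return best, copy_target


# Hypothetical latencies (milliseconds) as seen from one user.
latencies = {"n1": 120.0, "n2": 40.0, "n3": 15.0}
best, target = select_replica(
    ["n1", "n2"], ["n1", "n2", "n3"], latencies.__getitem__, limit_ms=30.0
)
```

In this example the file is served from `n2`, the faster of its two holders, and because even that latency exceeds the limit, `n3` is proposed as the destination for a redundant copy; in DMSS that proposal would be recorded as an event rather than executed immediately.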

4.4 Event Recording Module

An event recorder is essential to the DMSS system for two reasons. 1) Both file publication and redundancy copying (which is in effect file publication) need to call the Replica management module. 2) To control the redundancy rate of system files and optimize disk space, we limit the number of copies: a file can have at most one copy on each node in the system. As shown in figure 6, the Event recording module maintains two tables:

1) Event recording table: records information about files to be published. The states are described in table 1.
2) File access information table: records file access information, including the nodes that store each file and the number of times each node is accessed in a period.

[Figure 6 labels: Replica generation module; Replica selection module; Event recording table; File information table; Replica management module.]

Fig. 6. Flowchart of Event recording

Table 1. State description

State  State value  Description
1      00           Event not completed
2      01           Event logically copied, but not physically submitted
3      10           Event physically copied, not logically configured
4      11           Event completed, can be deleted from record table

4.5 Replica Management Module

The Replica management module maintains media files in the Grid, periodically updates the file redundancy rate, and optimizes disk space to achieve optimal file storage. The data file maintenance jobs are:

Actual file publication. When an event is in state 00, the Replica management module completes the file addition. File addition consists of two steps: first, the physical copy, after which the system changes the event state to 01 or 10; second, the logical configuration, after which the event concludes and its state changes to 11.

Maintaining the redundancy rate of data. The module reads internal information and periodically deletes redundant replica files in the system, which avoids too many identical replicas and reduces redundancy.
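The four states of table 1 behave like a two-bit flag that makes the non-atomic publication restartable. The Python sketch below is one possible reading, not the authors' implementation: it assumes the first bit means "physically copied" and the second "logically configured" (which matches the descriptions of states 01 and 10), and the names `EventState`, `advance` and `purge_completed` are hypothetical.

```python
from enum import Enum


class EventState(Enum):
    NOT_COMPLETED = "00"
    LOGICAL_ONLY = "01"    # logically copied, not physically submitted
    PHYSICAL_ONLY = "10"   # physically copied, not logically configured
    COMPLETED = "11"       # may be deleted from the record table


def advance(state: EventState, physical_done: bool, logical_done: bool) -> EventState:
    """Fold the outcome of the two publication steps into the event state."""
    physical = physical_done or state in (EventState.PHYSICAL_ONLY, EventState.COMPLETED)
    logical = logical_done or state in (EventState.LOGICAL_ONLY, EventState.COMPLETED)
    # Assumed bit order: physical first, logical second.
    return EventState(f"{int(physical)}{int(logical)}")


def purge_completed(events: dict[str, EventState]) -> dict[str, EventState]:
    """Keep only the events that still need work from Replica management."""
    return {name: s for name, s in events.items() if s is not EventState.COMPLETED}


# Hypothetical event for one file being published.
events = {"song.mp3": EventState.NOT_COMPLETED}
events["song.mp3"] = advance(events["song.mp3"], physical_done=True, logical_done=False)
after_copy = events["song.mp3"]
events["song.mp3"] = advance(events["song.mp3"], physical_done=False, logical_done=True)
remaining = purge_completed(events)
```

If the process stops after the physical copy, the record stays in state 10 and a later maintenance pass can finish only the logical configuration instead of repeating the whole publication.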

After the system starts (timestamp = 0), the Replica management module performs its first redundancy maintenance at T = T1 + T2, reading information from the records of time slices T1 and T2; maintenance is then executed every T.

5 Conclusion and Future Work

In this paper we discussed the functionalities and features of a Distributed Media Service System based on Globus data management, and explained the design of its components. The DMSS system applies the Data Grid concept, providing data files and services in a distributed (ubiquitous) manner. In media services files are read-only, so there is no consistency problem. Because the security measures in a Globus Grid environment add latency (mutual authentication can take up to 7 seconds), we deploy anonymous file access. Some latency is nevertheless inevitable. In particular, the system still needs file update and pre-extraction of information from newly published files. We expect to address this in future work with up-to-date techniques that actively search for and retrieve files on the Internet and then publish them automatically.

References

1. Ian Foster, Carl Kesselman: The Globus Project: A Status Report. In Proc. Heterogeneous Computing Workshop, pages 4-18. IEEE Computer Society Press, 1998.
2. MDS 2.1: Creating a Hierarchical GIIS. Available from http://www.globus.org/mds
3. Heinz Johner, Larry Brown: Understanding LDAP. Available from http://www.redbooks.ibm.com
4. Byoungdai Lee, Jon B. Weissman: Dynamic Replica Management in the Service Grid. Available from http://www-users.cs.umn.edu/~jon/papers/hpdc_sg.pdf
5. W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke: Data Management and Transfer in High-Performance Computational Grid Environments. Parallel Computing, 2001.
6. Ian Foster, Carl Kesselman, Gene Tsudik: A Security Architecture for Computational Grids. 37-46, Electronic Edition (IEEE Computer Society DL), 2002.