Performance Analysis of Applying Replica Selection Technology for Data Grid Environments*
Chao-Tung Yang 1,**, Chun-Hsiang Chen 1, Kuan-Ching Li 2, and Ching-Hsien Hsu 3

1 High-Performance Computing Laboratory, Department of Computer Science and Information Engineering, Tunghai University, Taichung 40704, Taiwan
ctyang@mail.thu.edu.tw
2 Parallel and Distributed Processing Center, Department of Computer Science and Information Management, Providence University, Taichung 43301, Taiwan
kuancli@pu.edu.tw
3 Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu 300, Taiwan
chh@chu.edu.tw

Abstract. The Data Grid enables the sharing, selection, and connection of a wide variety of geographically distributed computational and storage resources for solving large-scale, data-intensive scientific problems. Such technology efficiently manages and transfers terabytes or even petabytes of data for data-intensive, high-performance computing applications in wide-area, distributed computing environments. The replica selection process allows an application to choose a replica from a replica catalog based on its performance and data access features. In this paper, we build a Grid environment based on three existing PC cluster environments and analyze the performance of data transfers using the GridFTP protocol over these systems. In addition, based on the experimental results, we propose a cost model for picking the best replica under real, dynamic network conditions.

Keywords: Grid computing, Data Grid, Replica selection, Globus, GridFTP.

1 Introduction

Grid computing applies the resources of many computers in a network to a single problem at the same time, usually a scientific or technical problem that requires a great number of processing cycles or access to large amounts of data. A Grid computing environment provides a platform for scientific applications and physical experiments.
A Grid is a large-scale virtual organization in which resources are shared in order to solve problems [4, 7, 9, 10, 11, 12]. Grid computing is distributed computing taken to the next evolutionary level. The goal is to create the illusion of a large and powerful self-managing virtual computer out of a huge collection of connected heterogeneous systems. The emerging mechanism is resource sharing, made possible by the availability of high-bandwidth networks.

* This paper is supported in part by NSC Taiwan (National Science Council), under grants no. NSC E , NSC M , NSC M and NSC E .
** The corresponding author.
V. Malyshkin (Ed.): PaCT 2005, LNCS 3606, pp , Springer-Verlag Berlin Heidelberg 2005

A computational Grid aims to give users better performance, especially in terms of speed and throughput, while a Data Grid aggregates distributed resources to produce results for large problems. Most Data Grid applications execute simultaneously and access a large number of shared data files in the Grid. Certain data-intensive scientific applications, such as high-energy physics, bioinformatics, and astrophysical virtual observatories, must handle huge amounts of data.

A Data Grid provides two essential basic services: a secure, reliable, and efficient data transport protocol, and replica management [2]. The high-speed transport protocol, GridFTP, extends the popular FTP protocol with new features required by Data Grid applications, such as partial file transfer and third-party transfer [5]. The replica management service uses a replica catalog together with GridFTP transfers to provide for the creation, registration, location, and management of data replicas [1].

In this paper, we build a Grid environment based on three existing PC cluster environments and analyze the performance of data transfers using the GridFTP protocol over these systems. Based on the experimental results, we propose a cost model for picking the best replica under real, dynamic network conditions. The cost model uses three significant parameters: network bandwidth, CPU load, and I/O state. Although network conditions change constantly and storage equipment may be busy or idle, the cost model can determine the best replica immediately.
Replica selection can be conducted accurately because the cost model is based on continuously updated system-monitoring information.

2 Background Review

2.1 Globus Toolkit

The Globus Project [10, 11, 12] provides software tools, collectively called the Globus Toolkit, that make it easier to build computational Grids and Grid-based applications. Many organizations use the Globus Toolkit to build computational Grids that support their applications. The composition of the Globus Toolkit can be pictured as three pillars: Resource Management, Information Services, and Data Management. Each pillar represents a primary component of the Globus Toolkit and makes use of a common security foundation. GRAM implements a resource management protocol, MDS implements an information services protocol, and GridFTP implements a data transfer protocol. All of them use the GSI security protocol at the connection layer [8, 11, 12, 13].

2.2 NWS

The Network Weather Service (NWS) [16] is a generalized, distributed monitoring system that produces short-term performance forecasts based on historical performance measurements. The goal of the system is to dynamically characterize and forecast the performance deliverable at the application level from a set of network and computational resources. It is composed of three component processes:
- nws_nameserver: implements a naming and discovery service used to manage a system of nws_sensor and nws_memory processes;
- nws_memory: provides persistent storage for the measurement data collected by the NWS deployment;
- nws_sensor: gathers performance measurements from a specified resource and communicates them to a set of nws_memory processes specified on the command line.

A typical installation involves one nws_nameserver, one or more nws_memory processes (which may reside on different machines), and an nws_sensor running on each machine whose resources are to be monitored. The system includes sensors for end-to-end TCP/IP performance (bandwidth and latency), available CPU percentage, and available non-paged memory.

2.3 Sysstat Utilities

The Sysstat [15] utilities are a collection of performance monitoring tools for the Linux OS; the sysstat package contains the sar, mpstat, and iostat commands. The sar command collects and reports system activity information, which can also be saved in a system activity file for future inspection. The statistics reported by sar concern I/O transfer rates, paging activity, process-related activities, interrupts, network activity, memory and swap space utilization, CPU utilization, kernel activities, and tty statistics, among others. The iostat command reports CPU statistics and I/O statistics for tty devices and disks. Both uniprocessor (UP) and symmetric multiprocessor (SMP) machines are fully supported.

3 Replica Selection

3.1 Replica Selection Scenario

The system established in this research uses the following architecture. Figure 1 shows our proposed replica selection model and illustrates how a client identifies the best location for a desired replica transfer.
First, the client logs in at the local site and executes parallel applications on the Data Grid platform. The application checks whether the required files are located at the local site. If they are, the application accesses them immediately. Otherwise, the application passes the logical file names to the replica catalog server, which returns a list of physical locations for all registered copies. The application passes this list of replica locations to a replica selection server, which identifies the destination storage system locations for all candidate data transfer operations. The replica selection server sends the candidate locations to an information server, which provides measurements and predictions of three system factors, as described in the next section. Based on these estimates, the replica selection server chooses the best replica location and returns it to the parallel application, which then retrieves the replica through GridFTP. Once the application has finished its computation, it returns the results to the user.
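The scenario above can be summarized as a small decision procedure. The following Python sketch is illustrative only: the in-memory dictionary standing in for the replica catalog, the `gsiftp://site-*.example` locations, and the `choose_source` function are hypothetical, and the `score` callback stands in for the information server combined with the cost model of Section 3.3. The GridFTP transfer itself is not shown.

```python
from typing import Callable, Dict, List, Set


def choose_source(
    logical_name: str,
    local_files: Set[str],
    replica_catalog: Dict[str, List[str]],
    score: Callable[[str], float],
) -> str:
    """Return 'local' if the file is already present, otherwise the
    registered physical location with the highest score (hypothetical helper)."""
    # Step 1: use the local copy if there is one.
    if logical_name in local_files:
        return "local"
    # Step 2: ask the (stand-in) replica catalog for all physical locations.
    locations = replica_catalog.get(logical_name, [])
    if not locations:
        raise LookupError(f"no registered replica for {logical_name!r}")
    # Steps 3-4: score every candidate and pick the best one.
    return max(locations, key=score)


if __name__ == "__main__":
    catalog = {  # hypothetical catalog contents
        "file-a": [
            "gsiftp://site-b.example/data/file-a",
            "gsiftp://site-c.example/data/file-a",
        ]
    }
    scores = {  # hypothetical scores, e.g. from Equation (1)
        "gsiftp://site-b.example/data/file-a": 35.0,
        "gsiftp://site-c.example/data/file-a": 42.0,
    }
    print(choose_source("file-a", set(), catalog, scores.__getitem__))
```

Keeping the scoring behind a callback mirrors the separation in the scenario: the catalog only knows where copies are, while the information server and cost model decide which copy is best.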
Fig. 1. Replica selection scenario

3.2 System Factors

We propose a replica selection model for Data Grid environments. In such an environment, a biological database, for instance, can be treated as a replica in the Data Grid. When large-scale data-intensive applications are executed in these environments, each site has both data stores and computational capabilities, and determining the best database among many identical replicas is a significant problem. Our model considers three system factors that affect replica selection:
- Network bandwidth: Network bandwidth is one of the most significant factors in a Data Grid, since data files in Data Grid environments are usually very large. In other words, the file transfer time depends tightly on network bandwidth. Because bandwidth is an unstable, dynamic factor, it should be measured frequently and predicted as accurately as possible; NWS (Network Weather Service) is a powerful toolkit for this purpose.
- CPU load: A Grid platform consists of a number of heterogeneous systems built with different architectures, e.g., cluster platforms, supercomputers, and PCs. CPU load is a dynamic system factor; if the CPU load of a system is heavy, it will certainly affect file downloads from that site. CPU status is measured through the Globus Toolkit MDS.
- I/O state: Data Grid nodes consist of different heterogeneous storage systems, and the amount of data in a Data Grid is huge. If the I/O subsystem of the site from which we want to download a file is very busy, data transfer performance suffers directly. We measure the I/O state using the sysstat utilities.

3.3 Replica Selection Cost Model

The target function of a cost model for distributed and replicated data storage is a score computed from the information provided by the information service. The previous section listed the factors that influence our cost model; to analyze them further, we express them in mathematical notation. Assume that node i is the local site at which the user or application is logged in, and that node j holds the replica the user or application wants. The seven system parameters in our replica selection cost model are:
- $Score_{ij}$: the score for acquiring the replica at node j from node i; a higher score means the user or application can acquire the replica more efficiently;
- $P^{BW}_{ij}$: the percentage of bandwidth available from node i to node j, i.e., the current bandwidth divided by the highest theoretical bandwidth;
- $W^{BW}$: the weight of network bandwidth, defined by the administrator of the Data Grid;
- $P^{CPU}_{j}$: the percentage of CPU idle time at node j;
- $W^{CPU}$: the weight of CPU load, defined by the administrator of the Data Grid;
- $P^{I/O}_{j}$: the percentage of I/O idle time at node j;
- $W^{I/O}$: the weight of I/O state, defined by the administrator of the Data Grid.

Based on these three system factors, we define the following general formula:

$$Score_{ij} = P^{BW}_{ij} \, W^{BW} + P^{CPU}_{j} \, W^{CPU} + P^{I/O}_{j} \, W^{I/O} \qquad (1)$$

In this formula, $W^{BW}$, $W^{CPU}$, and $W^{I/O}$ are the weights of network bandwidth, CPU, and I/O, respectively. They can be determined by the administrator of the Data Grid organization; because the storage systems at different Data Grid nodes have different attributes (some storage equipment, for example, does not affect CPU load), the administrator may choose different weights. After several experimental measurements, we consider network bandwidth the most significant factor, since it directly influences data transfer time; when transferring data with GridFTP, we found that CPU and I/O status affect transfer performance only slightly. In our Data Grid environment, we therefore set $W^{BW}$, $W^{CPU}$, and $W^{I/O}$ to 80%, 10%, and 10%, respectively.
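Equation (1) is straightforward to evaluate and rank. The Python sketch below assumes the percentages are on a 0 to 100 scale and uses the 80%/10%/10% weights given above; the `ReplicaSite` type, the `score` and `rank` helpers, and the site names and measurements in the usage lines are hypothetical illustrations, not the authors' Java program or the values in Table 1.

```python
from dataclasses import dataclass
from typing import List

# Weights from the testbed configuration described in the text.
W_BW, W_CPU, W_IO = 0.8, 0.1, 0.1


@dataclass
class ReplicaSite:
    """Monitoring data for one candidate replica site j (hypothetical type)."""
    name: str
    bw_percent: float  # P^BW_ij: current / theoretical bandwidth from i to j, 0..100
    cpu_idle: float    # P^CPU_j: percentage of CPU idle time at node j
    io_idle: float     # P^I/O_j: percentage of I/O idle time at node j


def score(site: ReplicaSite, w_bw: float = W_BW, w_cpu: float = W_CPU,
          w_io: float = W_IO) -> float:
    """Equation (1): weighted sum of the three system factors."""
    return site.bw_percent * w_bw + site.cpu_idle * w_cpu + site.io_idle * w_io


def rank(sites: List[ReplicaSite]) -> List[ReplicaSite]:
    """Sort candidates from highest (best) to lowest score."""
    return sorted(sites, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [  # hypothetical measurements, not data from Table 1
        ReplicaSite("site-a", bw_percent=60, cpu_idle=90, io_idle=80),
        ReplicaSite("site-b", bw_percent=20, cpu_idle=95, io_idle=95),
        ReplicaSite("site-c", bw_percent=40, cpu_idle=50, io_idle=50),
    ]
    for site in rank(candidates):
        print(f"{site.name}: {score(site):.1f}")
```

With these weights, site-a (65.0) outranks the nearly idle site-b (35.0) because the bandwidth term dominates, which is the behavior the 80% weight is meant to produce. Because the weights sum to 1, the score stays within the same 0 to 100 range as its inputs.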
4 Experimental Environments and Results

This section presents experimental results obtained with the GridFTP protocol. First, we compare the file transfer times of FTP and GridFTP. Second, focusing on parallel data transfer, we compare GridFTP file transfer times using 1, 2, 4, 8, and 16 TCP streams. The Data Grid testbed consists of three Linux PC clusters:
- THU site: four PCs with dual AMD Athlon MP 2.0 GHz processors, 1 GB DDR memory, a 60 GB hard disk, and 1 Gbps network bandwidth;
- Li-Zen site: four PCs with an Intel Celeron 900 MHz processor, 256 MB DDR memory, a 10 GB hard disk, and 30 Mbps network bandwidth;
- HIT site: four PCs with Intel P4 2.8 GHz processors, 512 MB DDR memory, an 80 GB hard disk, and 1 Gbps network bandwidth.
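The link speeds above bound how fast any replica can be delivered, which helps explain why bandwidth dominates the cost model. The Python arithmetic sketch below computes an idealized lower bound on transfer time for the file sizes used in the experiments. It assumes 1 MB = 2^20 bytes, 1 Mbps = 10^6 bit/s, and no protocol, disk, or TCP overhead; the function name is hypothetical, and real GridFTP transfers will take longer.

```python
BITS_PER_BYTE = 8
BYTES_PER_MB = 2 ** 20  # assumption: 1 MB = 1,048,576 bytes


def ideal_transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Lower bound on transfer time at full line rate (no overhead modeled)."""
    bits = size_mb * BYTES_PER_MB * BITS_PER_BYTE
    return bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    for size in (256, 512, 1024, 2048):  # file sizes used in Section 4
        print(f"{size:5d} MB: {ideal_transfer_seconds(size, 30):8.1f} s at 30 Mbps, "
              f"{ideal_transfer_seconds(size, 1000):6.2f} s at 1 Gbps")
```

Under these assumptions, a 1024 MB file needs roughly 286 s at the Li-Zen site's 30 Mbps but under 9 s at 1 Gbps. Since a path can be no faster than its slowest link, any transfer to or from the Li-Zen site is limited by the 30 Mbps figure.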
Figure 2 shows the hardware and network configuration of our Data Grid testbed. The THU site is located at Tunghai University, Taichung City; the Li-Zen site at Li-Zen High School, Taichung County; and the HIT site at Hsiuping Institute of Technology, Taichung County, all in Taiwan.

Fig. 2. Our Data Grid testbed

4.1 FTP Versus GridFTP

The Globus Project surveyed available protocols and technologies, implemented some prototypes, and settled on using FTP and its existing extensions as a base, extending it further to add the missing required functionality. The Globus Alliance proposed a common data transfer and access protocol named GridFTP that provides secure, efficient data movement in Grid environments. This protocol, which extends the standard FTP protocol, provides a superset of the features offered by the various Grid storage systems currently in use. In Grid environments, access to distributed data is typically as important as access to distributed computational resources. Distributed scientific and engineering applications require transfers of large amounts of data between storage systems, and access to large amounts of data by many geographically distributed applications and users for analysis and visualization. GridFTP thus extends FTP in ways that make it suitable for Grid environments. Figure 3 shows the performance of FTP and GridFTP when transferring four different file sizes. In our first experiment, we transferred these files (256, 512, 1024, and 2048 megabytes) from alpha01 at the THU site to gridhit3 at the HIT site.

4.2 GridFTP with Parallel Data Transfer

Using multiple TCP streams can improve aggregate bandwidth over a single TCP stream in WAN environments. We apply this feature of the GridFTP protocol to transfer files of different sizes in Data Grid environments. GridFTP (as well as normal FTP) defines multiple wire protocols, or MODES, for the data channel.
Most normal FTP servers implement only stream mode, i.e., the bytes flow in order over a single TCP connection. GridFTP defaults to this mode so that it is compatible with normal FTP servers.

Fig. 3. FTP versus GridFTP (file transfer time in seconds versus file size in MB, for FTP and GridFTP)

However, GridFTP has another mode, called Extended Block Mode, or MODE E, which sends the data over the data channel in blocks. Each block consists of 8 bits of flags, a 64-bit integer indicating the offset from the start of the transfer, and a 64-bit integer indicating the length of the block in bytes, followed by a payload of that many bytes. Because the offset and length are provided, out-of-order arrival is acceptable: the 10th block could arrive before the 9th, because the receiver knows explicitly where each block belongs. This allows multiple TCP channels to be used. If the parallelism option is used, globus-url-copy automatically puts the servers into MODE E. Note that parallel data transfer with one TCP stream is not the same as no parallel data transfer at all: both use a single stream, but the default uses stream mode, whereas parallel data transfer with one TCP stream uses MODE E [12].

Fig. 4. GridFTP with parallel data transfer (file transfer time in seconds versus file size in MB, for GridFTP with no parallel data transfer and with 1, 2, 4, 8, and 16 TCP streams)

The parallelism option is used by the source data node to control how many parallel data connections may be established to each destination data node. Figure 4 shows the performance of GridFTP transferring 256, 512, 1024, and 2048 megabyte files with 1, 2, 4, 8, and 16 TCP streams from alpha02 at the THU site to lz04 at the Li-Zen site. The results show that parallel data transfer performs better for larger file sizes: establishing multiple data channels improves aggregate bandwidth.

4.3 Replica Selection Cost Model

Following the replica selection scenario in Section 3.1, a user logs in to the local site alpha1 at the THU site, specifies the characteristics of the desired data, and passes this attribute description to the replica catalog server. The replica catalog server queries its database, produces a list of logical files that contain data with the specified characteristics, and returns the physical locations of all registered replicas of the desired logical files. In this experiment, only one logical file, file-a, matches the user's request; its size is 1024 megabytes.

Table 1. The values of the replica selection cost model and the file transfer times (local site alpha1; replica sites alpha4, hit0, and lz02; rows: $P^{BW}_{ij}$, $P^{CPU}_{j}$, $P^{I/O}_{j}$, replica selection cost model score, practical data transfer time)

Fig. 5. GUI of the replica selection cost model program, panels (a) and (b)
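To make the MODE E description in Section 4.2 concrete, the Python sketch below frames data into blocks carrying a flags byte, a 64-bit offset, and a 64-bit length, deals them round-robin over several simulated streams, and reassembles them after out-of-order arrival. It illustrates the idea only: the field order follows the text, big-endian encoding is an assumption, the block size and stream count are arbitrary, and the normative wire format is defined by the GridFTP protocol specification rather than by this code.

```python
import struct
from typing import List

# Assumed header: 8-bit flags, 64-bit offset, 64-bit length, big-endian.
# Field order follows the description in the text, not the GridFTP spec.
HEADER = struct.Struct(">BQQ")  # 1 + 8 + 8 = 17 bytes, no padding


def encode_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split data into self-describing blocks (header + payload)."""
    blocks = []
    for offset in range(0, len(data), block_size):
        payload = data[offset:offset + block_size]
        blocks.append(HEADER.pack(0, offset, len(payload)) + payload)
    return blocks


def reassemble(blocks: List[bytes], total_size: int) -> bytes:
    """Write each payload at its declared offset; arrival order is irrelevant."""
    buffer = bytearray(total_size)
    for block in blocks:
        _flags, offset, length = HEADER.unpack_from(block)
        buffer[offset:offset + length] = block[HEADER.size:HEADER.size + length]
    return bytes(buffer)


if __name__ == "__main__":
    data = bytes(range(256)) * 40          # 10,240 bytes of sample data
    blocks = encode_blocks(data, 1024)     # 10 blocks of 1024 bytes
    n_streams = 4                          # arbitrary number of parallel streams
    streams = [blocks[k::n_streams] for k in range(n_streams)]
    # Simulate blocks arriving stream by stream, last stream first,
    # so later blocks routinely arrive before earlier ones.
    arrived = [b for stream in reversed(streams) for b in stream]
    assert reassemble(arrived, len(data)) == data
    print(f"reassembled {len(arrived)} blocks from {n_streams} streams")
```

Because every block states where it belongs, the receiver needs no sequencing across streams, which is what allows the parallelism option of globus-url-copy to spread one file over several TCP connections.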
Next, the user passes this list of replica locations to the replica selection server, which identifies the destination storage system locations for all candidate data transfer operations. Three replicas map to the logical file file-a, located at three different sites: alpha4, hit0, and lz02. The replica selection server sends the candidate destination locations to the information server [17], which provides the three system factors described in Section 3.2. Using the cost model of Section 3.3, the replica selection server chooses the best replica and transfers it to the local site alpha1 by GridFTP. Table 1 shows the values of the system factors, the scores of the replica selection cost model, and the actual file transfer times.

Following the discussion in Section 3.3, we implemented the replica selection cost model as a computer program and executed it on our Data Grid testbed. Because the program is written in Java, it can run on any computing platform with a JVM. Fig. 5(a) shows the costs to alpha1 calculated from the three system factors (the percentage of CPU idle, I/O idle, and bandwidth from the other sites). Fig. 5(b) displays the average values over a selected time scale, which is adjustable with the scroll bar at the top. A sorted list of the costs can be obtained by clicking the Cost button.

5 Conclusions and Future Work

In this paper, we have presented the design and implementation of two fundamental services. The GridFTP protocol extends the FTP protocol and provides beneficial features. This research focused on parallel data transfer issues; after measuring the performance of GridFTP with its parallel data transfer feature, we confirm that this technology improves data transfer.
Comparing FTP and GridFTP on four different file sizes, we observed that their data transfer times are similar, even for a 2-gigabyte file. In contrast, our measurements of GridFTP with 1, 2, 4, 8, and 16 TCP streams show that parallel data transfer effectively reduces data transfer time. By computing the replica selection cost model score for each replica, we can sort the replicas from the most efficient to the least efficient. Our cost model therefore gives users and applications a mechanism for selecting the best replica.

As future work, three investigations will be carried out. First, although we have used the parallel data transfer feature to improve data transfer performance, the striped data transfer feature can also improve aggregate bandwidth. Second, we will consider how to determine the weights of the system factors and will incorporate more system factors into the replica selection cost model. Third, we will extend our Data Grid testbed to analyze the performance of replica selection in a dynamic environment with a larger number of sites.

References

1. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, Data Management and Transfer in High Performance Computational Grid Environments, Parallel Computing, Vol. 28 (5), pp , May 2002.
2. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing, IEEE Mass Storage Conference,
3. B. Allcock, S. Tuecke, I. Foster, A. Chervenak, and C. Kesselman, Protocols and Services for Distributed Data-Intensive Science, ACAT2000 Proceedings, pp ,
4. K. Czajkowski, S. Fitzgerald, I. Foster and C. Kesselman, Grid Information Services for Distributed Resource Sharing, Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE CS Press, August
5. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, A Resource Management Architecture for Metacomputing Systems, Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp ,
6. R. L. de C. Costa and S. Lifschitz, Database Allocation Strategies for Parallel BLAST Evaluation on Clusters, Proceedings of the Distributed and Parallel Databases, Vol. 13, Issue 1, pp , Hingham, MA, USA, January
7. I. Foster, The Grid: A New Infrastructure for 21st Century Science, Physics Today, 55(2):42-47,
8. I. Foster, C. Kesselman, Globus: A Metacomputing Infrastructure Toolkit, Intl J. Supercomputer Applications, 11(2): ,
9. I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan-Kaufmann,
10. I. Foster, C. Kesselman and S. Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, Intl J. Supercomputer Applications, 15(3),
11. Global Grid Forum,
12. The Globus Project,
13. Introduction to Grid Computing with Globus,
14. SETI@home: Search for Extraterrestrial Intelligence at home, berkeley.edu/
15. SYSSTAT utilities home page,
16. R. Wolski, N. Spring and J. Hayes, The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing, Journal of Future Generation Computing Systems, Vol. 15, No. 5-6, pp , October
17. X. Zhang, J. Freschl, and J. Schopf, A Performance Study of Monitoring and Information Services for Distributed Systems, Proceedings of HPDC, August 2003.
More informationTwo-Level Dynamic Load Balancing Algorithm Using Load Thresholds and Pairwise Immigration
Two-Level Dynamic Load Balancing Algorithm Using Load Thresholds and Pairwise Immigration Hojiev Sardor Qurbonboyevich Department of IT Convergence Engineering Kumoh National Institute of Technology, Daehak-ro
More informationHigh Throughput WAN Data Transfer with Hadoop-based Storage
High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San
More informationWeb-based access to the grid using. the Grid Resource Broker Portal
Web-based access to the grid using the Grid Resource Broker Portal Giovanni Aloisio, Massimo Cafaro ISUFI High Performance Computing Center Department of Innovation Engineering University of Lecce, Italy
More informationFuture Generation Computer Systems. Implementation of a medical image file accessing system in co-allocation data grids
Future Generation Computer Systems 26 (2010) 1127 1140 Contents lists available at ScienceDirect Future Generation Computer Systems journal homepage: www.elsevier.com/locate/fgcs Implementation of a medical
More informationSDS: A Scalable Data Services System in Data Grid
SDS: A Scalable Data s System in Data Grid Xiaoning Peng School of Information Science & Engineering, Central South University Changsha 410083, China Department of Computer Science and Technology, Huaihua
More informationGrid Resources Search Engine based on Ontology
based on Ontology 12 E-mail: emiao_beyond@163.com Yang Li 3 E-mail: miipl606@163.com Weiguang Xu E-mail: miipl606@163.com Jiabao Wang E-mail: miipl606@163.com Lei Song E-mail: songlei@nudt.edu.cn Jiang
More informationDay 1 : August (Thursday) An overview of Globus Toolkit 2.4
An Overview of Grid Computing Workshop Day 1 : August 05 2004 (Thursday) An overview of Globus Toolkit 2.4 By CDAC Experts Contact :vcvrao@cdacindia.com; betatest@cdacindia.com URL : http://www.cs.umn.edu/~vcvrao
More informationAn Engineering Computation Oriented Visual Grid Framework
An Engineering Computation Oriented Visual Grid Framework Guiyi Wei 1,2,3, Yao Zheng 1,2, Jifa Zhang 1,2, and Guanghua Song 1,2 1 College of Computer Science, Zhejiang University, Hangzhou, 310027, P.
More informationSimulating a Finite State Mobile Agent System
Simulating a Finite State Mobile Agent System Liu Yong, Xu Congfu, Chen Yanyu, and Pan Yunhe College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China Abstract. This paper analyzes
More informationIMAGE: An approach to building standards-based enterprise Grids
IMAGE: An approach to building standards-based enterprise Grids Gabriel Mateescu 1 and Masha Sosonkina 2 1 Research Computing Support Group 2 Scalable Computing Laboratory National Research Council USDOE
More informationDDFTP: Dual-Direction FTP
In Proc. of The 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), pp. 504-513, May 2011. DDFTP: Dual-Direction FTP Jameela Al-Jaroodi and Nader Mohamed Faculty
More informationPerformance Analysis of the Globus Toolkit Monitoring and Discovery Service, MDS2
Performance Analysis of the Globus Toolkit Monitoring and Discovery Service, MDS Xuehai Zhang Department of Computer Science University of Chicago hai@cs.uchicago.edu Jennifer M. Schopf Mathematics and
More informationEvaluating the Performance of Skeleton-Based High Level Parallel Programs
Evaluating the Performance of Skeleton-Based High Level Parallel Programs Anne Benoit, Murray Cole, Stephen Gilmore, and Jane Hillston School of Informatics, The University of Edinburgh, James Clerk Maxwell
More informationPCGrid: Integration of College s Research Computing Infrastructures Using Grid Technology *
PCGrid: Integration of College s Research Computing Infrastructures Using Grid Technology * Kuan-Ching Li 1 Chiou-Nan Chen 1, 2 Chun-Chieh Liu 1 Chia-Fu Chang 1 Chia-Wen Hsu 1 Sheng-Shiang Hung 1 Chun-Yu
More informationProfiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Motivation Scientific experiments are generating large amounts of data Education
More informationWeka4WS: a WSRF-enabled Weka Toolkit for Distributed Data Mining on Grids
Weka4WS: a WSRF-enabled Weka Toolkit for Distributed Data Mining on Grids Domenico Talia, Paolo Trunfio, Oreste Verta DEIS, University of Calabria Via P. Bucci 41c, 87036 Rende, Italy {talia,trunfio}@deis.unical.it
More informationHigh bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK
High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459
More informationData Management for Distributed Scientific Collaborations Using a Rule Engine
Data Management for Distributed Scientific Collaborations Using a Rule Engine Sara Alspaugh Department of Computer Science University of Virginia alspaugh@virginia.edu Ann Chervenak Information Sciences
More informationKnowledge Discovery Services and Tools on Grids
Knowledge Discovery Services and Tools on Grids DOMENICO TALIA DEIS University of Calabria ITALY talia@deis.unical.it Symposium ISMIS 2003, Maebashi City, Japan, Oct. 29, 2003 OUTLINE Introduction Grid
More informationAdvanced School in High Performance and GRID Computing November Introduction to Grid computing.
1967-14 Advanced School in High Performance and GRID Computing 3-14 November 2008 Introduction to Grid computing. TAFFONI Giuliano Osservatorio Astronomico di Trieste/INAF Via G.B. Tiepolo 11 34131 Trieste
More informationAn Adaptive Transfer Algorithm in GDSS
An Adaptive Transfer Algorithm in GDSS Hai Jin, Xiangshan Guan, Chao Xie and Qingchun Wang Key Laboratory for Cluster and Grid Computing, School of Computer Science and Technology, Huazhong University
More informationRedundant Parallel Data Transfer Schemes for the Grid Environment
Redundant Parallel Data Transfer Schemes for the Grid Environment R.S.Bhuvaneswaran Yoshiaki Katayama Naohisa Takahashi Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya
More informationData Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices
Data Management 1 Grid data management Different sources of data Sensors Analytic equipment Measurement tools and devices Need to discover patterns in data to create information Need mechanisms to deal
More informationA RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS
A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com
More informationGridSphere s Grid Portlets
COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY 12(1), 89-97 (2006) GridSphere s Grid Portlets Michael Russell 1, Jason Novotny 2, Oliver Wehrens 3 1 Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut,
More informationAn Efficient Storage Mechanism to Distribute Disk Load in a VoD Server
An Efficient Storage Mechanism to Distribute Disk Load in a VoD Server D.N. Sujatha 1, K. Girish 1, K.R. Venugopal 1,andL.M.Patnaik 2 1 Department of Computer Science and Engineering University Visvesvaraya
More informationpyglobus: A Python interface to the Globus Toolkit
Abstract pyglobus: A Python interface to the Globus Toolkit Keith R. Jackson Lawrence Berkeley National Laboratory Developing high-performance problem solving environments/applications that allow scientists
More informationComputational Mini-Grid Research at Clemson University
Computational Mini-Grid Research at Clemson University Parallel Architecture Research Lab November 19, 2002 Project Description The concept of grid computing is becoming a more and more important one in
More informationPBS PRO: GRID COMPUTING AND SCHEDULING ATTRIBUTES
Chapter 1 PBS PRO: GRID COMPUTING AND SCHEDULING ATTRIBUTES Bill Nitzberg, Jennifer M. Schopf, and James Patton Jones Altair Grid Technologies Mathematics and Computer Science Division, Argonne National
More informationText mining on a grid environment
Data Mining X 13 Text mining on a grid environment V. G. Roncero, M. C. A. Costa & N. F. F. Ebecken COPPE/Federal University of Rio de Janeiro, Brazil Abstract The enormous amount of information stored
More informationMiddleware of Taiwan UniGrid
Middleware of Taiwan UniGrid Po-Chi Shih 1, Hsi-Min Chen 2, Yeh-Ching Chung 1, Chien-Min Wang 3, Ruay-Shiung Chang 4, Ching-Hsien Hsu 5, Kuo-Chan Huang 6, Chao-Tung Yang 7 shedoh@sslab.cs.nthu.edu.tw,
More informationGlobus Toolkit Firewall Requirements. Abstract
Globus Toolkit Firewall Requirements v0.3 8/30/2002 Von Welch Software Architect, Globus Project welch@mcs.anl.gov Abstract This document provides requirements and guidance to firewall administrators at
More informationGlobus Online and HPSS. KEK, Tsukuba Japan October 16 20, 2017 Guangwei Che
Globus Online and HPSS KEK, Tsukuba Japan October 16 20, 2017 Guangwei Che Agenda (1) What is Globus and Globus Online? How Globus Online works? Globus DSI module for HPSS Globus Online setup DSI module
More informationAn Evaluation of Object-Based Data Transfers on High Performance Networks
An Evaluation of Object-Based Data Transfers on High Performance Networks Phillip M. Dickens Department of Computer Science Illinois Institute of Technology dickens@iit.edu William Gropp Mathematics and
More informationLessons learned producing an OGSI compliant Reliable File Transfer Service
Lessons learned producing an OGSI compliant Reliable File Transfer Service William E. Allcock, Argonne National Laboratory Ravi Madduri, Argonne National Laboratory Introduction While GridFTP 1 has become
More informationTortoise vs. hare: a case for slow and steady retrieval of large files
Tortoise vs. hare: a case for slow and steady retrieval of large files Abstract Large file transfers impact system performance at all levels of a network along the data path from source to destination.
More informationFunctional Requirements for Grid Oriented Optical Networks
Functional Requirements for Grid Oriented Optical s Luca Valcarenghi Internal Workshop 4 on Photonic s and Technologies Scuola Superiore Sant Anna Pisa June 3-4, 2003 1 Motivations Grid networking connection
More informationDYNAMO DirectorY, Net Archiver and MOver
DYNAMO DirectorY, Net Archiver and MOver Mark Silberstein, Michael Factor, and Dean Lorenz IBM Haifa Research Laboratories {marks,factor,dean}@il.ibm.com Abstract. The Grid communities efforts on managing
More informationA GridFTP Transport Driver for Globus XIO
A GridFTP Transport Driver for Globus XIO Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Joseph Link 5, and John Bresnahan 1,2,3 1 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne,
More informationROCI 2: A Programming Platform for Distributed Robots based on Microsoft s.net Framework
ROCI 2: A Programming Platform for Distributed Robots based on Microsoft s.net Framework Vito Sabella, Camillo J. Taylor, Scott Currie GRASP Laboratory University of Pennsylvania Philadelphia PA, 19104
More informationCluster Abstraction: towards Uniform Resource Description and Access in Multicluster Grid
Cluster Abstraction: towards Uniform Resource Description and Access in Multicluster Grid Maoyuan Xie, Zhifeng Yun, Zhou Lei, Gabrielle Allen Center for Computation & Technology, Louisiana State University,
More informationA Federated Grid Environment with Replication Services
A Federated Grid Environment with Replication Services Vivek Khurana, Max Berger & Michael Sobolewski SORCER Research Group, Texas Tech University Grids can be classified as computational grids, access
More informationA Survey Paper on Grid Information Systems
B 534 DISTRIBUTED SYSTEMS A Survey Paper on Grid Information Systems Anand Hegde 800 North Smith Road Bloomington Indiana 47408 aghegde@indiana.edu ABSTRACT Grid computing combines computers from various
More informationGrid Computing Systems: A Survey and Taxonomy
Grid Computing Systems: A Survey and Taxonomy Material for this lecture from: A Survey and Taxonomy of Resource Management Systems for Grid Computing Systems, K. Krauter, R. Buyya, M. Maheswaran, CS Technical
More informationDDMG : A Data Dissemination Mechanism for Grid Environments
IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.9A, September 2006 109 DDMG : A Data Dissemination Mechanism for Grid Environments Hyung Jinn Kim University of Science and
More informationGrid Architectural Models
Grid Architectural Models Computational Grids - A computational Grid aggregates the processing power from a distributed collection of systems - This type of Grid is primarily composed of low powered computers
More informationGRID COMPUTING BASED MODEL FOR REMOTE MONITORING OF ENERGY FLOW AND PREDICTION OF HT LINE LOSS IN POWER DISTRIBUTION SYSTEM
GRID COMPUTING BASED MODEL FOR REMOTE MONITORING OF ENERGY FLOW AND PREDICTION OF HT LINE LOSS IN POWER DISTRIBUTION SYSTEM 1 C.Senthamarai, 2 A.Krishnan 1 Assistant Professor., Department of MCA, K.S.Rangasamy
More informationDynamic Provisioning of a Parallel Workflow Management System
2008 International Symposium on Parallel and Distributed Processing with Applications Dynamic Provisioning of a Parallel Workflow Management System Ching-Hong Tsai Department of Computer Science, National
More informationParallelizing Inline Data Reduction Operations for Primary Storage Systems
Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr
More informationGrid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms
Grid Computing 1 Resource sharing Elements of Grid Computing - Computers, data, storage, sensors, networks, - Sharing always conditional: issues of trust, policy, negotiation, payment, Coordinated problem
More informationAn Introduction to the Grid
1 An Introduction to the Grid 1.1 INTRODUCTION The Grid concepts and technologies are all very new, first expressed by Foster and Kesselman in 1998 [1]. Before this, efforts to orchestrate wide-area distributed
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationCS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:
CS550 Advanced Operating Systems (Distributed Operating Systems) Instructor: Xian-He Sun Email: sun@iit.edu, Phone: (312) 567-5260 Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment
More informationA Capabilities Based Communication Model for High-Performance Distributed Applications: The Open HPC++ Approach
A Capabilities Based Communication Model for High-Performance Distributed Applications: The Open HPC++ Approach Shridhar Diwan, Dennis Gannon Department of Computer Science Indiana University Bloomington,
More informationA Comparison of Conventional Distributed Computing Environments and Computational Grids
A Comparison of Conventional Distributed Computing Environments and Computational Grids Zsolt Németh 1, Vaidy Sunderam 2 1 MTA SZTAKI, Computer and Automation Research Institute, Hungarian Academy of Sciences,
More informationXtreemFS a case for object-based storage in Grid data management. Jan Stender, Zuse Institute Berlin
XtreemFS a case for object-based storage in Grid data management Jan Stender, Zuse Institute Berlin In this talk... Traditional Grid Data Management Object-based file systems XtreemFS Grid use cases for
More informationQoS-constrained List Scheduling Heuristics for Parallel Applications on Grids
16th Euromicro Conference on Parallel, Distributed and Network-Based Processing QoS-constrained List Scheduling Heuristics for Parallel Applications on Grids Ranieri Baraglia, Renato Ferrini, Nicola Tonellotto
More informationA Grid-Enabled Component Container for CORBA Lightweight Components
A Grid-Enabled Component Container for CORBA Lightweight Components Diego Sevilla 1, José M. García 1, Antonio F. Gómez 2 1 Department of Computer Engineering 2 Department of Information and Communications
More informationThe glite File Transfer Service
The glite File Transfer Service Peter Kunszt Paolo Badino Ricardo Brito da Rocha James Casey Ákos Frohner Gavin McCance CERN, IT Department 1211 Geneva 23, Switzerland Abstract Transferring data reliably
More informationA Parallel Programming Environment on Grid
A Parallel Programming Environment on Grid Weiqin Tong, Jingbo Ding, and Lizhi Cai School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China wqtong@mail.shu.edu.cn Abstract.
More informationAutomating large file transfers
Automating large file transfers Adam H. Villa 1 and Elizabeth Varki 1 1 Department of Computer Science, University of New Hampshire, Durham, NH, USA Abstract The amount of data being transferred by the
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationCommunity Software Development with the Astrophysics Simulation Collaboratory
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2001; volume (number): 000 000 Community Software Development with the Astrophysics Simulation Collaboratory 5
More informationGlobus XIO Compression Driver: Enabling On-the-fly Compression in GridFTP
Globus XIO Compression Driver: Enabling On-the-fly Compression in GridFTP Mattias Lidman, John Bresnahan, Rajkumar Kettimuthu,3 Computation Institute, University of Chicago, Chicago, IL Redhat Inc. 3 Mathematics
More informationReasons not to Parallelize TCP Connections for Fast Long-Distance Networks
Reasons not to Parallelize TCP Connections for Fast Long-Distance Networks Zongsheng Zhang Go Hasegawa Masayuki Murata Osaka University Contents Introduction Analysis of parallel TCP mechanism Numerical
More information