GFS: A Distributed File System with Multi-source Data Access and Replication for Grid Computing

Chun-Ting Chen 1, Chun-Chen Hsu 1,2, Jan-Jan Wu 2, and Pangfeng Liu 1,3

1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
  {r94006,d95006,pangfeng}@csie.ntu.edu.tw
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan
  wuj@iis.sinica.edu.tw
3 Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan

N. Abdennadher and D. Petcu (Eds.): GPC 2009, LNCS 5529, pp. 119-130, 2009. © Springer-Verlag Berlin Heidelberg 2009

Abstract. In this paper we design and implement a distributed file system with multi-source data replication, called Grid File System (GFS), for Unix-based grid systems. Traditional distributed file system technologies designed for local and campus area networks do not adapt well to wide area grid computing environments. We therefore design GFS to meet the needs of grid computing. With GFS, existing applications can access remote files without any modification, and jobs submitted to grid systems can access data transparently. GFS can be deployed easily and accessed without special accounts. Our system also provides strong security mechanisms and a multi-source data transfer method that increases communication throughput.

1 Introduction

Large-scale computing grids give ordinary users access to enormous computing power. Production systems such as Taiwan UniGrid [1] regularly provide CPUs to cycle-hungry researchers in a wide variety of domains. However, it is not easy to run data-intensive jobs in a computational grid. In most grid systems, a user must specify the precise set of files a job will use before submitting it. In some cases this is not possible, because the set of files, or the fragments of a file, to be accessed may be determined only by the program at runtime rather than given as command-line arguments. In other cases, the user may wish to delay the assignment of data items to batch jobs until the moment of execution, so as to better schedule the processing of data items.

To cope with the difficulties of running data-intensive applications with runtime-dependent data requirements, we propose a distributed file system that supports Unix-like run-time file access. The file system provides the same namespace and semantics as if the files were stored on a local machine. Although a number of distributed file systems have been developed in the past decade, none of them is well suited for deployment on a computational grid.

Even established distributed file systems such as the Andrew File System [2] are not appropriate for grid computing systems, for two reasons:

1. They cannot be deployed without intervention by the administrator at both the client and the server.
2. They do not provide the security mechanisms needed for grid computing.

To address this problem, we have designed a distributed file system for cluster and grid computing, called Grid File System (GFS). GFS allows a grid user to easily deploy and harness distributed storage without operating system kernel changes, special privileges, or attention from the system administrator at either the client or the server. This property allows an end user to rapidly deploy GFS into an existing grid (or several grids simultaneously) and use the file system to access data transparently and securely from multiple sources.

The rest of this paper is organized as follows. Section 2 describes the system architecture of GFS. Section 3 describes our security mechanisms for file access. Section 4 presents GFS's multi-source data transfer. Section 5 describes our implementation of GFS on Taiwan UniGrid, as well as experimental results on the improvement of communication throughput and system performance. Section 6 gives some concluding remarks.

2 Architecture of Grid File System

This section describes the functionality of the components of the Grid File System (GFS) and how GFS uses these components to construct a grid-enabled distributed file system. GFS consists of three major components: a directory server, file servers, and GFS clients. The directory server manages all metadata for GFS. File servers are responsible for the underlying file transfers between sites, and a GFS client serves as the interface between a user and GFS; users manipulate and access GFS files via GFS clients.

2.1 Directory Server

The directory server contains five services:

- a file control service that receives requests from GFS clients and relays them to the appropriate services,
- a host management service that manages host information,
- a file management service that maps physical files to logical files and locates logical files,
- a metadata service that manages file metadata and looks up registered files, and
- a replica placement service that decides where to create replicas of a logical file.

File Control Service. The file control service receives requests from GFS clients and relays them to the appropriate services of the directory server. It also updates the information held by the host management service.

Host Management Service. The host management service maintains the available space and the status of each host. A GFS host record contains the host name, the available and total disk space, and a status flag that indicates whether the host is on-line. A participating host updates its host information periodically; the file control service marks a host as off-line if the host has not updated its information within a certain period of time.
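As a concrete illustration of the bookkeeping just described, the sketch below keeps one record per host and marks hosts off-line when their periodic update stops arriving. The field names, the in-memory dictionary, and the 60-second expiry window are illustrative assumptions; the paper does not specify these details.

    import time
    from dataclasses import dataclass

    HEARTBEAT_TIMEOUT = 60.0  # assumed expiry window, in seconds

    @dataclass
    class HostRecord:
        host_name: str
        available_space: int   # free space reported by the host
        total_space: int
        last_update: float     # time of the most recent update
        online: bool = True

    class HostManagementService:
        """Minimal sketch of the directory server's host bookkeeping."""

        def __init__(self):
            self.hosts = {}  # host name -> HostRecord

        def register_or_update(self, host_name, available, total):
            # Called when a GFS client mounts GFS or sends its periodic update.
            self.hosts[host_name] = HostRecord(host_name, available, total,
                                               last_update=time.time())

        def expire_stale_hosts(self):
            # Mark hosts off-line if they have not reported within the window.
            now = time.time()
            for record in self.hosts.values():
                if now - record.last_update > HEARTBEAT_TIMEOUT:
                    record.online = False

        def online_hosts(self):
            return [r for r in self.hosts.values() if r.online]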

File Management Service. The file management service manages files as logical GFS files. For each logical GFS file it records the logical file number, the physical file information, the owner tag, the modifier tag, and the status. The logical file number is a globally unique identifier for the logical file and is assigned by the metadata service. The physical file information helps GFS clients locate a logical file; it contains a physical file name, a physical file location, a physical file tag, and a physical file status. The physical file name consists of the logical file name and a version number. The physical file location is the host name of the node where the physical file is stored. The physical file tag indicates whether the physical file is the master copy or a replica. With this information, the file management service allows a GFS client to register, locate, modify, and delete physical files within GFS.

The file management service also maintains the owner tag, the modifier tag, and the status of a logical file. The owner tag of a logical file is the name of the user who owns the file. We identify a GFS user by a GFS user name, which consists of the local user account name and the local host name, e.g., user@grid01. In this way, each user has a unique identity in the Grid File System. The modifier tag of a logical file records the name of the last user who modified the file. The status records whether a physical file is the latest version of the logical file.

Metadata Service. The metadata service creates and manages the metadata of GFS files. For each GFS file it records the logical file path name, the logical file name, the file size, the mode, the creation time, the modification time, and the status. The logical file path name is the file path in the global namespace. The file size is the size of the logical file. The mode follows the traditional Unix access-permission mechanism and contains information such as the type of the file (e.g., regular file or directory) and the access permissions for users. The creation time and the modification time are the times at which the logical file was created and last modified. The status indicates the availability of the logical file.

Replica Placement Service. The replica placement service determines where to place the replicas of a logical file when it is created. It may use the information provided by the host management service to decide on appropriate locations.
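The directory-server records enumerated in this section can be summarized in a few data structures. The sketch below is simply a transcription of the field lists above into Python dataclasses; the type choices and names are assumptions, and the prototype actually keeps these records in SQLite (Section 5.1).

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class PhysicalFileInfo:
        # Physical file name = logical file name plus a version number,
        # e.g. "1196085478,filea" as shown in Fig. 1.
        name: str
        location: str        # host name of the node storing this copy
        tag: str             # "master" or "replica"
        status: str          # e.g. "on-line" / "off-line"

    @dataclass
    class LogicalFile:
        number: int                      # globally unique logical file number
        owner_tag: str                   # e.g. "john@grid01"
        modifier_tag: str                # last user who modified the file
        status: str                      # whether the latest version is available
        physical_files: List[PhysicalFileInfo] = field(default_factory=list)

    @dataclass
    class FileMetadata:
        path: str            # logical file path in the global namespace
        name: str            # logical file name
        size: int
        mode: int            # Unix-style type and permission bits
        ctime: float
        mtime: float
        status: str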

2.2 GFS Clients and File Servers

GFS Clients. The GFS client on a host serves as the interface between user programs and the Grid File System. The GFS client follows the standard Unix file system interface; with this interface, users transparently access files in GFS as if they were accessing files in a local file system. The GFS client performs host registration, and users manipulate GFS files through the GFS client.

As pointed out in Section 1, most grid systems require users to specify the precise set of files a job will use before submitting it, which makes it difficult to run data-intensive jobs. We therefore want GFS to be easy to deploy in existing Unix-like distributed systems, and access to GFS files to be transparent. GFS achieves both goals by following the standard Unix file system interface.

Another important function of the GFS client is to notify the directory server when a host joins GFS. When a user mounts GFS on a local host, the GFS client first contacts the directory server and sends the host information, such as the available space and the host location. The GFS client then updates the local host information with the directory server periodically.

File Server. The file server is responsible for reading and writing physical files at the local host and for transferring them to and from remote hosts. Each file server is configured to store physical files in a specified local directory. The file server accesses files according to requests from the local GFS client, which passes it the physical file information of the requested logical file. The file server then checks the specified local directory to see whether the requested physical file is available on the local host. If it is, the file server reads or writes the physical file and sends an acknowledgment back to the user through the GFS client. If the requested file resides on remote hosts, the file server sends requests to the GFS file servers at the remote hosts that hold the data and receives the data from those servers simultaneously.

2.3 A Usage Scenario

We use a usage scenario to demonstrate the interaction among the GFS client, the directory server, and the file server. The scenario is illustrated in Fig. 1a and Fig. 1b. We assume that the user mounts GFS at the directory /gfs and that the file server is configured to store files in /.gfs. We also assume that a user John at the host grid01 wants to copy a file filea from the local file system into GFS, i.e., John issues the command cp filea /gfs/dira/filea.

After receiving the command, the GFS client first sends a LOOKUP query to the directory server asking whether the logical file /dira/filea exists; the metadata service of the directory server processes this query. If the answer is no, the GFS client sends a CREATE request to the metadata service to create a logical file record with the logical path set to /dira and the logical name set to filea. The GFS client then asks the file server to create a physical file /.gfs/dira/filea and writes the content of filea into it, as illustrated in steps 1-12 in Fig. 1a.

Fig. 1. The process of creating a file in GFS: (a) creating a file at the local host; (b) placing a replica at a remote host.

After completing the creation, the GFS client sends a REGISTER request to the directory server. The metadata service updates the metadata of filea, and the file management service creates a record for the new file, including the logical file number and the physical file information, as illustrated in steps 13-14 in Fig. 1a.

Finally, the GFS client sends a request to create replicas of the logical file. The purpose of replication is to enhance fault tolerance and improve the performance of GFS. The replica placement service decides where to put the replicas based on the information provided by the host management service. After receiving the replica locations, the GFS client passes them to the file server, which communicates with the remote file servers and creates the replicas at those remote hosts. After successful replication, the GFS client registers the replicas with the file management service, which adds their metadata to the physical file information database as the primary replicas of the physical file, as illustrated in Fig. 1b.
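The client-side control flow of this scenario can be condensed into a short sketch. It strings together the LOOKUP, CREATE, local write, REGISTER, and replication steps described above; the directory_server and file_server parameters stand for connections to those components, and their method names (lookup, create, register, place_replicas, push_replica) are hypothetical, not the actual GFS protocol.

    import os
    import shutil
    import time

    GFS_MOUNT = "/gfs"        # logical mount point (same on every host)
    PHYSICAL_ROOT = "/.gfs"   # local directory managed by the file server

    def gfs_create(directory_server, file_server, src_path, logical_path):
        """Sketch of the create path of Section 2.3: cp filea /gfs/dira/filea."""
        # 1. Ask the metadata service whether the logical file already exists.
        if directory_server.lookup(logical_path) is None:
            # 2. Create a logical file record (logical path + logical name).
            directory_server.create(os.path.dirname(logical_path),
                                    os.path.basename(logical_path))

        # 3. Write the content into the local physical copy under /.gfs, then
        #    rename it to "<version>,<name>" per the GFS naming convention.
        physical_path = os.path.join(PHYSICAL_ROOT, logical_path.lstrip("/"))
        os.makedirs(os.path.dirname(physical_path), exist_ok=True)
        shutil.copyfile(src_path, physical_path)
        versioned = os.path.join(os.path.dirname(physical_path),
                                 "%d,%s" % (int(time.time()),
                                            os.path.basename(physical_path)))
        os.rename(physical_path, versioned)

        # 4. Register the master copy with the file management service.
        directory_server.register(logical_path, host=file_server.host, tag="master")

        # 5. Ask for replica locations, let the file server create the replicas,
        #    then register them as primary replicas.
        for host in directory_server.place_replicas(logical_path):
            file_server.push_replica(versioned, host)
            directory_server.register(logical_path, host=host, tag="replica")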

There are two notes on the GFS naming convention. The first rule is that all GFS hosts must have the same mount point for GFS. This restriction is due to compatibility with the Unix file system interface; with this convention, all GFS hosts share the same logical namespace. The root of the physical files, however, may differ from host to host. The second rule is that logical and physical files share the same directory structure, as shown in Fig. 2.

Fig. 2. The global view of the GFS namespace.

3 Security

We now describe the security mechanism of GFS. A user only needs a user account at a single host of a grid system in order to access GFS files. A user who already has an account on one host therefore does not need to create new accounts on other machines in order to use GFS. This feature makes GFS easy to deploy, since every user of a grid system can use GFS without extra effort from the system administrators.

3.1 Access Control

Whether a user can access a file depends on the identity of that user. The identity of a user is the concatenation of the user account and the hostname of the local machine; for example, john@grid01 is the identity of the user john at the host grid01. In GFS, the owner of a logical file can modify the mode of that logical file. GFS follows the traditional Unix permission mechanism with minor modifications. We discuss execution permission and read/write permission separately.

For execution permission we consider two cases. First, if the identity of the account that runs the executable matches the owner tag of the executable (john@grid01 in our scenario), the GFS client simply checks the owner permission bits to determine whether it may run the executable. Second, if the account does not match the owner tag, the GFS client checks the permission bits for others: if execution is granted for others, the GFS client loads the executable; otherwise it refuses to load it.

For read/write permission, we classify access to a file into two categories from the point of view of a GFS user:

Direct. The permission is determined by the GFS owner/others permission bits.

Indirect. The permission is determined by the GFS group permission bits.

This classification is motivated by the following scenario. John at grid01 wants to execute his program proga at two remote sites, host1 and host2, and proga reads his file file1 as input. Both files, proga and file1, were created in GFS by John. John wishes file1 to be accessible only to proga. However, when a program is executed in a grid environment, it usually runs under a temporary account on the remote host: the administrators of the remote hosts decide which account executes the program, and that decision is generally not predictable, since it depends on the administration policies of the remote hosts. It is therefore not possible to make file1 accessible only to proga with the traditional Unix permission mechanism.

Our solution is based on the fact that the program and its input files have the same owner tag in GFS, i.e., john@grid01. When a user runs an executable file (proga in our scenario) as a process, the local GFS client records the owner tag of the executable together with the process ID (PID) of the process. We assume here that each process is associated with one executable file and does not invoke other executables. When the process later attempts to access a GFS file, the GFS client first obtains the owner tag and the GFS access mode of that file from the directory server. If the user identity of the process is the owner of the GFS file, the GFS client checks the GFS owner permission to decide whether the process may directly access the file. Otherwise, the GFS client checks the others permission; if access is granted for others, the process may also directly access the file. If the others permission is denied, the GFS client checks whether the executable of the process has the same owner tag as the GFS file, using the PID and owner tag pair recorded when the process was created. If the owner tags match, the GFS client checks the GFS group permission to decide whether the process may indirectly access the file; if they do not match, access is denied.
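The access-control decision just described can be written down compactly. The sketch below is a plain reading of that logic rather than the actual GFS client code; it assumes standard Unix mode bits and uses the PID-to-owner-tag table that the GFS client records when it loads an executable.

    import stat

    # Recorded by the GFS client when it loads a GFS executable for a process:
    # PID -> owner tag of that executable (e.g. "john@grid01").
    process_owner_tag = {}

    def can_access(pid, user_identity, file_owner_tag, file_mode, want_write):
        """Decide whether a process may read or write a GFS file.

        user_identity  -- identity of the account running the process, e.g. "tmp42@host1"
        file_owner_tag -- owner tag of the GFS file, e.g. "john@grid01"
        file_mode      -- Unix-style permission bits of the logical file
        """
        # Direct access via the owner bits.
        if user_identity == file_owner_tag:
            bit = stat.S_IWUSR if want_write else stat.S_IRUSR
            return bool(file_mode & bit)

        # Direct access via the "others" bits.
        if file_mode & (stat.S_IWOTH if want_write else stat.S_IROTH):
            return True

        # Indirect access: the process runs an executable with the same owner
        # tag as the file, so the group bits apply.
        if process_owner_tag.get(pid) == file_owner_tag:
            bit = stat.S_IWGRP if want_write else stat.S_IRGRP
            return bool(file_mode & bit)

        return False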

4 Multiple Source Data Transfer Mechanism

In this section we introduce the multiple-source data transfer mechanism among GFS hosts. The mechanism improves both the efficiency and the reliability of file transfer by downloading a file from multiple GFS hosts simultaneously.

The data transfer mechanism works as follows. When a GFS client receives a user request for a logical GFS file, it sends a LOOKUP request to the file management service to find out where the physical files are. The file management service returns the replica list, which contains the on-line replicas of the requested file, and the GFS client passes the list to the file server at the local host. The file server first checks whether a replica is available at the local host; if the local host is in the list, GFS simply uses the local replica. Otherwise, the file server sends requests to the hosts in the list and downloads the file simultaneously from those data sources. A GFS file is divided into blocks of equal size, and the file server requests and receives only the blocks that lie within the region requested by the user from the hosts in the replica list.

Data transfer can also improve the correctness of GFS host metadata: if a replica turns out to be unavailable, the GFS client reports it to the file management service, which marks that replica as off-line to indicate that it is not functioning. Moreover, if the downloaded fragments constitute a complete file, the GFS client registers the physical file with the file management service as a secondary replica, so that other file servers can download the file from this host.
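The block-level scheduling described above might look like the following sketch. It assumes each replica host exposes a fetch_block(path, index) call and that blocks are assigned to sources round-robin; the real block size and wire protocol are not given in the paper, so both are placeholders here.

    from concurrent.futures import ThreadPoolExecutor

    BLOCK_SIZE = 1 << 20  # assumed block size: 1 MiB

    def read_region(replicas, path, offset, length):
        """Fetch only the blocks covering [offset, offset+length) from several
        replica hosts in parallel, then assemble the requested byte range.

        replicas -- remote file-server stubs, each with fetch_block(path, idx)
        """
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        indices = range(first, last + 1)

        # Spread block requests across the replica list (round-robin) and
        # download them simultaneously.
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            futures = {
                idx: pool.submit(replicas[i % len(replicas)].fetch_block, path, idx)
                for i, idx in enumerate(indices)
            }
            blocks = {idx: fut.result() for idx, fut in futures.items()}

        data = b"".join(blocks[idx] for idx in indices)
        start = offset - first * BLOCK_SIZE
        return data[start:start + length]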

5 Performance Evaluation

We conducted experiments to evaluate the performance of GFS. We implemented a prototype of GFS on the Taiwan UniGrid system [1], a grid testbed developed by universities and academic institutes in Taiwan. The participating institutes of Taiwan UniGrid are connected by a wide area network. The first set of experiments compares the performance of GFS with two file transfer approaches, SCP and GridFTP [3,4], both widely used for data transfer among grid hosts. The second set of experiments compares the performance of job execution with and without GFS. The third set of experiments runs autodock [5], a suite of automated docking tools that predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

5.1 Experiment Settings

Fig. 3 illustrates the system configuration of our experiments, and Table 1 lists the hardware parameters of the machines.

Fig. 3. An illustration of the experimental environment. We use four sites of the Taiwan UniGrid system; the directory server resides on host grid01 at National Taiwan University.

Our prototype GFS implementation uses SQLite version 3.5.1 [6] and FUSE version 2.7.1 [7] without any grid middleware. SQLite is a lightweight database that the GFS directory server uses to keep track of metadata. FUSE is a free Unix kernel module that allows users to create their own file systems without changing the kernel code; the FUSE kernel module was officially merged into the mainstream Linux kernel tree in version 2.6.14. We used FUSE to implement GFS as a user-level grid file system.
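For readers unfamiliar with FUSE, the sketch below shows the general shape of a user-level file system, using the third-party fusepy binding and a read-only pass-through from the /gfs mount point to a local /.gfs directory. It is an illustration only: the paper's prototype is built directly on FUSE 2.7.1, and a real GFS client would consult the directory server and fetch remote blocks through the local file server instead of reading only local files.

    import os
    import errno
    from fuse import FUSE, Operations, FuseOSError  # third-party "fusepy" binding

    PHYSICAL_ROOT = "/.gfs"   # local directory managed by the file server

    class GFSPassthrough(Operations):
        """Read-only pass-through: /gfs/<path> is served from /.gfs/<path>."""

        def _local(self, path):
            return os.path.join(PHYSICAL_ROOT, path.lstrip("/"))

        def getattr(self, path, fh=None):
            try:
                st = os.lstat(self._local(path))
            except OSError:
                raise FuseOSError(errno.ENOENT)
            return {key: getattr(st, key) for key in
                    ("st_mode", "st_size", "st_uid", "st_gid",
                     "st_atime", "st_mtime", "st_ctime", "st_nlink")}

        def readdir(self, path, fh):
            return [".", ".."] + os.listdir(self._local(path))

        def read(self, path, size, offset, fh):
            with open(self._local(path), "rb") as f:
                f.seek(offset)
                return f.read(size)

    if __name__ == "__main__":
        FUSE(GFSPassthrough(), "/gfs", foreground=True)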

Table 1. Hardware configurations in the GFS experiments

    Machine(s)   grid01        grid02     iisgrid01-08   uniblade02,03   srbn01
    CPU          Intel Core2   Intel P4   Intel Xeon     Intel Xeon      Intel P4
                 1.86GHz       2.00GHz    3.4GHz         3.20GHz         3.00GHz
    Cache        4M            512K       2M             2M              1M
    RAM          2G            1G         2G             1G              1G

The directory server was deployed on grid01, a machine located at National Taiwan University, and manages all metadata in our prototype GFS. Each of the other GFS hosts runs a GFS client and a GFS file server. File servers are responsible for the underlying file transfers between GFS sites, and GFS clients are the interfaces between users (or programs) and GFS, i.e., users manipulate and access GFS files via GFS clients.

5.2 Experiment Results

We now describe the results of the three sets of experiments. The first set examines the effect of file size on performance, the second examines the performance of GFS file transfer, and the third examines the job success rate when using GFS.

Effects of File Length. In the first set of experiments we perform a number of file copies from a remote site to a local site under different settings. Each experiment copies 100 files with sizes ranging from 5 MB to 1 GB over the Fast Ethernet switches. These file sizes are common in grid computing, e.g., in the autodock tasks that we tested. Although it is possible to move a task to where the data is located, it is often more efficient to transfer the data to multiple sites so that a large number of tasks can run in parallel; this is particularly useful when running many tasks with different parameter settings.

The file transfer commands differ for SCP, GridFTP, and GFS. In the GridFTP/SCP environment, a dedicated command is invoked for each file to transfer data from a remote GridFTP/SSH server to the local disk. With GFS, after mounting GFS on the directory /gfs on each machine, we use the Unix copy command cp to transfer files. Each file has a master copy and a primary replica in the system, and each file is downloaded from its master copy and its primary replica simultaneously, since GFS uses the multiple-source data transfer mechanism.

Table 2 shows the results of the first set of experiments. For files ranging from 100 MB to 1 GB, all three methods perform about the same, since the network overhead is not significant compared to the disk I/O overhead. However, for file sizes from 5 MB to 50 MB, our approach performs about the same as SCP and is 26%-43% faster than the popular GridFTP.

Table 2. Performance comparison of SCP, GridFTP, and GFS. The numbers are transfer-time ratios relative to GFS.

              5M     10M    50M    100M   500M   1G
    SCP       1.10   1.13   0.93   0.98   0.99   0.98
    GridFTP   1.26   1.39   1.43   1.02   1.03   1.02
    GFS       1      1      1      1      1      1

GFS File Transfer. The second set of experiments compares the performance of job execution with and without the GFS multi-source file transfer mechanism. We run an MPI program, StringCount, that counts the number of occurrences of a given string in a file. All input files are 1 GB. StringCount divides the input file into equal-size segments, and each computing machine is assigned one segment in which to count the occurrences of the given string.

In the first setting we use the GFS file transfer mechanism: we put the executable and its input files into GFS and run the string-counting MPI program, and the GFS file servers transfer the files automatically. Note that each computing machine receives only the necessary segment of the input file, from multiple file replicas simultaneously. In the second setting we do not use the GFS file transfer mechanism; instead, we follow the job submission mechanism of the Globus [3] system. Under Globus, the local machine transfers the executable and the entire input file to the computing machines before execution. Users must specify the locations of the input files and the executable in a script file, and GridFTP transfers the files according to the script. The master copies of the executable and its input files are on the host iisgrid01, and the primary replicas are on the host grid02. For the experiments that do not use GFS file transfer, the executable and the input files are initially stored on iisgrid01. The number of worker machines ranges from 2 to 10.

Fig. 4a shows the results; the vertical axis is the execution time and the horizontal axis is the number of worker machines. The execution time under Globus increases with the number of hosts. This overhead is due to transferring the entire input file between the worker machines and iisgrid01, which holds the input files. In contrast, the execution time with GFS is much shorter because the worker machines only fetch the necessary segments of the input file rather than the entire file, which greatly reduces the communication cost. Although a programmer could use the GridFTP API to transfer only the necessary parts of an input file, it takes considerable effort to learn the API and to modify existing MPI programs. Another drawback is that, once modified, the program cannot run in grid systems that do not have GridFTP, such as a cluster without Globus. In contrast, our GFS approach does not require a user to change the program, since GFS operates at the file system level.
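To make the segment-based access concrete, the sketch below shows how a StringCount-style program might be written with the mpi4py binding: each rank seeks to its own segment of the input file under /gfs and counts matches there, so only that byte range needs to be fetched. The paper does not give the StringCount source; the file path, search string, and overlap handling here are reconstructions for illustration.

    from mpi4py import MPI
    import os

    INPUT = "/gfs/stringcount/input.dat"   # placeholder path inside the GFS mount
    NEEDLE = b"GATTACA"                    # placeholder search string

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    size = os.path.getsize(INPUT)
    segment = size // nprocs
    start = rank * segment
    # The last rank also takes the remainder of the file.
    length = size - start if rank == nprocs - 1 else segment

    with open(INPUT, "rb") as f:
        f.seek(start)
        # Read slightly past the segment end so a match straddling the
        # boundary is counted exactly once, by the rank where it starts.
        data = f.read(length + len(NEEDLE) - 1)

    local_count = data.count(NEEDLE)
    total = comm.reduce(local_count, op=MPI.SUM, root=0)
    if rank == 0:
        print("occurrences:", total)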

Job Success Rate. The third set of experiments uses autodock [5] to show that GFS improves the success rate of jobs submitted to Taiwan UniGrid.

Fig. 4. (a) Execution time comparison under Globus and GFS (second set of experiments). (b) Number of completed jobs with respect to elapsed time (third set of experiments).

When we submit a job to Taiwan UniGrid, the job may fail to complete because jobs assigned to the same host may request input data or executables simultaneously. The resulting simultaneous traffic may exceed the capacity of GridFTP at that site, so the job fails to execute. In our experience, the failure rate is about 18.44% when we submit 100 jobs with two GridFTP servers [8]. GFS solves this I/O bottleneck by bypassing GridFTP and using a more efficient mechanism to transfer data, so that jobs execute successfully.

In a previous paper, Ho et al. [8] reported that under the current Taiwan UniGrid Globus setting the failure rate of an autodock task ranges from 18.44% to 52.94%, depending on how the executables and input files are arranged. The main reason for this high failure rate is the I/O bottleneck caused by the capacity limits of GridFTP; consequently, Globus GRAM jobs cannot stage in the executable properly, and the same problem occurs when tasks read their input files. With GFS, the GRAM resource file file.rsl only specifies the executable and its arguments, since the other information is implied by the GFS file system. For example, the value of the executable is a local file path such as /gfs/autodock, because GFS is treated as a local file system; the arguments of the executable are specified as usual; and the input and output data are accessible through GFS, so they need not appear in file.rsl.

Fig. 4b shows the results of virtual screening (a core computation of autodock) against a database of 100 ligands for the avian influenza virus (H5N1) [9]. The job success rate is 100%, i.e., every submitted task completes successfully. In other words, GFS overcomes the I/O bottleneck that prevents multiple GRAM jobs from staging in the executable due to the capacity limit of GridFTP.

6 Conclusion

To cope with the difficulty of running data-intensive applications with unknown data requirements and the potential I/O bottlenecks of grid environments, we designed the Grid File System (GFS), which provides a Unix-like API and offers the same namespace and semantics as if the files were stored on a local machine. GFS has the following advantages. First, GFS uses the standard file I/O interfaces available on every Unix system, so applications need no modification to access remote GFS files. Second, GFS supports partial file access and a replication mechanism for fault tolerance. Third, GFS accesses remote files with a multi-source data transfer mechanism, which improves the data transfer rate by 26%-43% compared with GridFTP and in turn enhances overall system performance. Fourth, GFS is a user-space file system that requires no kernel modification, so it can be deployed easily in any Unix-like environment without the help of system administrators. We plan to integrate authentication mechanisms such as GSI or PKI into future releases of GFS and to conduct more experiments comparing GFS with other grid-enabled distributed file systems, such as XtreemFS [10].

Acknowledgement

The authors thank the anonymous reviewers for their valuable advice. This research was supported in part by the National Science Council, Republic of China, under Grant NSC 97-2221-E-002-128, and by the Excellent Research Projects of National Taiwan University, 97R0062-06.

References

1. Taiwan UniGrid project, http://www.unigrid.org.tw
2. Howard, J., Kazar, M., Menees, S., Nichols, D., Satyanarayanan, M., Sidebotham, R., West, M.: Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS) 6(1), 51-81 (1988)
3. Globus Toolkit, http://www.globus.org
4. Allcock, W., Foster, I., Tuecke, S., Chervenak, A., Kesselman, C.: Protocols and services for distributed data-intensive science. Advanced Computing and Analysis Techniques in Physics Research 583, 161-163 (2001)
5. AutoDock docking tools, http://autodock.scripps.edu/
6. SQLite, http://www.sqlite.org/
7. Filesystem in Userspace (FUSE), http://fuse.sourceforge.net/
8. Ho, L.-Y., Liu, P., Wang, C.-M., Wu, J.-J.: The development of a drug discovery virtual screening application on Taiwan UniGrid. In: The 4th Workshop on Grid Technologies and Applications (WoGTA 2007), Providence University, Taichung, Taiwan (2007)
9. Russell, R., Haire, L., Stevens, D., Collins, P., Lin, Y., Blackburn, G., Hay, A., Gamblin, S., Skehel, J.: Structural biology: antiviral drugs fit for a purpose. Nature 443, 37-38 (2006)
10. XtreemFS, http://www.xtreemfs.org