Future Generation Computer Systems. PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids


Future Generation Computer Systems 28 (2012)

PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids

Nazanin Saadat, Amir Masoud Rahmani
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Article history: Received 13 April 2011; Received in revised form 17 October 2011; Accepted 24 October 2011; Available online 3 November 2011.

Keywords: Data grid; Dynamic data replication; File access pattern; Data pre-fetching

Abstract

In recent years, grid technology has grown so rapidly that it is now used in many scientific experiments and research centers. A large number of storage elements and computational resources are combined to form a grid, which gives shared access to extra computing power. In particular, the data grid deals with data-intensive applications and provides intensive resources across widely distributed communities. Data replication is an efficient way of distributing replicas among the sites of a data grid, making it possible to access the same data in different locations of the grid. Replication reduces data access time and improves the performance of the system. In this paper, we propose a new dynamic data replication algorithm named PDDRA that improves on traditional algorithms. Our proposed algorithm is based on one assumption: members of a VO (Virtual Organization) have similar interests in files. Based on this assumption and on file access history, PDDRA predicts the future needs of grid sites and pre-fetches a sequence of files to the requester grid site, so that the next time this site needs a file, it is available locally. This considerably reduces access latency, response time and bandwidth consumption.
PDDRA consists of three phases: storing file access patterns; requesting a file and performing replication and pre-fetching; and replacement. The algorithm was tested using OptorSim, a grid simulator developed by the European DataGrid project. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio and percentage of storage filled. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

Grid computing refers to coordinated resource sharing and problem solving in dynamic, multi-institutional Virtual Organizations (VOs) [1]. The sharing is not primarily file exchange but rather direct access to computers, software, data, and other resources. In [2] Foster, who is considered to be the father of grid computing, introduced the data grid as an extension of the classical grid. The data grid was designed to satisfy the requirements of large datasets, geographical distribution of users and resources, and computationally intensive analysis. The architecture was also developed to suit operation in wide-area, multi-institutional and heterogeneous environments.

In data grids [3-6], scientific and engineering applications often need to access large amounts of data (terabytes or petabytes). Managing such huge and widely distributed amounts of data in a centralized manner is inefficient, because it imposes a high workload on the central server. In addition, storing data on a central server brings problems such as a single point of failure and bottlenecks.

Correspondence to: Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Simon Bolivar Blvd., Ashrafi Isfahani Blvd., Tehran, Iran. E-mail addresses: n.saadat@srbiau.ac.ir, nazanin.saadat@yahoo.com (N. Saadat), rahmani@srbiau.ac.ir (A.M. Rahmani).
This huge amount of data should therefore be replicated and distributed across multiple locations of the distributed system to avoid such problems. The data grid retrieves data from the closest grid site and replicates it for the requester site at the time of need. Replication is an efficient method of achieving optimized access to data and high performance in distributed environments [7]. Replication has been used in distributed computing for a long time [8]. The technique is clearly applicable to data distribution problems such as those of the High Energy Physics community, where several thousand physicists want to access the terabytes and even petabytes of data that are produced every year. Because of the geographic distribution of the collaboration in a data grid, it is reasonable to make copies, or replicas, of a dataset and store them at multiple sites [8]. Replication creates several copies of the original file (called replicas) and distributes them to multiple grid sites. This provides remarkably higher access speeds than having just a single copy of each file. Moreover [8,9], by creating replicas and dispersing them among multiple sites, it can effectively enhance data availability, fault tolerance, reliability, system scalability and load balancing. The three fundamental questions any replication strategy has to answer are [10]: When should the replicas be created? Which

files should be replicated? Where should the replicas be placed? Depending on the answers, different replication strategies are born. Replication methods can generally be classified as static or dynamic. In static methods [10], after a replica is created at a site it remains at that location until it is deleted manually by the users. Static approaches [11] determine the locations of replicas during the design phase, and these locations are unchangeable. The drawback of static replication methods is that they cannot adapt to changes in user behavior, which are common in data grids, and they are not suitable for large amounts of data and large numbers of users. Static replication strategies do have some benefits: they avoid the overhead of dynamic methods and job scheduling is done rapidly. In contrast, dynamic replication strategies have the ability to adapt to changes in user behavior and create new replicas for popular data files. In these strategies, replica creation, deletion and management are done automatically. As data grids are dynamic systems in which the requirements of users change over time, dynamic replication is more suitable for these systems [1,12]. Data replication not only reduces data access costs but also increases data availability in many applications [13]. If the required files are replicated at the sites where a job is executed, then the job can process data without communication delay. However, if the required files are not stored locally, they must be fetched from remote sites. This fetching takes a long time because of the large size of the files and the limited network bandwidth between sites [13]. It is therefore better to pre-fetch and pre-replicate the files that are likely to be requested in the near future. This increases data availability.
In this paper we propose a new dynamic data replication algorithm called Pre-fetching based Dynamic Data Replication Algorithm in data grids (PDDRA). This method predicts the future needs of grid sites and pre-replicates the corresponding files before requests are submitted. The prediction is made according to the past file access sequences of the grid sites, so when a grid site needs a set of files, it has them locally. Consequently, response time, access latency and bandwidth consumption decrease considerably. The rest of this paper is organized as follows: Section 2 presents a brief introduction to related work. Section 3 proposes our PDDRA algorithm. In Section 4 the simulation results are described. Finally, Section 5 concludes our discussion and presents some future work.

2. Related works

Several recent studies have examined the problem of dynamic replication strategies in data grids; some of them are surveyed in this section. First, some methods that perform replication according to the access history of grid sites and an automatic data replication algorithm are described; then some existing pre-fetching based algorithms are considered in detail. Ming Tang et al. suggested two replication algorithms in [6]: Simple Bottom Up (SBU) and Aggregate Bottom Up (ABU) for multi-tier data grids. The basic idea of these algorithms is to create replicas as close as possible to the clients that request the data files at rates exceeding a pre-defined threshold. In these algorithms the replication process proceeds bottom-up: files are replicated from the lower levels of the hierarchy upward according to their popularity. In [14] a Popularity Based Replica Placement (PBRP) algorithm was proposed. This algorithm tries to decrease data access time by dynamically creating replicas for popular data files. The popularity of a file is determined by its access rate by the clients.
PBRP creates replicas as close as possible to those clients that frequently request data files. The replication process is done in two phases: bottom-up access aggregation and top-down replica placement. The first phase aggregates the access history records for each file to the upper tiers. The second phase places replicas from the top to the bottom of the tree using the aggregated information of the first phase. In [15] a new algorithm for automatic replication in grid environments was proposed. Automatic replication is a complex task requiring a set of algorithms covering the creation, removal, selection and coherency of replicas, as well as a replica update propagation algorithm. The replica creation algorithm is responsible for the automatic creation of new replicas. The replica removal algorithm removes replicas in order to save storage space. The replica selection algorithm selects the optimal replica for a specific read/write operation. Finally, the replica update propagation algorithm updates out-of-date replicas. The proposed algorithm was tested on two grids, Clusterix and SGIgrid, and the results indicate that automatic replication can decrease total data access time and increase storage usage. Ruay-Shiung Chang et al. proposed a dynamic data replication mechanism called Latest Access Largest Weight (LALW) in [16]. Its architecture is based on centralized data replication management. LALW selects a popular file for replication and calculates a suitable number of copies and the grid sites for replication. It assigns a different weight to each data access record, so the importance of each record is differentiated: data access records from the recent past have a higher weight, indicating a higher reference value, while records from the distant past have a lower reference value.
In [9] a dynamic data replication strategy called FIRE was proposed. In this method each site maintains a file access table to keep track of its local file access history, and each site makes independent decisions on whether or not to delete some old files to accommodate new ones. On each site, the method analyzes the site's file access table; if FIRE detects that a common set of files is frequently accessed by a group of local jobs, i.e. a strong correlation exists between a group of local jobs and a set of distributed files, FIRE gathers that set of files onto the site by replication. In another paper [13] a replication algorithm named Modified BHR was proposed, based on network-level locality. The algorithm tries to replicate files within a region and stores each replica at a site where the file has been accessed frequently, on the assumption that it may be required there in the future. This algorithm increases data availability by replicating files within the region to the region header and also storing them at the site where the file has been accessed frequently. It also reduces unnecessary replication: instead of storing files at many sites, they can be stored at one particular site, so storage usage is reduced. All of the above algorithms perform replication only after a request has arrived. In this paper we propose a new algorithm that performs replication before a file is actually requested.

2.1. Different models of storing file access patterns

Many methods have been proposed for storing sequences of accessed files; in this section, two of them are introduced. These models are essential for maintaining file access history, and based on them future file accesses can be predicted. The first model [17] uses a probability graph. This model requires only one node per unique file and keeps a count of accesses within each node. When file A is accessed, its count is incremented.
Then, if files B, C and D are accessed, edges with a count of 1 are created connecting file A's node to B, C and D. If an edge to one of these files already exists, only the count of that edge is incremented.
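As an illustration, the probability-graph model above can be sketched in a few lines of Python. This is a simplified reading in which an edge is recorded only between each file and its immediate successor; the class and method names are ours, not from [17]:

```python
from collections import defaultdict

class AccessGraph:
    """Simplified probability graph: one node per unique file with an
    access count, plus counted edges between successively accessed files.
    (Illustrative sketch, not the model's original code.)"""
    def __init__(self):
        self.node_count = defaultdict(int)   # file -> access count
        self.edge_count = defaultdict(int)   # (prev_file, next_file) -> count
        self.last_file = None

    def record_access(self, name):
        self.node_count[name] += 1
        if self.last_file is not None and self.last_file != name:
            self.edge_count[(self.last_file, name)] += 1
        self.last_file = name

    def successor_probability(self, a, b):
        """Estimate P(b is accessed after a) from the edge counts out of a."""
        out = sum(c for (src, _), c in list(self.edge_count.items()) if src == a)
        return self.edge_count[(a, b)] / out if out else 0.0
```

A model like this predicts the next file by picking the outgoing edge of the current node with the highest count.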

In [18] another model, called Finite Multi-Order Context (FMOC), was proposed. It originates from the text compression algorithm PPM [19]. FMOC uses a tree-based structure named a trie to store sequences of file accesses. Each node of the trie represents a file, and the children of every node represent all the files that have been seen after the parent. To model access probabilities, an iteration-number field is added to each node. PCM is an extension of FMOC; it divides the tree into several partitions, each containing a limited number of nodes. If the number of nodes in a partition reaches the limit, the partition is full and no new node can be added to it directly. When a new node is added to a full partition, all node counts in the partition are divided by two and rounded to the nearest integer, and any node with a zero count is cleared to make space for new nodes. If there is still no space available, the new node is not added.

2.2. Pre-fetching based replication algorithms in data grids

In this section some of the replication algorithms with a pre-fetching approach are described. In [20] an algorithm called GAPM (Grid Access Pattern Modeling) was proposed. GAPM uses the trie structure of [18] for storing previously seen access patterns and maintaining the count of each pattern. The trie consists of a root node in the first level and the users of the data grid in the second level; the nodes in the third and lower levels use file names as keys. The path to each node from its ancestor node in the second level represents a file access sequence. Using this structure, GAPM predicts the next file that will be accessed with the largest probability according to the access history. One of the problems of this algorithm is that it does not consider the time difference between consecutive requests and treats them as successive even when the time difference between them is large.
However, a large time difference between consecutive requests makes them irrelevant to each other, so they should not be stored in one sequence. In this paper we take the time difference between consecutive requests into account and place them in one sequence only if the difference between them is smaller than a pre-defined threshold. In [21] a pre-fetching based replication algorithm was proposed. The major idea of this algorithm is to exploit the characteristic that members of a VO have similar interests in files. Each grid site has a database containing the history of file requests at the local site, called the file access sequence database. A time threshold is also applied: if the time interval between two consecutive file requests is smaller than the threshold, the two requests are considered successive; otherwise, the sequence is divided into two subsequences. The whole process of this algorithm is as follows: when a site does not have a file locally, it requests it from a remote site. The remote site transfers the file to the local site and, at the same time, finds the adjacent files of the requested file in its file access sequence database and pre-fetches them for the requester site. We think that one of the problems of [21] is that it does not separate requests arriving from different grid sites and treats them as successive file accesses. Also, in [21] each grid site predicts future requests according to its own file access history and is not aware of the access histories of the other grid sites of the current VO. In this paper, we define a global database for the whole VO which keeps track of all file access sequences of the existing grid sites; the prediction is therefore made according to all access patterns of the current VO. We also separate requests arriving from different grid sites, because requests from different sites are not successive.
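The threshold rule described above, under which two consecutive requests belong to the same sequence only when their time gap is within the threshold, can be sketched as follows (illustrative Python; the function name and the (file_name, time) tuple layout are our own):

```python
def split_into_sequences(requests, threshold):
    """Split a per-site stream of (file_name, time) requests into access
    sequences: a request starts a new sequence whenever its gap from the
    previous request exceeds the threshold. (Illustrative sketch.)"""
    sequences, current = [], []
    last_time = None
    for name, t in requests:
        if last_time is not None and t - last_time > threshold:
            sequences.append(current)   # gap too large: close the sequence
            current = []
        current.append(name)
        last_time = t
    if current:
        sequences.append(current)
    return sequences
```

For example, with a threshold of 30, the stream (a, 0), (b, 5), (c, 50), (d, 55) splits into the two sequences [a, b] and [c, d], since the 45-unit gap between b and c exceeds the threshold.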
In another paper [22] a new dynamic replication method for multi-tier data grids called PHFS was proposed, an extended version of the fast spread algorithm [10]. Considering spatial locality, PHFS tries to predict future needs and pre-replicates files in a hierarchical manner to increase locality of access. The algorithm collects access information from all over the system and builds file access logs; then, using data mining techniques, it finds the relationships between files for future predictions. The nodes in the upper layers have more storage capacity and computational power, so the algorithm places more replicas in the upper layers, and only replicas with high priority are replicated at the lower levels. We think that one of the deficiencies of this method is that it determines the correlation between files from information collected across the whole system. However, the grid consists of multiple Virtual Organizations, and the members of each VO have a common goal, which means they are more likely to be interested in the same or similar content. So it is better to separate the access sequences of different VOs.

3. Pre-fetching based dynamic data replication algorithm (PDDRA)

In this section our new algorithm for data replication is presented; the algorithm is based on pre-fetching. To increase system performance and reduce response time and bandwidth consumption, it is beneficial to pre-fetch, for the requester grid site, replicas that will be requested in the near future with high probability. Replicating these files to the requester node means that the next time the grid site needs them it accesses them locally, decreasing access latency and response time. The architecture of our proposed algorithm is illustrated in Fig. 1. As can be seen in Fig. 1, the grid sites are located at the lowest level of the proposed architecture; each grid site consists of a Computing Element and/or a Storage Element.
Multiple grid sites constitute a Virtual Organization (VO); there is a Local Server (LS) for every VO, and a Replica Catalog (RC) is located at each Local Server. It is worth mentioning that, since the available bandwidth between the sites within a VO is higher than the bandwidth between Virtual Organizations, accessing a file located in the current VO is faster than accessing one located in another VO. In the upper layer there is a Regional Server (RS), and each RS covers one or more VOs. Regional Servers are connected via the Internet, so transferring files between them takes a long time. A Replica Catalog is also located at each RS; it is a directory of all the files stored in that region. Whenever a file that is not stored in the current VO is required, the RC of the RS is asked to determine which VOs have the requested file. Suppose that grid site A requests a file that is not stored locally; it asks the RC to determine which sites have the requested file. To reduce access latency, response time and bandwidth consumption, it is better to pre-fetch replicas that are likely to be requested by the requester grid site in the near future; in this paper, we propose a new algorithm and topology for this purpose. When a required file is not in the current VO but is stored in other VOs, a request is sent to the RS. The RS then searches its Replica Catalog table and determines the locations of the requested file in other VOs. In such situations only the required file is replicated: because of the low bandwidth between VOs, the high propagation delay and, consequently, the high replication cost, pre-fetching would not be beneficial and is not performed.
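The routing decision described above can be condensed into a small sketch. This is hypothetical Python, not the paper's code: the two catalogs are modeled as plain dictionaries mapping file names to holder sites, which is our simplification of the Replica Catalogs at the LS and RS:

```python
def plan_actions(file_name, vo_catalog, regional_catalog):
    """Decide what to do for a file missing at the requester site:
    inside the requester's VO, replicate AND pre-fetch adjacent files;
    across VOs, replicate only, since inter-VO bandwidth is low and
    access patterns differ between VOs. (Illustrative sketch.)"""
    if vo_catalog.get(file_name):          # holders in the same VO (LS catalog)
        return ["replicate", "prefetch_adjacent"]
    if regional_catalog.get(file_name):    # holders in other VOs (RS catalog)
        return ["replicate"]               # no pre-fetching across VOs
    return []                              # file unknown in this region
```

The asymmetry is the point: the pre-fetching step is only worthwhile when the intra-VO bandwidth makes speculative transfers cheap.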
In addition, we have assumed that members of a VO have similar interests in files, so the file access patterns of different VOs differ; consequently, a file should not be pre-fetched from a different VO for a requester grid site in another VO, because their requirements and access patterns are different. So only the required file is replicated and no pre-fetching is performed. Our proposed algorithm is built on the basis of one assumption: members of a VO have similar interests in files. To predict future accesses, past access sequences must be stored; the files that will be accessed in the near future can then be predicted by mining the past file access patterns. PDDRA consists of three phases:

Fig. 1. PDDRA architecture.

Phase 1, Storing file access patterns: in this phase, file access sequences and data access patterns are stored in a database.

Phase 2, Requesting a file and performing replication and pre-fetching: in this phase a grid site asks for a file, and replication is performed for it if beneficial. Adjacent files of the requested file are also pre-fetched for the requester grid site in this phase.

Phase 3, Replacement: if there is enough space in the storage element for storing a new replica, it is stored; otherwise an existing file must be selected for replacement.

3.1. Phase 1: storing file access sequences

This phase specifies how the sequences of file accesses are stored in the PDDRA algorithm. As members of a VO are likely to access files with similar contents, one database is used for each VO; this database is located at the Local Server. By having only one database per VO, located on the Local Server, all file accesses of that VO are collected and logged in one place, so the prediction of the future needs of a requester grid site is made according to all file access patterns of that VO. This centralized per-VO database is one of the novelties of our proposed algorithm: previous work on data pre-fetching in data grids did not consider the fact that members of a VO have similar interests in files and that it is therefore better to keep one access history database per VO. However, when the number of grid sites in a VO grows, a bottleneck may occur. The global database can be organized hierarchically, and a backup system can be provided for it, reducing the bottleneck implied by a central database and increasing scalability and fault tolerance. Whenever a grid site needs a file that is not stored locally, it asks the Replica Catalog to determine which grid sites have the requested replicas.
Then, by examining the available bandwidth between the destination element and all the sites on which a replica of the file is stored, the Replica Selector chooses the PFN (Physical File Name) that will be accessed fastest. As mentioned earlier, the RC is located at the LS, so whenever a grid site does not have a file locally, a request is sent to the LS. File access sequences are stored in the database of the Local Server as follows.

1. Every time the LS receives a file request, it stores the requester grid site's name, the file's name, the time of the request and the number of accesses to that file. Each request is thus stored as (GridSite_name, file_name, time, number_of_accesses).

2. For the next request that arrives, the LS first checks which grid site has sent it, then searches its database to determine when the requester grid site last requested a file (at time t_i). If t_(i+1) − t_i > threshold, where t_(i+1) is the time of the current request, the LS stores the request in a new sequence; otherwise the request is added to the end of the latest sequence. If the time difference between two consecutive file requests is greater than the threshold, they cannot be considered successive file requests, so the new request starts a new sequence.

3. Local file accesses of grid sites also form databases of file accesses, stored locally on each grid site. Since these accesses are local, each grid site periodically sends its local database to the LS, and the LS aggregates them with its own access records.

4. Before a new request is stored in the database, the free space of the database is checked. If there is not enough space for storing new requests, the earliest requests are eliminated.
As the characteristics and requirements of a VO change over time, old requests have lower value and are the first candidates for replacement.

3.1.1. Using the trie structure for storing access sequences

Like [10], a trie structure is used in our algorithm for storing file access sequences. We build one trie for each VO. The names of the existing grid sites of that VO are stored in the second level of the trie. Each node of the trie consists of three fields: the file's name, the time of the request and the number of accesses to that file, i.e. (file_name, time, number_of_accesses). The structure of the trie is illustrated in Fig. 2. For each grid site there is a pointer named LastNode that points to the node most recently requested from that grid site. For example, in Fig. 2 the pointer of site A points to (g, 32, 1), meaning that grid site A requested file g at time 32 and the number of accesses is 1. For grid site B the pointer points to (d, 12, 1), meaning that site B requested file d at time 12 and the number of accesses is 1. For site C, the pointer points to (y, 20, 1).

3.1.2. Trie insertion algorithm

The insertion algorithm for the trie structure is shown in Fig. 3. The input of the algorithm is the new request that the Local Server receives.
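Assuming the node fields (file_name, time, number_of_accesses) and the per-site LastNode pointer described above, one possible Python reading of a single insertion step is the following. It is an illustrative sketch, not the paper's code:

```python
class TrieNode:
    """One trie node: (file_name, time, number_of_accesses)."""
    def __init__(self, file_name, time):
        self.file_name = file_name
        self.time = time
        self.accesses = 1
        self.children = []

def insert(site_branch, last_node, file_name, time, threshold):
    """One insertion step. site_branch is the requester site's node at the
    trie's second level; last_node is that site's LastNode pointer.
    Returns the updated LastNode. (Condensed illustrative sketch.)"""
    if time - last_node.time > threshold:
        parent = site_branch   # gap too large: start a new sequence (third level)
    else:
        parent = last_node     # successive request: extend the latest sequence
    for child in parent.children:
        if child.file_name == file_name:
            child.accesses += 1          # pattern seen before: count it
            child.time = time
            return child
    node = TrieNode(file_name, time)     # pattern is new: grow the trie
    parent.children.append(node)
    return node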

Fig. 2. Using the trie structure for storing file access sequences.

Each request consists of three fields: the name of the requester grid site, the name of the requested file and the time of the request. First the algorithm checks where the pointer (LastNode) of the requester grid site is pointing. If NewRequest.Time − LastNode.Time > Threshold, the time difference between the new request and the latest registered one is greater than the threshold and the files are not successive; consequently, the new request should be added to a new sequence, created at the third level of the trie structure. If a node with the value NewRequest.FileName already exists at this level, the number of accesses of that node is incremented, the LastNode pointer is updated and the algorithm finishes. Otherwise, if there is no node with the value NewRequest.FileName at the third level of the trie, a new node with this value is created at this level, its number of accesses is set to one, the pointer is updated and the algorithm finishes. Else, if NewRequest.Time − LastNode.Time < Threshold, the time difference between the two requests is smaller than the threshold, so the new request should be added to the end of the last sequence. Here we face two cases. The first case is when LastNode points to a leaf node; in this case all we have to do is add the new request as a child of LastNode, update the pointer to the new request (LastNode = NewRequest), set the number of accesses of the new node to one, and the algorithm finishes. The second case is when LastNode does not point to a leaf node. In this case, if the value of one of LastNode's children equals the value of the new request, the number of accesses of that child is incremented, LastNode is updated (LastNode = LastNode.Child) and the algorithm finishes.
Otherwise, if no child of LastNode has a value equal to that of the new request, the new request is added as another child of LastNode, the pointer is updated (LastNode = NewRequest) and the algorithm finishes.

Suppose that the Local Server receives a request like (A, a, 100), meaning that grid site A has requested file a at time 100. First, it is checked where the pointer of grid site A is pointing (Fig. 2). As is evident in Fig. 2, the LastNode of grid site A points to (g, 32, 1); that is, grid site A made its last request at time 32. Since 100 − 32 = 68 is greater than the predefined threshold (suppose the threshold is 30), the new request should be inserted into a new sequence, so the third level of the trie structure is checked (in the branch of grid site A). Because a node with the value a already exists at this level, the new request is recorded in this node, and all we have to do is update its time and number-of-accesses fields. Now suppose that the next request is (A, n, 102); the trie structure then becomes like Fig. 4.

Fig. 3. Trie insertion algorithm.
Fig. 4. Inserting a new request into the trie structure.

3.2. Phase 2: pre-fetching algorithm

In this section the second phase of our proposed algorithm is described. In this phase an application on a grid site requests a file that is not stored locally; replication is then performed if it is beneficial. However, in cases where the application depends on having the required file locally, replication must be performed. This phase includes two important algorithms, PDDRA and pre-fetching, shown in Figs. 5 and 6 respectively. In the first step of PDDRA, a grid site needs a file named a that is not stored locally, so a request is sent to the Local Server in order to replicate that file. In the second line of the PDDRA algorithm, the pre-fetching algorithm is called in order to pre-fetch the adjacent files of the requested file a. The output of the pre-fetching algorithm is a sequence s consisting of file a together with its adjacent files. As can be seen in Fig. 6, the pre-fetching algorithm searches the third level of the trie structure of the Local Server for a node with the value a whose number-of-accesses field is greater than the predefined threshold. If the number of accesses is not greater than the threshold, replication is not performed and the replica is accessed remotely; obviously, in such cases replication is not beneficial, and by temporal locality an access to that replica in the near future is unlikely. If more than one node with the value a satisfies this condition at this level, the one with the greater number of accesses is selected. If there is no such node at the third level, the algorithm moves to the next level of the trie to find a node with the value a. If there is no node with the value a in the whole trie, the algorithm returns null and exits. If the algorithm finds a node with the value a, it follows that branch, selecting at each step the child with the greater number of accesses (between two nodes with equal numbers of accesses, the algorithm selects the more recently created node, i.e. the one with the greater time value).
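A compact sketch of this search-and-follow procedure is given below. It is illustrative Python under the same trie-node fields as before; the level-by-level search approximates "third level first, then deeper levels", and all names are ours:

```python
class TrieNode:
    """Trie node with the fields (file_name, time, number_of_accesses)."""
    def __init__(self, file_name, time, accesses=1):
        self.file_name = file_name
        self.time = time
        self.accesses = accesses
        self.children = []

def prefetch_sequence(site_branch, file_name, threshold):
    """Find the requested file's node (shallowest level first) with more
    than `threshold` accesses, then follow the branch, taking at each step
    the child with the most accesses (ties broken by the more recent time),
    until a leaf. Returns the predicted sequence s, or None when
    pre-fetching is not considered beneficial. (Illustrative sketch.)"""
    level = list(site_branch.children)   # third level of the trie
    start = None
    while level and start is None:
        candidates = [n for n in level
                      if n.file_name == file_name and n.accesses > threshold]
        if candidates:
            start = max(candidates, key=lambda n: n.accesses)
        else:
            level = [c for n in level for c in n.children]  # one level deeper
    if start is None:
        return None                      # not popular enough: access remotely
    s = [start.file_name]
    node = start
    while node.children:
        node = max(node.children, key=lambda n: (n.accesses, n.time))
        s.append(node.file_name)
    return s
```

For instance, if site A's branch holds a node a (5 accesses) with children b (3 accesses, time 10) and c (3 accesses, time 20), and c has child d, the predicted sequence for a request of a with threshold 2 is [a, c, d]: the tie between b and c is broken in favor of the more recent c.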
Steps 3-5 of the pre-fetching algorithm are executed repeatedly until a leaf node is reached; the algorithm then finishes and returns sequence s. After running step 2 of the PDDRA algorithm, sequence s is returned by the pre-fetching algorithm. In the third line, the Replica Updating Management Component analyzes each member of sequence s and eliminates those whose Replica Changing Frequency (RCF) is greater than the threshold (this component is described in Section 3.2.1). In step 4, the algorithm asks the Replica Catalog to determine which grid sites hold the requested replicas, and among these sites the Replica Selector selects the replica with

Fig. 5. PDDRA algorithm.

Fig. 6. Pre-fetching algorithm.

the minimum communication cost. The communication cost is given in Eq. (1): the communication cost of replica i is defined as the size of replica i divided by the available bandwidth between grid site a (the source) and grid site b (the destination). In step 6 of the PDDRA algorithm, replication starts and the members of sequence s are replicated to the requester grid site.

CommunicationCost(replica i) = Size_i / BandWidth_ab,  (1)

where Size_i is the size of replica i and BandWidth_ab is the available bandwidth between grid site a and grid site b.

In the middle of replication, if the requester grid site requests a file other than the ones that have been pre-fetched for it, pre-replicating the remaining members of sequence s is stopped.

3.2.1. Replica Updating Management Component (RUMC) responsibility

RUMC is one of the components of the local server; it registers and manages the edits made to replicas. For this purpose, RUMC maintains a Replica Changing Frequency (RCF) for each replica, calculated according to Eq. (2). This component exists only in systems with writable and editable files; in systems with read-only files it has no role.

RCF(replica r) = number of editions / T,  T = t_current edition - t_last edition.  (2)

For example, suppose that replica a has been edited 4 times up to now, the last edition was at t = 4, and it is edited again at t = 8; the RCF of replica a is then 5/(8 - 4) = 5/4. If this replica were instead edited at t = 20, the RCF would be 5/(20 - 4) = 5/16. As can be seen, when the time between editions is long, the Replica Changing Frequency is smaller. As mentioned above, before pre-fetching a file, RUMC is asked to determine whether the Changing Frequency of the requested replica is greater than the threshold.

Fig. 7. Messages passing between grid sites and LS.
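Steps 3-5 of PDDRA, filtering sequence s by RCF (Eq. (2)) and selecting the cheapest source replica (Eq. (1)), reduce to a few lines. The RCF threshold and the data shapes below are illustrative assumptions:

```python
RCF_THRESHOLD = 1.0  # illustrative cut-off on edit frequency

def rcf(num_editions, t_current_edition, t_last_edition):
    """Eq. (2): Replica Changing Frequency = editions per unit time."""
    return num_editions / (t_current_edition - t_last_edition)

def filter_sequence(seq, edit_stats):
    """Step 3: drop members of s that are edited too often to pre-fetch.

    edit_stats maps a file to (num_editions, t_current, t_last);
    files with no edit history (read-only) are always kept."""
    return [f for f in seq
            if f not in edit_stats or rcf(*edit_stats[f]) <= RCF_THRESHOLD]

def communication_cost(size, bandwidth):
    """Eq. (1): size of the replica over the available bandwidth."""
    return size / bandwidth

def select_replica(candidates):
    """Steps 4-5: among the sites holding the file, pick the cheapest.

    candidates: (site, size_mb, bandwidth_mbps) tuples, one per grid
    site that the Replica Catalog reports as holding the file."""
    return min(candidates, key=lambda c: communication_cost(c[1], c[2]))
```

For the worked example above, `rcf(5, 8, 4)` gives 5/4 and `rcf(5, 20, 4)` gives 5/16.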
If the RCF is greater than the threshold, the replica is edited frequently and is not a suitable candidate for pre-fetching. As shown in Fig. 5, in the third step of the PDDRA algorithm, RUMC analyzes each member of sequence s and eliminates those members whose RCF is greater than the threshold. The remaining elements of sequence s are then the best candidates for being pre-fetched to the requester grid site.

3.3. The structure of the local server and grid sites

In this section, the internal structures of the LS and the grid sites are described, and the communications between their components are analyzed. In Fig. 7, the PDDRA algorithm and the message passing between the grid sites and the local server are illustrated. As can be seen in this figure, the algorithm starts with requesting a file and ends with replicating the requested file together with its adjacent files. There are a total of seven steps in this algorithm.

3.3.1. Components of the local server

In Fig. 8 the architecture of the LS and the grid sites is shown. As is evident in this figure, the LS and the grid sites consist of several components. In this section, we analyze each of these components and describe their responsibilities. The Local Server consists of six components, as follows:

1. Replica Pre-fetching Management Component (RPMC): This component is responsible for managing data pre-fetching in the data grid. In other words, its duty is to search the database of access history and determine the adjacent files of the requested replica. As illustrated in Fig. 8, this component consists of four elements.

Prediction engine: This element is responsible for searching and mining the database of access history and determining adjacent files.

Database of access history: This database stores data access sequences.
A trie structure is used for this database, as described in Section 3.1.

Insertion Management Component (IMC): This component is responsible for inserting a new request into the database. It first determines which grid site sent the request and then inserts that request into the trie structure.

Aggregation Management Component (AMC): This component is responsible for aggregating the local databases of the grid sites with the database of the Local Server.

2. Replica Updating Management Component (RUMC): As mentioned before, the RUMC is responsible for managing the updating of replicas.

3. Replication Management Component (RMC): Its major duty is performing replication and creating replicas. Within the RMC there is a sub-component named Replica Selector, the heart of the RMC; it selects the best replica of the requested file, the one with the minimum communication cost, and informs the replica manager to fetch it. As can be seen in Fig. 8, the Replication Management Component of the LS cooperates with the RMCs of all grid sites.

4. Replica Catalog (RC): This component provides the mapping between logical names and physical addresses of the replicas. This mapping information is located in the RC.

Fig. 8. Internal structure of local server and grid sites.

5. Computing element: Computing elements run jobs, which use the data in files stored on Storage Elements. The Resource Broker controls the scheduling of jobs for Computing Elements. The computing power of the Local Server is greater than that of the common grid sites.

6. Storage elements: These elements store data. The storage capacity of the Local Server is greater than that of the common grid sites.

3.3.2. Components of grid sites

Each grid site consists of four elements:

1. Computing element: runs jobs.

2. Storage element: stores files.

3. Replication Management Component (RMC): This element is responsible for managing replication. Whenever a grid site needs a file that is not stored locally, the replica manager sends a request for that file to the Local Server. Therefore, the communication point between the LS and the grid sites is the Replication Management Components. This component itself contains a sub-component, the Replacement Management Component: if there is not enough space for replicating a file to a grid site, the RMC replaces low-value existing files with the new file. A fuzzy function is used to determine each replica's value.

4. Database of access history: This database stores the local file accesses of the grid site. These databases are aggregated with the database of the Local Server periodically, in cooperation with the Aggregation Management Component of the Local Server.

3.4. Phase 3: Replacement

As the storage capacities of grid sites are limited, and as replication itself is costly, data replication should be done carefully. In the third phase of our proposed algorithm, the algorithm checks whether there is enough space for storing a new replica and its adjacent files. If the available space is adequate for accommodating the new file, it is stored at the destination grid site; otherwise, some of the existing replicas should be removed.
However, as mentioned before, replication is a costly and expensive method, so replacement should be done wisely: the least valuable replicas should be replaced with new ones, so that replicas of high value remain in the grid site. In other words, a replica that is not likely to be requested by its grid site in the near future should be removed, and even if it were requested, the cost of re-replicating the removed replica from a remote site should not be very high. In our proposed algorithm, we therefore define a Replica Preserving Value (RPV) for each existing replica. This factor determines the value of preserving a replica, and the algorithm replaces the replica with the least RPV with a new one; the replica with the smallest RPV is the least important one. As the bandwidth between sites within a VO is high, the replacement algorithm first selects those replicas that are also available in other grid sites of the current VO, sorts them according to their RPV in ascending order, and removes the least valuable one. If the available space is still not adequate, the algorithm selects the remaining replicas, those not held elsewhere in the current VO, sorts them according to their RPV in ascending order, and removes the least valuable one until the available space becomes adequate.

Fig. 9. PDDRA replacement algorithm.

PDDRA considers three factors for computing the Replica Preserving Value (RPV): the number of accesses of the replica, the replication cost, and the time interval between the current time and the last access time of the replica. When there is not enough space at the destination grid site, the new replica replaces a replica with the minimum RPV. These factors are as follows:

1. Number of accesses of a replica: a replica with a great number of accesses in the past is more likely to be requested and used again in the near future. Therefore, the RPV is greater if the number of accesses of that replica is large.

2. Replication cost: a replica with a high replication cost is not a suitable candidate for replacement, because if the grid site needs that replica in the future it would pay a high cost to replicate it again, which is not economical. Therefore, the RPV is greater if the replication cost of that replica is high. The replication cost of replica r is given in Eq. (3): the size of replica r divided by the available bandwidth between grid site a (the source) and grid site g (the destination), plus the propagation delay time.

ReplicationCost(replica r) = size_r / bandwidth_ag + propagation delay time,  (3)

where bandwidth_ag is the available bandwidth between sites a and g, and the propagation delay time is the time needed to propagate the requested replica from site a to the requester grid site g; a closer grid site propagates the replica in less time.

3. Time difference between now and the last access time of the replica: replicas that were recently accessed are more likely to be used again in the future. So the smaller the time difference between the current time and the last access time of a replica, the larger the RPV.

PDDRA uses fuzzy logic for defining the RPV factor.
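A minimal sketch of such a fuzzy inference over the three factors is shown below. The paper's actual system was built with the Matlab Fuzzy Logic Toolbox; the triangular membership shapes, the rule subset and the output weights here are illustrative assumptions, not the paper's tuned system:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b (a sketch)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def grades(x, lo, hi):
    """Illustrative low/average/high memberships over [lo, hi]."""
    span = hi - lo
    return {
        "low":     tri(x, lo - span, lo, lo + span / 2),
        "average": tri(x, lo, lo + span / 2, hi),
        "high":    tri(x, lo + span / 2, hi, hi + span),
    }

# A few rules in the spirit of Table 1:
# (accesses, cost, interval) -> crisp RPV level for the consequent.
RULES = [
    (("high", "high", "low"),     1.0),   # very_high
    (("high", "high", "average"), 0.75),  # high
    (("high", "high", "high"),    0.5),   # average
    (("low",  "low",  "high"),    0.0),   # very_low
]

def rpv(accesses, cost, interval):
    """Weighted-average defuzzification over the fired rules (a sketch).

    Input ranges follow the simulation: accesses 0-50, replication
    cost 0-22.23, last-access interval 0-1e6 ms; output is in [0, 1]."""
    ga = grades(accesses, 0, 50)
    gc = grades(cost, 0, 22.23)
    gi = grades(interval, 0, 1e6)
    num = den = 0.0
    for (la, lc, li), out in RULES:
        w = min(ga[la], gc[lc], gi[li])  # AND = min, as in Mamdani inference
        num += w * out
        den += w
    return num / den if den else 0.5     # neutral value if no rule fires
```

A heavily accessed, costly, recently used replica scores near 1 (keep it), while a rarely accessed, cheap, stale one scores near 0 (evict it first).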
Our fuzzy function has three input parameters and one output; it maps an input space to an output space. The inputs of the proposed fuzzy inference system are: 1. the number of accesses of a replica, 2. the replication cost, and 3. the time difference between the current time and the last access time of the replica. The output is the Replica Preserving Value (RPV).

3.4.1. PDDRA replacement algorithm

Fig. 9 shows the PDDRA replacement algorithm. Its inputs are the destination grid site g and the file f that should be replicated to the requester grid site g. First, the algorithm checks whether the total size of grid site g is greater than or equal to the size of file f. If the condition in line 2 returns true, the size of g is not adequate for storing file f, so f should be replicated to the closest grid site to g, or the grid site should access f remotely; otherwise the size of the destination site is adequate and replication can be done. If the available storage space of grid site g is larger than the size of f, then file f can be replicated to site g immediately; otherwise some of the existing replicas should be removed in order to store the new one. First the algorithm selects those replicas that are also available in other grid sites of the current VO and puts them in selected_replicas; selected_replicas is then the list of replicas that are candidates for replacement. Next, for each member of selected_replicas, the algorithm calculates its RPV by calling the fuzzy function with three input parameters: number_of_accesses(r), replication_cost(r) and time_interval_of_last_access(r), where r is the replica being checked. After calculating the RPV for all members of selected_replicas, the algorithm sorts them according to their RPV in ascending order and starts removing replicas from the top of the list. Whenever the available storage space becomes enough for storing the new replica, the algorithm finishes.
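The two-pass eviction just outlined (VO-held replicas first, then the rest) can be sketched as follows; all names and data shapes are illustrative:

```python
def replace(site, new_file, vo_replicated, rpv):
    """PDDRA-style replacement (a sketch).

    site: object with .capacity, .used and .replicas (name -> size);
    vo_replicated: set of replicas also held elsewhere in the VO;
    rpv: function mapping a replica name to its Replica Preserving Value."""
    if site.capacity < new_file.size:
        return False  # the site can never hold the file; access remotely

    def evict(candidates):
        # Remove the least-valuable replicas first until the file fits.
        for name in sorted(candidates, key=rpv):
            if site.capacity - site.used >= new_file.size:
                return
            site.used -= site.replicas.pop(name)

    # Pass 1: replicas that other VO sites also hold (cheap to re-fetch).
    evict([r for r in list(site.replicas) if r in vo_replicated])
    # Pass 2: everything else, if there is still not enough room.
    evict(list(site.replicas))

    if site.capacity - site.used >= new_file.size:
        site.replicas[new_file.name] = new_file.size
        site.used += new_file.size
        return True
    return False
```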
If the available storage space is not enough for replicating the new replica even after removing all members of selected_replicas, in the next step the algorithm tries to remove the replicas that are not stored in other grid sites of the current VO. As mentioned before, our proposed algorithm keeps the number of accesses and the access time of each replica in the database of access history, so the required input parameters for the replacement algorithm can be obtained from the database. Also, replication

cost can be calculated by Eq. (3). Therefore, all required input parameters are available and prepared for the PDDRA replacement algorithm.

4. Performance evaluation

4.1. Simulation tool

OptorSim is used as the simulation tool to evaluate the performance of our proposed algorithm. OptorSim [23] is a simulation package written in Java. It was developed to study the effectiveness of replica optimization algorithms within a data grid environment [24] and to represent the structure of a real European Data Grid [25]. The structure [26] of OptorSim is illustrated in Fig. 10.

Fig. 10. OptorSim architecture [26].

The simulation assumes that the Grid contains several sites, each consisting of zero or more Computing Elements (CEs) and zero or more Storage Elements (SEs). CEs run jobs by processing data files, which are stored in the SEs. A Resource Broker (RB) controls the scheduling of jobs to grid sites, assigning jobs to CEs according to a scheduling algorithm. Each site manages its file content with a Replica Manager (RM), within which a Replica Optimizer (RO) contains the replication algorithm that drives the automatic creation and deletion of replicas [26]. Jobs are submitted to the grid over a period of time via the RB. The RB schedules each job to a CE with the goal of improving the overall throughput of the grid. The RM at each site manages the data flow between sites. The RO inside the RM is responsible for the selection and the dynamic creation and deletion of file replicas. Each job has a set of files it may request; the order in which those files are requested is determined by the access pattern. The following access patterns were considered in OptorSim [24]:

Sequential: the set is ordered, forming a list of successive requests.

Random: files are selected randomly from the set with a flat distribution.
Unitary random walk: the set is ordered and successive files are exactly one element away from the previous file; the direction is random.

Gaussian random walk: similar to the unitary random walk, but files are selected from a Gaussian distribution centered on the previous file.

There are two types of algorithms in OptorSim: the scheduling algorithm used by the RB to schedule jobs to CEs, and the replication algorithm used by the RM at each site to manage replication. Each scheduling and replication algorithm is implemented as a separate Resource Broker and Replica Optimizer class, respectively. We have made changes only in the Replica Optimizer class; the default Resource Broker class is used. The goal of the scheduling algorithms is to reduce the cost needed to run a job. The currently implemented methods are [26]:

1. Random: jobs are scheduled randomly to any Computing Element that will run the job.

2. Queue length: schedules to the Computing Element with the shortest queue of waiting jobs. If two CEs have the same shortest queue length, one of them is chosen at random.

3. File access cost: schedules to the Computing Element from which the cost to access all the files required for the job (in terms of network latencies) is the smallest. If two CEs have the same smallest access cost, one of them is chosen at random.

4. File access cost + job queue access cost: scheduling is done using a combination of the access cost for the files and the access cost for all the jobs in the queue at each Computing Element. This type of scheduling is used in our simulation.

There are three options for replication algorithms in OptorSim. First, one can choose No Replication, which never replicates a file: all replicas are taken from the master site where the data were produced at the beginning of the simulation, and the distribution of files does not change during the simulation.
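For intuition, the four access patterns can be sketched as simple request generators (a hypothetical reimplementation, not OptorSim code):

```python
import random

def sequential(files, n):
    """Successive requests walk the ordered set in order."""
    return [files[i % len(files)] for i in range(n)]

def flat_random(files, n, rng):
    """Files drawn uniformly from the set."""
    return [rng.choice(files) for _ in range(n)]

def unitary_walk(files, n, rng):
    """Each request is exactly one element from the previous, either way."""
    i, out = rng.randrange(len(files)), []
    for _ in range(n):
        out.append(files[i])
        i = min(max(i + rng.choice((-1, 1)), 0), len(files) - 1)
    return out

def gaussian_walk(files, n, rng, sigma=2.0):
    """Like the unitary walk, but the step is Gaussian around the last file."""
    i, out = rng.randrange(len(files)), []
    for _ in range(n):
        out.append(files[i])
        i = min(max(int(round(rng.gauss(i, sigma))), 0), len(files) - 1)
    return out
```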
Second, one can use the LRU or LFU algorithm, which always tries to replicate and, if necessary, deletes the Least Recently Used or Least Frequently Used files. Third, one can use an economic model in which the algorithm only deletes files if they are less valuable than a new file. There are currently two types of the economic model: the binomial economic model, where file values are predicted by ranking the files in a binomial distribution according to their popularity in the recent past, and the Zipf economic model, where a Zipf-like distribution is used instead [26]. We have compared our proposed algorithm with all of these algorithms.

4.2. OptorSim configuration files

There are four configuration files used to control the various inputs to OptorSim. These are as follows [26,23]:

Simulation parameter file: It contains various simulation parameters which the user can modify, such as the names of the grid configuration file and the job configuration file, the number of jobs, the scheduling strategy for the RB, the optimization algorithm, the file access pattern, GUI and statistics parameters, and some other important parameters.

Grid configuration file: It describes the Grid topology and the content of each site, that is, the resources available and the network connections to other sites. The grid configuration used in our simulation is the CMS Data Challenge 2002 test bed [27] (Fig. 11). For the CMS test bed, CERN and FNAL were given SEs of 100 GB and no CEs. All master files were stored at one of these sites. Every other site was given 50 GB of storage and a CE with one worker node.

Job configuration file: It contains information on the simulated files, such as the size of each file and its identifier; information on jobs, such as the list of files needed by each job and the probability that each job runs; and the site policies for each site. In our simulation there are six job types.

Fig. 11. CMS Data Challenge 2002 Grid topology [26].

Bandwidth configuration file: It is used to describe the background network traffic. It is a site-by-site matrix which gives, for each pair of sites, the name of the data file containing the relevant bandwidth information and also the time difference between the reference time zone and the source site.

4.3. Simulation results

The PDDRA algorithm was compared with No Replication, LRU, LFU, EcoModel, EcoModel with Zipf-like distribution and PRA [21]. Before discussing the simulation results, the modifications applied to the OptorSim source code are described. We modified the OptorSim code to implement our proposed algorithm and meet our assumptions. For this purpose, we placed two new classes in the \optor directory: PDDRAOptimiser and PDDRAStorageElement, which determine the optimization strategy and contain two important functions, getBestFile() and filesToDelete(). We changed these two functions and implemented our own algorithm, PDDRA, which predicts the future needs of grid sites and pre-fetches the adjacent files of a requested replica for the requester grid site. For maintaining file access sequences, changes were made to store the file's name, the requester grid site and the time of the request. In the PDDRAOptimiser class, the future needs of grid sites are predicted according to the file access history, and then pre-fetching is performed. As mentioned before, in the third phase of our proposed algorithm, a fuzzy function is called to determine each replica's value, and replacement is done according to these values.
Our fuzzy function is described in the next section.

4.3.1. Fuzzy inference system implementation

If there is not enough space available in the destination storage element for replicating new files, some of the existing files should be removed. Deletion of existing files is done according to their RPV, described before. To determine each Replica Preserving Value, a fuzzy function is called with three input parameters. We used the Matlab Fuzzy Logic Toolbox to implement our fuzzy function. Matlab provides tools to create and edit fuzzy inference systems, which map an input space to an output space. Our fuzzy inference system has three inputs and one output. The membership function plots of the parameters are shown in Figs. 12-15.

Fig. 12. Number of accesses and its membership function plot, first input of fuzzy inference system.

Number_of_accesses ranges from 0 to 50, meaning that the maximum number of accesses of a file in the simulation is 50. Replication_cost ranges from 0 to 22.23: the maximum file size is 1000 MB and the minimum bandwidth is 45 Mb/s, and since replication cost is defined as file size divided by bandwidth, the maximum replication cost is 1000/45 ≈ 22.23. Last_access_time_interval ranges from 0 to 10^6, since the maximum time interval is set to 10^6 (this parameter determines the

Table 1
Rule definitions of fuzzy inference system.

Rule number | Definition | Weight
1 | If (number_of_accesses is high) and (replication_cost is high) and (last_access_time_interval is low) then (replica_value is very_high) | 1
2 | If (number_of_accesses is high) and (replication_cost is high) and (last_access_time_interval is average) then (replica_value is high) | 1
3 | If (number_of_accesses is high) and (replication_cost is average) and (last_access_time_interval is low) then (replica_value is high) | 1
4 | If (number_of_accesses is average) and (replication_cost is high) and (last_access_time_interval is low) then (replica_value is high) | 1
5 | If (number_of_accesses is high) and (replication_cost is high) and (last_access_time_interval is high) then (replica_value is average) | 1
19 | If (number_of_accesses is low) and (replication_cost is average) and (last_access_time_interval is high) then (replica_value is low) | 1
20 | If (number_of_accesses is low) and (replication_cost is low) and (last_access_time_interval is average) then (replica_value is low) | 1
21 | If (number_of_accesses is low) and (replication_cost is low) and (last_access_time_interval is high) then (replica_value is very_low) | 1

Fig. 13. Replication cost and its membership function plot, second input of fuzzy inference system.

Fig. 14. Last access time interval and its membership function plot, third input of fuzzy inference system.

Fig. 15. Replica value and its membership function plot, output of fuzzy inference system.

time period in milliseconds of the past file accesses and is set in the OptorSim simulation parameter file). Finally, the range of the output parameter is set between 0 and 1; therefore the RPV is a number between 0 and 1. A replica with a smaller value is a potential candidate for being replaced with the new replica.
Also, 21 rules have been defined for the proposed fuzzy system; some of them are shown in Table 1. To determine each Replica Preserving Value, the Matlab fuzzy function is called from OptorSim, so a connection must be set up between the two applications. After connecting them, the three input parameters are passed from OptorSim to the Matlab fuzzy function, the fuzzy function is executed, and the output is returned to OptorSim. The RPV is then determined and the replacement algorithm can proceed.

Table 2
General simulation parameters.

Parameter | Value
Number of sites | 20
Number of storage elements (SEs) | 20
Number of computing elements (CEs) | 18
Number of routers | 8
Storage capacity at each site (GB) | 50, 100
Number of jobs | 100
Number of job types | 6
Job delay (ms) (a) | 2500
Size of single file (GB) | 1
Total size of files (GB) | 97
Access history length (ms) (b) | 10^6
Minimum bandwidth between sites (Mbit/s) | 45
Maximum bandwidth between sites (Mbit/s) |
Number of experiments | 10

(a) The job delay is the interval in ms between the RB submitting each job.
(b) Determines the time period over which the past file access history is considered.

4.3.2. Final results and discussion

As mentioned before, the CMS Data Challenge 2002 test bed [27] has been used in our simulation. The simulated grid used in our experiments has 20 sites; 18 of them have a Storage Element (SE) and a Computing Element (CE), and 2 of them have only an SE. The capacities of sites 14 (CERN) and 19 (FNAL), which have only Storage Elements, are 100 GB (all master files are stored at these two sites at the beginning of the simulation); the other sites have 50 GB. There are also 8 routers that have no SEs or CEs. The general parameters of our simulation are shown in Table 2. We compared our proposed algorithm with six existing algorithms: No Replication, LRU, LFU, EcoModel, EcoModel with Zipf-like distribution and PRA [21].
The simulation results for the different access patterns are shown in Figs. 16-20. We ran 100 jobs with six different job types. The simulation was repeated 10 times

Fig. 16. Mean job time vs. different access patterns.

Fig. 17. Effective network usage vs. different access patterns.

Fig. 18. Access patterns vs. total number of replications.

and the final results are averaged. Different jobs are concurrently submitted by the Resource Broker to different Computing Elements, but each Computing Element runs jobs sequentially, only a single job at a time. Table 3 shows the general properties of each job type, including the number of files needed, the probability of execution and the total number of executions. We tested PDDRA and the other algorithms on five types of access patterns: 1. Sequential, 2. Random, 3. Unitary random walk, 4. Gaussian random walk and 5. Random Zipf. There is no overlap between the file sets the job types access. The simulation results show that PDDRA outperforms the other comparable algorithms under all of the tested access patterns. The performance evaluation metrics used in our simulation are: Mean Job Execution Time, Effective Network Usage (ENU), total number of replications, Hit Ratio and Percentage of Storage Filled.

Mean job time of all jobs on grid. The mean job time of all jobs on the grid is defined as the combined total time in milliseconds of all the jobs run, divided by the number of jobs completed. Note that the total job time is defined as the sum of the individual job times, including their queuing times [23]. We compared the Mean Job Time of our proposed algorithm with the existing ones; the comparison result is shown in Fig. 16.

Fig. 19. Access patterns vs. hit ratio.

Fig. 20. Access patterns vs. percentage of storage filled.

Table 3
General properties of each job type (columns: job type, number of files needed, probability of execution, and total number of executions, for job types 1-6 and their sum).

The simulation results show that PDDRA has the lowest Mean Job Execution Time in all the experiments and for all file access patterns. The reason is the prediction of the future needs of grid sites and the pre-replication of files for them in the PDDRA method: at execution time, jobs have their required files locally, which reduces response time and job execution time remarkably. One of the important factors that decreases a grid site's job execution time is having the required files stored locally on its storage element. Also, considering the number of accesses, the replication cost and the last access time in the third phase of our proposed algorithm makes our method better than the others, because it replicates files wisely and does not delete valuable files, thereby preserving valuable replicas. Mean Job Execution Time is the most important evaluation metric; therefore PDDRA can be considered the superior strategy. In the random access patterns, including Random, Unitary random walk, Gaussian random walk and Zipf, a certain set of files is more likely to be accessed by grid sites, so a large percentage of the accessed files have been replicated before; hence, in most cases the required files are stored locally. For this reason, as can be seen in Fig. 16, the PDDRA strategy, and also all the other algorithms, show more improvement for the random file access patterns.
For example, the Gaussian file access pattern allows some files in the set to be accessed more than once while others are never accessed, so jobs find more of their files locally, because these files have been replicated for the grid sites beforehand.

Effective Network Usage (ENU). This is effectively the ratio of files transferred to files requested, so a low value indicates that the optimization strategy is better at putting files in the right places [23]. It ranges from 0 to 1 and is measured using Eq. (4):

ENU = (N_remote file accesses + N_file replications) / (N_remote file accesses + N_local file accesses).  (4)

The effective network usage of our proposed algorithm was compared with the six algorithms; the result is shown in Fig. 17. As is obvious in this figure, PDDRA has the lowest ENU in most cases in comparison with the others. The reason is that PDDRA pre-fetches files for the requester grid site, so the total number of replications decreases and the total number of local accesses increases. Since the grid sites have their required files available at the time of need, they do not have to replicate them from a remote site, and this decreases ENU, bandwidth consumption and network traffic considerably. In fact, higher

availability results in lower data replication and transmission. Consequently, our proposed algorithm shows better optimizing behavior in comparison with the others. As shown in Fig. 17, in some cases PDDRA does not have a lower ENU than the other algorithms. This is because of wrong predictions and wrong pre-fetching: a wrong pre-fetch just consumes bandwidth and increases ENU without any benefit to the destination grid site.

Total number of replications. A great number of replications shows that a large number of files were not stored locally at the time of need, so replication was needed in order to access the required files. As is obvious in Fig. 18, PDDRA performs better than the other algorithms, and the total number of replications decreases with this method. The reason is that in this strategy the future needs of grid sites are pre-fetched for them; therefore more files are stored locally at the time of need, decreasing the total number of replications remarkably. As can be seen in Fig. 18, the No Replication algorithm never replicates a file and always accesses its required files remotely.

Hit ratio. The hit ratio is the ratio of the total number of local file accesses to all accesses, comprising local file accesses, the total number of replications and the total number of remote file accesses. The result is shown in Fig. 19. As can be seen in this figure, PDDRA has the highest hit ratio in most cases in comparison with the other algorithms. In our proposed method, the total number of local accesses is increased by predicting the future needs of the requester grid site and pre-fetching for it; therefore the total numbers of replications and remote accesses decrease and, consequently, the hit ratio increases.

Percentage of storage filled. The last performance evaluation metric is the Percentage of Storage Filled.
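Both network metrics above reduce to simple ratios over the access counters, e.g.:

```python
def enu(remote_accesses, replications, local_accesses):
    """Eq. (4): Effective Network Usage, in [0, 1]; lower is better."""
    return (remote_accesses + replications) / (remote_accesses + local_accesses)

def hit_ratio(local_accesses, replications, remote_accesses):
    """Fraction of all file accesses satisfied from local storage."""
    return local_accesses / (local_accesses + replications + remote_accesses)
```

For instance, 85 local accesses, 5 replications and 10 remote accesses give an ENU of 15/95 and a hit ratio of 0.85.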
This is the average percentage of the storage element capacity (in MB) across all grid sites that is used by files. The storage used for all access patterns is depicted in Fig. 20. The results show that, because our proposed method pre-fetches a set of files for the requester grid site, it fills more storage capacity in some cases. Storage usage is best under the No Replication algorithm, because it always accesses files remotely and never performs replication, so files always remain in their original locations.

5. Conclusion and future work

In this paper we proposed a novel dynamic data replication algorithm for data grids, named PDDRA, which uses a pre-fetching technique for replication. Our three-phase method predicts the future needs of grid sites according to their file access histories and pre-fetches these files to requester grid sites before the requests are made. Sites therefore have their required files locally at the time of need, which considerably decreases response time, access latency and bandwidth consumption and increases system performance. Since grid sites within a VO have similar interests in files, we defined a file access database for each VO and located it at the local server. The structure of our database is a trie: it occupies little space, is easy to search, and is well suited to storing sequences of accessed files. By searching the file access sequence database, the neighbors of a requested file are determined and pre-fetched for the requester grid site. To evaluate the efficiency of our algorithm, we tested it with the data grid simulator OptorSim and compared PDDRA with six existing algorithms: No Replication, LRU, LFU, EcoModel, EcoModel with Zipf-like distribution and PRA [21]. Mean Job Time, Effective Network Usage, Total Number of Replications, Hit Ratio and Percentage of Storage Filled were used as the performance evaluation metrics.
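Two of these ratio metrics, ENU (Eq. (4)) and the hit ratio, can be computed from three simple per-simulation access counters. The following is a minimal sketch; the function and counter names are ours for illustration and are not part of PDDRA or OptorSim:

```python
def effective_network_usage(remote_accesses: int,
                            replications: int,
                            local_accesses: int) -> float:
    """ENU = (remote accesses + replications) / (remote + local accesses),
    as in Eq. (4); lower values indicate better replica placement."""
    total = remote_accesses + local_accesses
    return (remote_accesses + replications) / total if total else 0.0


def hit_ratio(local_accesses: int,
              replications: int,
              remote_accesses: int) -> float:
    """Fraction of all accesses (local + replications + remote)
    that were served from local storage."""
    total = local_accesses + replications + remote_accesses
    return local_accesses / total if total else 0.0


# Example: a site that served 80 requests locally, replicated 15 files
# and fetched 5 remotely.
print(effective_network_usage(5, 15, 80))  # low ENU: good placement
print(hit_ratio(80, 15, 5))
```

Pre-fetching raises the local-access counter and lowers the other two, which is why it simultaneously decreases ENU and increases the hit ratio.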
We ran our simulation for different file access patterns. The experimental results show that PDDRA outperforms the other algorithms and improves Mean Job Time and Effective Network Usage under all of the access patterns, especially under the different random file access patterns. For future work, we plan to consider more factors when determining the Replica Preserving Value. We also aim to predict the future needs of grid sites more accurately by considering factors other than just their past access sequences; data mining techniques may be a good solution. Scalability and fault tolerance of the global database are a further research direction. As mentioned before, because only read-only files are used, we did not implement the Replica Updating Management Component (RUMC) in this paper; implementing this component in OptorSim and applying replica consistency management strategies are also part of our future work plans.

Acknowledgment

The authors would like to thank the Iran Telecommunication Research Center (ITRC) for their financial support.

References

[1] I. Foster, C. Kesselman, S. Tuecke, The anatomy of the grid: enabling scalable virtual organizations, International Journal of Supercomputer Applications (2001).
[2] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, The data grid: towards an architecture for the distributed management and analysis of large scientific datasets, Journal of Network and Computer Applications (2000).
[3] I. Foster, The grid: a new infrastructure for 21st century science, Physics Today 55 (2002).
[4] D. Li, N. Xiao, X. Lu, Y. Wang, K. Lu, Dynamic self-adaptive replica location method in data grids, in: Proceedings of the IEEE International Conference on Cluster Computing, December 2003.
[5] H. Stockinger, A. Samar, B. Allcock, I. Foster, K. Holtman, B. Tierney, File and object replication in data grids, Journal of Cluster Computing 5 (3) (2002).
[6] M. Tang, B.S. Lee, C.K.
Yao, X.Y. Tang, Dynamic replication algorithm for the multi-tier data grid, Future Generation Computer Systems 21 (5) (2005).
[7] A. Dogan, A study on performance of dynamic file replication algorithms for real-time file access in data grids, Future Generation Computer Systems 25 (8) (2009).
[8] R.-S. Chang, P.-H. Chen, Complete and fragmented replica selection and retrieval in data grids, Future Generation Computer Systems 23 (2007).
[9] A.R. Abdurrab, T. Xie, FIRE: a file reunion based data replication strategy for data grids, in: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010.
[10] I. Foster, K. Ranganathan, Design and evaluation of dynamic replication strategies for a high performance data grid, in: Proceedings of the International Conference on Computing in High Energy and Nuclear Physics, China, September 2001.
[11] U. Cibej, B. Slivnik, B. Robic, The complexity of static data replication in data grids, Parallel Computing 31 (8) (2005).
[12] K. Ranganathan, I. Foster, Identifying dynamic replication strategies for a high-performance data grid, in: Proceedings of the Second International Workshop on Grid Computing, November 12, 2001.
[13] K. Sashi, A.S. Thanamani, Dynamic replication in a data grid using a modified BHR region based algorithm, Future Generation Computer Systems 27 (2011).
[14] M. Shorfuzzaman, P. Graham, R. Eskicioglu, Popularity-driven dynamic replica placement in hierarchical data grids, in: Proceedings of the Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2008.
[15] R. Slota, L. Skital, D. Nikolow, J. Kitowski, Algorithms for automatic data replication in grid environment, in: R. Wyrzykowski, J. Dongarra, N. Meyer, J. Wasniewski (Eds.), Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005, Poznan, Poland, September 11-14, 2005, Revised Selected Papers, in: Lecture Notes in Computer Science, vol. 3911, Springer, 2006.
[16] R.-S.
Chang, H.-P. Chang, Y.-T. Wang, A dynamic weighted data replication strategy in data grids, The Journal of Supercomputing 45 (3) (2008).
[17] J. Griffioen, R. Appleton, Performance measurements of automatic prefetching, in: Parallel and Distributed Computing Systems, 1995.
[18] T.M. Kroeger, D.D.E. Long, The case for efficient file access pattern modeling, in: Proceedings of the 7th Workshop on Hot Topics in Operating Systems, Rio Rico, USA, March 1999.

[19] T.C. Bell, J.G. Cleary, I.H. Witten, Text Compression, Prentice Hall, Englewood Cliffs, New Jersey.
[20] R.S. Chang, N.Y. Huang, J.S. Chang, A predictive algorithm for replication optimization in data grids, in: Proceedings of ICS 2006, Taiwan, 2006.
[21] T. Tian, J. Luo, Z. Wu, A. Song, A pre-fetching-based replication algorithm in data grid, in: 3rd International Conference on Pervasive Computing and Applications, 2008.
[22] L.M. Khanli, A. Isazadeh, T.N. Shishavan, PHFS: a dynamic replication method to decrease access latency in multi-tier data grid, Future Generation Computer Systems 27 (2010).
[23] D.G. Cameron, R. Schiaffino, J. Ferguson, A.P. Millar, C. Nicholson, K. Stockinger, F. Zini, OptorSim v2.1 Installation and User Guide, October.
[24] W.H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger, F. Zini, Simulation of dynamic grid replication strategies in OptorSim, International Journal of High Performance Computing Applications 17 (4) (2003).
[25] F. Gagliardi, B. Jones, E. Laure, The EU DataGrid project: building and operating a large scale grid infrastructure, in: B. Di Martino, J. Dongarra, A. Hoisie, L.Y. Yang, H. Zima (Eds.), Engineering the Grid: Status and Perspective, American Scientific Publishers.
[26] D.G. Cameron, A.P. Millar, C. Nicholson, OptorSim: a simulation tool for scheduling and replica optimization in data grids, in: Proceedings of Computing in High Energy and Nuclear Physics (CHEP), 2004.
[27] K. Holtman, CMS data grid system overview and requirements, Technical Report, CERN, July.

Nazanin Saadat received her B.S. degree in Computer Engineering from the Central Tehran Branch, Islamic Azad University, Tehran, Iran. Since 2009 she has been an M.S. student in the Department of Computer and Mechatronics Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran. Her research interests include grid computing and distributed systems.
Amir Masoud Rahmani received his B.S. in Computer Engineering from Amir Kabir University, Tehran, in 1996, his M.S. in Computer Engineering from Sharif University of Technology, Tehran, in 1998, and his Ph.D. in Computer Engineering from IAU University, Tehran. He is an assistant professor in the Department of Computer and Mechatronics Engineering at IAU University. He is the author or co-author of more than 80 publications in technical journals and conferences and has served on the program committees of several national and international conferences. His research interests are in the areas of distributed systems, ad hoc and sensor wireless networks, scheduling algorithms and evolutionary computing.


More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS Department of Computer Science University of Babylon LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS By Faculty of Science for Women( SCIW), University of Babylon, Iraq Samaher@uobabylon.edu.iq

More information

Replica Placement. Replica Placement

Replica Placement. Replica Placement Replica Placement Model: We consider objects (and don t worry whether they contain just data or code, or both) Distinguish different processes: A process is capable of hosting a replica of an object or

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

Chapter 6 Heaps. Introduction. Heap Model. Heap Implementation

Chapter 6 Heaps. Introduction. Heap Model. Heap Implementation Introduction Chapter 6 Heaps some systems applications require that items be processed in specialized ways printing may not be best to place on a queue some jobs may be more small 1-page jobs should be

More information

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer Segregating Data Within Databases for Performance Prepared by Bill Hulsizer When designing databases, segregating data within tables is usually important and sometimes very important. The higher the volume

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

GIS based topology for wireless sensor network modeling: Arc-Node topology approach

GIS based topology for wireless sensor network modeling: Arc-Node topology approach GIS based topology for wireless sensor network modeling: Arc-Node topology approach S.Amin Hosseini (Author) Zanjan Branch, Islamic Azad University, Zanjan,. Iran Email: s.a.hosseini86@gmail.com Behrooz

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Chapter 8 Virtual Memory What are common with paging and segmentation are that all memory addresses within a process are logical ones that can be dynamically translated into physical addresses at run time.

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

Chapter 3. Design of Grid Scheduler. 3.1 Introduction Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies

More information

Two-Level Dynamic Load Balancing Algorithm Using Load Thresholds and Pairwise Immigration

Two-Level Dynamic Load Balancing Algorithm Using Load Thresholds and Pairwise Immigration Two-Level Dynamic Load Balancing Algorithm Using Load Thresholds and Pairwise Immigration Hojiev Sardor Qurbonboyevich Department of IT Convergence Engineering Kumoh National Institute of Technology, Daehak-ro

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Welfare Navigation Using Genetic Algorithm

Welfare Navigation Using Genetic Algorithm Welfare Navigation Using Genetic Algorithm David Erukhimovich and Yoel Zeldes Hebrew University of Jerusalem AI course final project Abstract Using standard navigation algorithms and applications (such

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

Data Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification

Data Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms

More information

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing Sanya Tangpongprasit, Takahiro Katagiri, Hiroki Honda, Toshitsugu Yuba Graduate School of Information

More information

Efficient pebbling for list traversal synopses

Efficient pebbling for list traversal synopses Efficient pebbling for list traversal synopses Yossi Matias Ely Porat Tel Aviv University Bar-Ilan University & Tel Aviv University Abstract 1 Introduction 1.1 Applications Consider a program P running

More information

A Distributed Media Service System Based on Globus Data-Management Technologies1

A Distributed Media Service System Based on Globus Data-Management Technologies1 A Distributed Media Service System Based on Globus Data-Management Technologies1 Xiang Yu, Shoubao Yang, and Yu Hong Dept. of Computer Science, University of Science and Technology of China, Hefei 230026,

More information

UNIT-IV HDFS. Ms. Selva Mary. G

UNIT-IV HDFS. Ms. Selva Mary. G UNIT-IV HDFS HDFS ARCHITECTURE Dataset partition across a number of separate machines Hadoop Distributed File system The Design of HDFS HDFS is a file system designed for storing very large files with

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

Mining Distributed Frequent Itemset with Hadoop

Mining Distributed Frequent Itemset with Hadoop Mining Distributed Frequent Itemset with Hadoop Ms. Poonam Modgi, PG student, Parul Institute of Technology, GTU. Prof. Dinesh Vaghela, Parul Institute of Technology, GTU. Abstract: In the current scenario

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

Routing protocols in WSN

Routing protocols in WSN Routing protocols in WSN 1.1 WSN Routing Scheme Data collected by sensor nodes in a WSN is typically propagated toward a base station (gateway) that links the WSN with other networks where the data can

More information

Principles of Algorithm Design

Principles of Algorithm Design Principles of Algorithm Design When you are trying to design an algorithm or a data structure, it s often hard to see how to accomplish the task. The following techniques can often be useful: 1. Experiment

More information

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deletion in a B-tree Disk Storage Data is stored on disk (i.e., secondary memory) in blocks. A block is

More information

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS Chapter 6 Indexing Results 6. INTRODUCTION The generation of inverted indexes for text databases is a computationally intensive process that requires the exclusive use of processing resources for long

More information

Network Load Balancing Methods: Experimental Comparisons and Improvement

Network Load Balancing Methods: Experimental Comparisons and Improvement Network Load Balancing Methods: Experimental Comparisons and Improvement Abstract Load balancing algorithms play critical roles in systems where the workload has to be distributed across multiple resources,

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms

More information