A Set Coverage-based Mapping Heuristic for Scheduling Distributed Data-Intensive Applications on Global Grids


A Set Coverage-based Mapping Heuristic for Scheduling Distributed Data-Intensive Applications on Global Grids

Srikumar Venugopal and Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Laboratory
Department of Computer Science and Software Engineering
The University of Melbourne, Australia
{srikumar,

Abstract

Data-intensive Grid applications need access to large datasets that may each be replicated on different resources. Minimizing the overhead of transferring these datasets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we introduce a heuristic for the selection of resources based on a solution to the Set Covering Problem (SCP). We then pair this mapping heuristic with the well-known MinMin scheduling algorithm and conduct performance evaluation through extensive simulations.

1 Introduction

Grids [8] aggregate computational, storage and network resources to provide pervasive access to their combined capabilities. Additionally, Data Grids [6, 12] provide services such as low-latency transport protocols and data replication mechanisms to distributed data-intensive applications that need to access, process and transfer large datasets stored in distributed repositories. Such applications are commonly used by communities of researchers in domains such as high-energy physics, astronomy and biology.

Distributed data-intensive applications commonly consist of tasks that process datasets located on various storage repositories or data hosts. Each of these datasets may be replicated at several locations that are connected to each other and to the computational sites (or compute resources) through networks of varying capability. Also, the datasets are generally large enough (of the order of Gigabytes (GB) and higher) that transferring them from storage resources to the eventual point of execution produces a noticeable impact on the execution time of the application. Therefore, mapping tasks to the appropriate compute resources for execution and to the corresponding data resources for accessing the required datasets, such that the overall execution time is minimized, is a challenging problem.

The work in this paper is mainly concerned with scheduling applications that consist of a collection of tasks without interdependencies, each of which requires multiple datasets, onto a set of Grid resources. (Each task is translated into a job that is scheduled on to a computational resource and requests datasets from the storage resources so identified. Therefore, for the rest of the paper, we will refer to such tasks as jobs as well.) The scheduling strategy has to map a task on to a resource set consisting of one compute resource to execute the task and one data host for each dataset required for the computation. In this paper, we present such a mapping heuristic based on a solution to the SCP and evaluate it against other heuristics through simulation.

The rest of the paper is structured as follows: the next section presents the resource model and the application model that we target in this paper. The mapping heuristic is presented in the following section and is succeeded by details of the experimental evaluation and consequent results. Finally, we present related work and conclude the paper.
2 Model

2.1 Resource Model

We model the target data-intensive computing environment based on existing production testbeds such as the European DataGrid testbed [12] or the United States Grid3 testbed [9]. As an example, Figure 1 shows a subset of European DataGrid Testbed 1 derived from Bell et al. [2]. The resources in the figure are spread across 7 countries and belong to different autonomous administrative domains.

Figure 1. European DataGrid Testbed 1 [2].

In such Grid networks, we consider a data-intensive computing environment to consist of $M$ compute resources, $R = \{r_m\}_{m=1}^{M}$, and $P$ data hosts, $D = \{d_p\}_{p=1}^{P}$, collectively referred to as resources. A compute resource is a high performance computing platform, such as a cluster, consisting of processing nodes that are connected in a private local area network and are managed by a batch job submission system hosted at the head or front-end node that is connected to the public Internet. A data host can be a storage resource such as a Mass Storage Facility connected to the Internet, or it may simply be a storage device attached to a compute resource, in which case it inherits the network properties of the latter. It is important to note that even in the second case, the data host is considered as a separate entity from the compute resource. Data is organised in the form of datasets that are replicated on the data hosts by a separate replication process that follows a strategy (e.g., [2]) that takes into consideration various factors such as locality of access, load on the data host and available storage space. Information about the datasets and their location is available through a catalog such as the Storage Resource Broker Metadata Catalog [19].

Figure 2 shows a simplified data-intensive computing environment consisting of four compute resources and an equal number of data hosts connected by links of different bandwidths. We consider the logical network topology wherein each resource is connected to every other resource by a distinct network link. The time taken by a compute resource to access a dataset located on the storage resource at the same site is limited only by the intra-site bandwidth if the storage is a separate physical machine, or by the bandwidth between the hard disk and other peripherals if the storage is on the compute machine itself. In both cases, it is considered to be an order of magnitude lower than the time taken to access a dataset through the Internet from other sites, as there is contention for bandwidth among the various sites. Therefore, for the purpose of this study, only the bandwidth between different physical sites is taken into account.

Figure 2. A data-intensive environment.

2.2 Application Model

Figure 3. Job Model.

The application is composed of a set of $N$ jobs without interdependencies, $J = \{j_i\}_{i=1}^{N}$. Typically, $N \gg M$. Each job $j \in J$ requires a subset $F^j = \{f^j_k\}_{k=1}^{K}$ of a set of datasets, $F$, each of which is replicated on a subset of the $P$ data hosts, $D = \{d_p\}_{p=1}^{P}$. For a dataset $f \in F$, $D_f \subseteq D$ is the set of data hosts on which $f$ is replicated.
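As an illustration, a minimal sketch of the entities in this model is given below (Python, with illustrative class and field names rather than those of any actual simulator); a job records the datasets it needs, and each data host records the datasets replicated on it.

from dataclasses import dataclass
from typing import List

@dataclass
class ComputeResource:
    name: str
    mips_rating: float      # processing speed of a node (MIPS)
    mean_load: float        # fraction of CPUs already occupied

@dataclass
class DataHost:
    name: str
    datasets: List[str]     # identifiers of the datasets replicated here

@dataclass
class Job:
    job_id: int
    size_mi: float          # computational size in Million Instructions
    datasets: List[str]     # the K datasets (F^j) this job needs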

Each job requires one processor in a compute resource for executing the job and one data host each for accessing each of the $K$ datasets required by the job. The compute resource and the data hosts thus selected are collectively referred to as the resource set associated with the job and are denoted by $S_j = \{\{r\}, \{d_k\}_{k=1}^{K}\}$, where $r \in R$ is the compute resource on which the job is to be executed and $d_k$ is the data host selected for accessing $f^j_k \in F^j$. Figure 3 shows an example of such a job $j \in J$ that requires the resources shown in Figure 2.

Figure 4. Job Execution Stages and Times.

Figure 4 shows an example of a data-intensive job with the times involved in various stages shown along a horizontal time axis. $T_w$ is the time spent waiting in the queue on the compute resource and $T_c$ is the time spent by the job in purely computational operations (also called computation time). $T_w$ and $T_c$ are functions of the load and processing speed of the compute resource. $T_{f_i}$ is the time required to transfer the file $f_i$ from its data host to the compute resource and is dependent on the available bandwidth between the two. The completion time for the job, $T_j$, is the wall-clock time taken for the job to finish execution and is a function of these three times. For large datasets, the data transfer time impacts the completion time significantly. While the transfer time is determined by the manner in which the dataset is processed by the job, it is also influenced by the selection of data hosts. For example, many applications request and receive the required datasets in parallel before starting computation. In this case,

$T_j = T_w + \max_{1 \leq i \leq K}(T_{f_i}) + T_c$

However, the number of simultaneous transfers determines the bandwidth available at the receiver end for each transfer and therefore, the $T_{f_i}$. Transfer times can be minimized by locating a compute resource associated with a data host that has the maximum number of datasets required by the job, so that the bulk of the data access is local. This would also benefit the case where the job accesses datasets sequentially.

We wish to minimize the total makespan [14] of the application consisting of $N$ such data-intensive jobs. To that end, we follow the well-known MinMin heuristic, proposed in [14], to schedule the entire set of jobs. The MinMin heuristic submits the task with the Minimum Completion Time (MCT) to the compute resource that guarantees it. Therefore, our aim here is to select a resource set that produces the MCT for a job. We adopt the strategy of finding the resource set with the least number of data hosts required to access the datasets required for a job and then finding a suitable compute resource to execute it. We experimentally show that this approach produces schedules that are competitive with the best and is reasonably fast as well.
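A minimal sketch of this completion-time estimate is given below (illustrative Python, assuming parallel transfers that do not share bandwidth and a single effective bandwidth figure per transfer, which simplifies the contention effects noted above).

from typing import Dict

def estimate_completion_time(
    wait_time: float,                  # T_w: queue wait on the compute resource
    job_size_mi: float,                # computational size in Million Instructions
    mips_rating: float,                # speed of one processing node
    dataset_sizes: Dict[str, float],   # dataset id -> size (MB)
    bandwidth: Dict[str, float],       # dataset id -> bandwidth (MB/s) from its
                                       # chosen data host to the compute resource
) -> float:
    """T_j = T_w + max_i(T_fi) + T_c for parallel dataset transfers."""
    transfer_times = [dataset_sizes[f] / bandwidth[f] for f in dataset_sizes]
    t_transfer = max(transfer_times) if transfer_times else 0.0
    t_compute = job_size_mi / mips_rating
    return wait_time + t_transfer + t_compute

# Example: a job needing two datasets from hosts with different bandwidths.
print(estimate_completion_time(
    wait_time=30.0, job_size_mi=300000, mips_rating=1000,
    dataset_sizes={"f1": 2000.0, "f2": 6000.0},
    bandwidth={"f1": 12.5, "f2": 100.0},
))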
3 Scheduling

Figure 5 lists a generic scheduling algorithm for scheduling a set of jobs on a set of distributed compute resources. Each of the steps can be implemented independently of the others and therefore, many strategies are possible. In this paper, we concentrate on the process within the for loop, i.e., finding the appropriate resource set for a job. The scheduler forms a part of a larger application execution framework such as a resource broker (e.g., [24]). The resource broker is able to identify resources that meet the minimum requirements of the application, such as architecture (instruction set), operating system, storage threshold and data access permissions, and these are provided as suitable candidates for job execution to the scheduler.

while there exist unsubmitted jobs do
    Update the resource performance data based on jobs scheduled in previous intervals
    Update network data between resources based on current conditions
    foreach unsubmitted job do
        Find the MCT and the resource set that guarantees that time
    end
    repeat
        Heuristically assign mapped jobs to each compute resource
    until all jobs are submitted or no more jobs can be submitted
    Wait until the next scheduling event.
end

Figure 5: A Generic Scheduling Algorithm.
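The following sketch pairs the loop of Figure 5 with MinMin ordering (illustrative Python rather than the GridSim-based simulator used for the evaluation); map_job stands for any mapping heuristic, such as the one presented in the next subsection, and returns an estimated MCT and resource set for a job given the current resource and network state.

from typing import Callable, List, Tuple

# A mapping heuristic takes a job and the current state and returns
# (estimated MCT, chosen resource set). The SCP-based heuristic, Compute-First,
# Greedy and Brute Force can all be plugged in here.
MappingHeuristic = Callable[[dict, dict], Tuple[float, dict]]

def minmin_schedule(jobs: List[dict], state: dict,
                    map_job: MappingHeuristic) -> List[Tuple[dict, dict]]:
    """Repeatedly map every unsubmitted job, then dispatch the job whose
    resource set guarantees the minimum completion time (MinMin)."""
    unsubmitted = list(jobs)
    schedule = []
    while unsubmitted:
        # Find the MCT and resource set for every unsubmitted job.
        mapped = [(job, *map_job(job, state)) for job in unsubmitted]
        # Pick the job with the minimum MCT and submit it.
        job, mct, resource_set = min(mapped, key=lambda m: m[1])
        schedule.append((job, resource_set))
        unsubmitted.remove(job)
        # In a real scheduler, the state (loads, queues, already-transferred
        # datasets) would be updated here before the next scheduling event.
        state.setdefault("submitted", []).append((job, mct))
    return schedule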

3.1 Mapping Heuristic

For a job $j \in J$, consider a graph $G_j = (V, E)$ where $V = (\bigcup_{f \in F^j} D_f) \cup F^j$ and $E$ is the set of all directed edges $(d, f)$ such that $d \in D_f$. As an example, for the job in Figure 3, we can derive the graph shown in Figure 6(a). Our aim here is to find the minimum set $H$ of data hosts such that there exists an edge from a member of $H$ to $f$ for every $f \in F^j$ in $G_j$, and no other set $H' \subset H$ satisfies that requirement. The set $H$ is called the minimal dominating vertex set for graph $G_j$. However, it is possible that more than one such set exists for a graph. Our interest is in finding a minimal dominating set of data hosts such that the MCT is reduced.

Figure 6. (a) Directed graph of data hosts and datasets for job j. (b) A dominating set for the data graph.

Figure 7. (a) Adjacency Matrix. (b) Tableau.

From Figure 6(a), we can build a reduced adjacency matrix $A$ for graph $G_j$ wherein $a_{ik} = 1$ if data host $d_i$ contains $f_k$. Such an adjacency matrix is shown in Figure 7(a). Therefore, the problem of finding the minimum dominating sets for $G_j$ is now equivalent to finding the sets with the least number of rows such that every column contains an entry of 1 in at least one of the rows. Another way to look at the problem is to consider the data hosts as sets of datasets. Then, the problem becomes finding the minimum number of such sets (data hosts) such that all datasets can be covered. This problem has been studied extensively as the Set Covering Problem [1]. Christofides [7] provides an approximate tree search algorithm for finding a solution to the general Set Covering Problem. Based on this algorithm, we propose a mapping heuristic to find a minimum dominating set that ensures the smallest makespan. The heuristic is listed in Figure 8.

At the start of the process, from the adjacency matrix, we create a tableau $T$ consisting of $K$ blocks of rows, where the $k$-th block consists of rows corresponding to the data hosts that contain $f_k$. An example of a tableau generated from the adjacency matrix of Figure 7(a) is shown in Figure 7(b).

Begin Main
    Step 1. For a job j ∈ J, create the adjacency matrix A with data hosts forming the rows and datasets forming the columns.
    Step 2. Sort the rows of A in the descending order of the number of 1s in a row.
    Step 3. Create the tableau T from the sorted A and begin with the initial solution set B_final = φ, B = φ, E = φ and z = ∞.
    Step 4. Search(B_final, B, T, E, z)
    Step 5. S_j ← {{r}, B_final} where r ∈ R such that MCT(B_final) is minimum
End Main

Search(B_final, B, T, E, z)
    Step 1. Find the minimum k such that f_k ∉ E. Let T_k be the block of rows in T corresponding to f_k. Set a pointer q to the top of T_k.
    while q does not reach the end of T_k do
        F_T ← {f_i | t_qi = 1, 1 ≤ i ≤ K}
        B ← B ∪ {d_q^k}, E ← E ∪ F_T
        if E = F^j then
            if z > MCT(B) then B_final ← B, z ← MCT(B)
        else
            Search(B_final, B, T, E, z)
        B ← B \ {d_q^k}, E ← E \ F_T
        Increment q
    end

Figure 8: Listing for the SCP-based Mapping Heuristic.

The set of data hosts $B$ keeps track of the current solution set of data hosts, the set $E$ contains the datasets already covered by the solution set, and the variable $z$ keeps track of the makespan offered by the current solution set. The final solution set is stored in $B_{final}$. During execution, the blocks are searched sequentially starting from the $k$-th block, where $k$ is the smallest index, $1 \leq k \leq K$, such that $f_k \notin E$. Within the $k$-th block, let $d^k_q$ mark the data host under consideration, where $q$ is a row pointer within block $k$. We add $d^k_q$ to $B$, and all the datasets for which the corresponding row contains 1 are added to $E$ as they are already covered by $d^k_q$. These datasets are removed from consideration and the process then moves to the next uncovered block until $E = F^j$, that is, until all the datasets have been covered. The function $MCT(B)$ computes the completion time for each compute resource combined with the solution set $B$ and returns the MCT so found. Through the recursive procedure outlined in the listing, the heuristic then backtracks and discovers other solution sets. The solution set that guarantees the minimum makespan is then chosen as the final one. The search terminates when the first block is exhausted. Therefore, before the tableau is created, we sort the rows of the adjacency matrix (that is, the data hosts) in the descending order of the number of columns containing 1s (or the number of datasets contained).
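The sketch below is a compact Python rendering of the tableau search of Figure 8 under simplifying assumptions: data hosts are given as plain sets of dataset identifiers, and mct() is a caller-supplied function returning the minimum completion time achievable with a candidate set of data hosts over all compute resources. It illustrates the control flow of the heuristic rather than reproducing the simulator.

from typing import Callable, Dict, FrozenSet, List, Set, Tuple

def scp_map(datasets: List[str],
            hosts: Dict[str, Set[str]],
            mct: Callable[[FrozenSet[str]], float]) -> Tuple[FrozenSet[str], float]:
    """Return the cover of data hosts with the lowest MCT, found by a
    tableau-style backtracking search over candidate covers."""
    # Sort data hosts by how many required datasets they hold (descending),
    # mirroring the row sort applied before the tableau is built.
    order = sorted(hosts, key=lambda h: len(hosts[h] & set(datasets)), reverse=True)
    # Block k lists the hosts (in sorted order) that contain dataset k.
    blocks = {f: [h for h in order if f in hosts[h]] for f in datasets}

    best: Tuple[FrozenSet[str], float] = (frozenset(), float("inf"))

    def search(chosen: Set[str], covered: Set[str]) -> None:
        nonlocal best
        # Find the first uncovered dataset and walk its block of hosts.
        uncovered = [f for f in datasets if f not in covered]
        if not uncovered:
            return
        for host in blocks[uncovered[0]]:
            newly = hosts[host] & set(datasets)
            chosen.add(host)
            if covered | newly >= set(datasets):
                candidate = frozenset(chosen)
                score = mct(candidate)
                if score < best[1]:
                    best = (candidate, score)
            else:
                search(chosen, covered | newly)
            chosen.remove(host)   # backtrack and try the next host in the block
    search(set(), set())
    return best

# Example: three datasets replicated on three hosts; MCT approximated by cover size.
hosts = {"d1": {"f1", "f2"}, "d2": {"f2", "f3"}, "d4": {"f1", "f3"}}
print(scp_map(["f1", "f2", "f3"], hosts, mct=lambda cover: float(len(cover))))

Sorting the hosts by the number of required datasets they hold means that covers containing the best-connected hosts are explored first, which is the same rationale as sorting the rows of the adjacency matrix.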
Also, in the tableau, the same sorting order is applied to the rows in each block. As the minimal dominating sets would obviously contain at least one of the data hosts with the maximum number of datasets, this increases the chances of more dominating sets being in the path of the search function within the proposed heuristic.
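For reference, the cover-selection part of the problem for a single job can be written as a standard Set Covering integer program (stated here in the paper's notation; the formalisation itself is a textbook one rather than quoted from the paper), where the binary variable $x_p$ indicates whether data host $d_p$ is included in the cover:

\begin{align*}
\min\ & \sum_{p=1}^{P} x_p \\
\text{subject to}\ & \sum_{p \,:\, d_p \in D_f} x_p \geq 1 \qquad \forall f \in F^j \\
& x_p \in \{0, 1\}, \qquad p = 1, \dots, P
\end{align*}

The mapping heuristic does not solve this program to optimality; it enumerates candidate covers through the tableau search and keeps the cover with the lowest MCT.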

Overall, the running time of the mapping heuristic is given by $O(MK^2)$, where $MK^2$ is the number of resource sets that are searched by the heuristic to find one that provides the least completion time. Other heuristics that are possible or have been proposed include the ones described below:

Compute-First - In this mapping, a compute resource that ensures the minimum computation time ($T_c$) is selected for the job first, followed by choosing the data hosts that have the best bandwidths to the selected resource. This is in contrast to our approach, which places more importance on the selection of data hosts. The running time of this heuristic is $O(MK)$.

Greedy - This heuristic builds the resource set by iterating through the list of datasets and making a greedy choice of the data host for accessing each dataset, followed by choosing the nearest compute resource for that data host. At the end of each iteration, it checks whether the compute resource so selected is better than the one selected in the previous iteration when the previously selected data hosts are considered. This heuristic was presented in [23]. The running time of this heuristic is $O(KP)$.

Brute Force - In this case, all the possible resource sets for a particular job are generated and the one guaranteeing the MCT is chosen for the job. While this heuristic guarantees that the resource set selected will be the best for the job, it searches through $MP^K$ resource sets at a time. This leads to unreasonably large search spaces for higher values of $K$. For example, for a job requiring 5 datasets with 2 possible data hosts each and 2 available compute resources, the search space will consist of $MP^K$ resource sets.

A point to note is that the sets of datasets required by two or more jobs in the same set are not mutually exclusive. Any dataset that is transferred from one resource to another is retained at the receiver and therefore, this presents an additional source of data to successive jobs requiring access to that dataset.

4 Experiments

We have used GridSim with its new Data Grid capabilities [22] to simulate the data-intensive environment and evaluate the performance of the scheduling algorithms. For the evaluation, we have used the EU DataGrid topology based on the testbed shown in Figure 1. The details of the Grid resources used in our evaluation are shown in Table 1. All the resources were simulated as clusters with a batch job management system using a space-shared policy as a front-end to single-CPU processing nodes. The CPUs are rated in terms of MIPS (Million Instructions Per Second). The resource at CERN was considered as a pure data source (data host) in our evaluation and hence, no jobs were submitted to it.

Table 1. Resources within the EDG testbed used for evaluation (resource name and location, number of nodes, CPU rating in MIPS, storage in TB, and load). The resources are RAL (UK), Imperial College (UK), NorduGrid (Norway), NIKHEF (Netherlands), Lyon (France), CERN (Switzerland), and Milano, Torino, Catania, Padova and Bologna (Italy).

To model resource contention caused by multiple users, we associate a mean load with each resource. The load factor is the ratio of the number of CPUs that are occupied to the total number of CPUs available within a resource. During the simulation, for each resource, we derive the instantaneous resource load from a Gaussian distribution with its mean as the load shown in Table 1.
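As an illustration of this load model, the short sketch below draws an instantaneous load from a Gaussian centred on a resource's mean load; the standard deviation and the clipping to the valid range [0, 1] are assumptions, since they are not stated explicitly above.

import random

def instantaneous_load(mean_load: float, std_dev: float = 0.1,
                       rng: random.Random = random.Random(42)) -> float:
    """Sample the fraction of occupied CPUs for one scheduling interval.

    The sample is clipped to [0, 1] so that it remains a valid load factor.
    A fixed seed mirrors the use of a constant seed for repeatable runs.
    """
    return min(1.0, max(0.0, rng.gauss(mean_load, std_dev)))

# Example: a resource with a mean load of 0.5 (half of its CPUs busy).
print([round(instantaneous_load(0.5), 2) for _ in range(5)])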
Similarly, we model the variability of the available network bandwidth by associating an availability factor with each link, which is the ratio of the available bandwidth to the total bandwidth. During simulation, the instantaneous measure is derived from another Gaussian distribution centered around a mean availability factor that is assigned at random to each of the links.

Within this evaluation, we consider a universal set of datasets, each of which is replicated on one or more of the resources. Studies of similar environments [16] have shown that the sizes of the datasets follow a heavy-tailed distribution in which there are larger numbers of smaller files and vice versa. Therefore, we generate the set of datasets with sizes distributed according to the logarithmic distribution in the interval [1GB, 6GB]. The distribution of the datasets itself depends on many factors, including variations in popularity, the replication strategy employed and the nature of the fabric. Within our evaluation, we have used two commonly considered patterns of file distribution:

Uniform: Here, the distribution of datasets is modeled on a uniform random probability distribution. In this scenario, each file is equally likely to be replicated at any site.

Zipf: Zipf-like distributions follow a power law model in which the probability of occurrence of the $i$-th ranked file in a list of files is inversely proportional to $i^a$, where $a \leq 1$. In other words, a few files are distributed widely whereas most of the files are found in one or two places. This models a scenario where the files are replicated on the basis of popularity. It has been shown that Zipf-like distributions hold true in cases such as requests for pages on the World Wide Web, where a few of the sites are visited the most [3]. This scenario has been evaluated for a Data Grid environment in related publications [4].

Henceforth, we will consider the distribution applied to be described by the variable Dist. We also control the distribution of datasets through a parameter called the degree of replication, which is the maximum possible number of copies of any dataset in a Data Grid. The degree of replication in our evaluation is 5.

On the application side, there are three variables that determine the performance of the application: the size of the application or the number of jobs in the application (N), the number of datasets required by each job (K) and the computational size of a job (Size(j)) expressed in Million Instructions (MI). For each job, K datasets are selected at random from the universal set of datasets. For the purpose of comparison, we keep K constant among all the jobs in a set, although this is not a condition imposed on the heuristic itself. An experiment is described by the tuple (N, K, Size, Dist). At the beginning of each experiment, the set of datasets, their distribution among the resources and the set of jobs are generated. This configuration is then kept constant while each of the four mapping heuristics is evaluated in turn. To keep the resource and network conditions repeatable among evaluations, we use the Colt random number generator [11] with a constant seed. As there are numerous variables involved, we have conducted evaluations with different values for N, K, Size and Dist. We have conducted 5 such experiments and in the next section, we present the results of our evaluation.

4.1 Results

Table 2. Summary of Evaluations.

Mapping Heuristic | Geometric Mean | Avg. deg. (%) | Avg. rank
Compute-First     | -              | - (19.4)      | 3.63 (0.48)
Greedy            | -              | - (5.55)      | 3.23 (0.71)
SCP               | -              | - (1.42)      | 1.67 (0.6)
Brute Force       | -              | - (6.46)      | 1.47 (0.58)

Figure 9. Makespan vs Number of Jobs: (a) Size=6 MI, K=3, Dist=Uniform; (b) Size=3 MI, K=3, Dist=Zipf.

The results of our evaluations are summarised in Table 2 and are based on the methodology provided in [5]. SCP refers to the heuristic proposed in this paper. For each mapping heuristic, the table contains three values: 1) the Geometric Mean of the makespans, 2) the Average degradation (Avg. deg.) from the best heuristic and 3) the Average ranking (Avg. rank) of each heuristic. The geometric mean is used as the makespans vary in orders of magnitude according to parameters such as the number of jobs per application set, the number of files per job and the size of each job. The degradation for a heuristic is the difference between the makespan of that heuristic and that of the best heuristic for a particular experiment, expressed as a percentage of the latter. The average degradation is computed as an arithmetic mean over all experiments and the standard deviation of the population is given in the parentheses next to the means in the table. This is the measure of how far a heuristic is away from the best heuristic for an experiment. A lower number means that the heuristic is, on an average, closer to the best one. The ranking is in the ascending order of the makespans produced by the heuristics, that is, the lower the makespan, the lower the rank of the heuristic. The standard deviation of the population is provided alongside the averages in the table. The three values together provide a consolidated view of the performance of each heuristic.
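A small sketch of how these summary statistics can be computed from raw makespans is given below (illustrative Python; the example data and the tie-breaking of equal makespans are assumptions rather than details of the actual evaluation).

import math
from statistics import mean, pstdev
from typing import Dict, List

def summarise(makespans: Dict[str, List[float]]) -> Dict[str, dict]:
    """Compute geometric mean, average degradation (%) and average rank
    per heuristic over a list of experiments."""
    heuristics = list(makespans)
    n_experiments = len(next(iter(makespans.values())))
    summary = {h: {"deg": [], "rank": []} for h in heuristics}

    for i in range(n_experiments):
        results = {h: makespans[h][i] for h in heuristics}
        best = min(results.values())
        ordered = sorted(heuristics, key=lambda h: results[h])
        for h in heuristics:
            summary[h]["deg"].append(100.0 * (results[h] - best) / best)
            summary[h]["rank"].append(ordered.index(h) + 1)

    return {
        h: {
            "geometric_mean": math.exp(mean(math.log(m) for m in makespans[h])),
            "avg_deg": (mean(s["deg"]), pstdev(s["deg"])),
            "avg_rank": (mean(s["rank"]), pstdev(s["rank"])),
        }
        for h, s in summary.items()
    }

# Example with two hypothetical experiments.
print(summarise({"SCP": [100.0, 210.0], "Compute-First": [150.0, 200.0]}))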

Figure 10. Makespan vs Datasets per Job: (a) N=6, Size=6 MI, Dist=Uniform; (b) N=6, Size=3 MI, Dist=Zipf.

Figure 11. Makespan vs Job Size: (a) N=6, K=5, Dist=Uniform; (b) N=3, K=3, Dist=Zipf.

For example, we can see that on average, Compute-First and Greedy both perform worse than either SCP or Brute Force. However, the standard deviation of the population is much higher in the case of Greedy than that of Compute-First. Therefore, it can also be said that Compute-First can be expected to perform worst most of the time. Indeed, in a few of the experiments, Greedy performed as good as or even better than SCP, while Compute-First never came close to the performance of the other heuristics. Between SCP and Brute Force, as expected, the latter is the clear winner, having a consistently lower score than the former. However, the computational complexity of Brute Force means that as the number of datasets per job increases, the number of resource sets that need to be considered by that heuristic increases dramatically. The geometric mean and average rank of SCP are close to those of the Brute Force heuristic. The average rank is less than 2 for both heuristics, which implies that in many scenarios, SCP provides a better performance than Brute Force.

This view is reinforced by the graphs in Figures 9-11, which show the effect of varying one of the variables while all others are kept constant. SCP and Brute Force give almost similar performance, while either Compute-First or Greedy is the worst in almost all cases. The effect of the job distribution is most visible on the Greedy heuristic.

When the files are distributed according to the Zipf distribution, the performance of Greedy comes close to, or in some cases becomes as competitive as, that of SCP. This is due to the fact that with the Zipf distribution, most of the datasets are not replicated widely and therefore, there is not as much choice of data hosts as there is with the Uniform distribution. In such a case, Greedy is able to form minimal resource sets. Also, it can be seen that as the number of jobs increases, the makespans of the Compute-First and Greedy heuristics rise more steeply than those of the other two.

5 Related Work

Casanova et al. [5] extend three well-known scheduling heuristics (Max-min, Min-min and Sufferage), introduced previously in [14] for scheduling independent tasks onto heterogeneous resources, to consider data transfer requirements. A fourth heuristic, XSufferage, was also introduced to take into consideration the sharing of files between tasks. However, the source of all the files for the tasks is the resource that dispatches the jobs. Ranganathan et al. [20] discuss a decoupled scheduling architecture that has two components: one schedules jobs to the resources and the other replicates data on such resources in anticipation of the incoming jobs. Similar studies have been performed for different replication strategies in [2]. Park and Kim [17] propose a scheduler which schedules jobs close to the source of data or else replicates the data to the job execution site. The difference between our work and the ones presented before is that we explicitly consider the scenario in which a job requires multiple datasets, whereas the others are restricted to one dataset per job.

Giersch et al. [10] present a follow-up to [5] where they consider the general problem of scheduling tasks requiring multiple files that are replicated on several repositories. They prove that this problem is NP-complete and propose faster heuristics that are competitive with XSufferage. However, the approach followed in their paper is that of scheduling the jobs first and then replicating the data so as to minimize access time. This general approach, also followed by the previous papers and which we have evaluated as Compute-First, may not produce the best schedules, as has been shown in the evaluation. On the other hand, we consider the selections of computational and data resources to be interrelated.

Genetic Algorithm (GA) based heuristics were introduced in [13] for scheduling decomposable tasks and in [18] for sets of independent jobs that have data requirements. We consider non-decomposable jobs that are individually mapped to a set of resources. However, with modification, GA can be used in our context and will be the subject of a future evaluation. Mohamed and Epema [15] present a Close-to-Files algorithm which searches the entire solution space for a combination of computational and storage resources that minimizes execution time. Their job model is restricted to one dataset per job. This approach, which we evaluate as Brute Force, produces good schedules but becomes unmanageable for the large solution spaces that occur when more than one dataset is considered per job. In a previous publication [23], we introduced the greedy mapping heuristic (Greedy) for the problem presented in this paper.

Similar studies have been conducted for parallel I/O in clusters and other distributed systems in the presence of data replication [21, 25]. However, we claim that the research presented in this paper is substantially different from these.
For example, our heuristic favours those resources which have more of the datasets required for a job and generally tends to produce mappings that utilize the least number of data hosts possible. This is different from optimizing parallel I/O, which generally tends to spread the data transfers across a larger number of resources to optimize bandwidth usage. However, we do recognize that parallel I/O techniques can be applied in the context of the problem presented, and an investigation of these will be a part of our future work.

6 Conclusion and Future Work

We have presented the problem of mapping an application with a collection of jobs, each of which requires multiple datasets that are each replicated on multiple data hosts, to Grid resources. We have also proposed a heuristic based on a solution to the Set Covering Problem. We have shown via simulation that the proposed heuristic is better than the Compute-First and Greedy approaches and leads to schedules that are competitive with the exhaustive search approach while being orders of magnitude faster. As part of immediate future work, we plan to evaluate our heuristic against the GA strategy that has been presented in related work. The performance of the SCP-based heuristic in scenarios involving dependent tasks, such as Directed Acyclic Graphs (DAGs), also needs to be investigated. In the longer term, we would like to explore the use of parallel I/O optimization techniques in the problem space presented in this paper.

Acknowledgement

We would like to thank Anthony Sulistio for his help with the use of GridSim and Tianchi Ma for his comments on the paper.

References

[1] E. Balas and M. W. Padberg. On the Set-Covering Problem. Operations Research, 20(6):1152-1161, 1972.

[2] W. H. Bell, D. G. Cameron, L. Capozza, A. P. Millar, K. Stockinger, and F. Zini. Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proceedings of the 3rd International Workshop on Grid Computing (GRID 2002), pages 46-57, Baltimore, MD, USA, 2002. Springer-Verlag.

[3] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: evidence and implications. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '99), 1999.

[4] D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini. Evaluating Scheduling and Replica Optimisation Strategies in OptorSim. In Proceedings of the 4th International Workshop on Grid Computing (Grid 2003), Phoenix, AZ, USA, Nov. 2003. IEEE CS Press.

[5] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman. Heuristics for Scheduling Parameter Sweep Applications in Grid environments. In Proceedings of the 9th Heterogeneous Computing Systems Workshop (HCW 2000), Cancun, Mexico, 2000. IEEE CS Press.

[6] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The Data Grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 23(3):187-200, 2000.

[7] N. Christofides. Graph Theory: An Algorithmic Approach, chapter Independent and Dominating Sets: The Set Covering Problem. Academic Press, London, UK, 1975.

[8] I. Foster and C. Kesselman. The Grid: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers, San Francisco, USA, 1999.

[9] R. Gardner et al. The Grid2003 Production Grid: Principles and Practice. In Proceedings of the 13th Symposium on High Performance Distributed Computing (HPDC 13), Hawaii, HI, USA, June 2004. IEEE CS Press.

[10] A. Giersch, Y. Robert, and F. Vivien. Scheduling tasks sharing files from distributed repositories. In Proceedings of the 10th International Euro-Par Conference, volume 3149 of LNCS. Springer-Verlag, Sept. 2004.

[11] W. Hoschek et al. The Colt project. Available at http://dsd.lbl.gov/~hoschek/colt/.

[12] W. Hoschek, F. J. Jaen-Martinez, A. Samar, H. Stockinger, and K. Stockinger. Data Management in an International Data Grid Project. In Proceedings of the 1st IEEE/ACM International Workshop on Grid Computing (GRID 2000), Bangalore, India, Dec. 2000. Springer-Verlag.

[13] S. Kim and J. Weissman. A GA-based Approach for Scheduling Decomposable Data Grid Applications. In Proceedings of the 2004 International Conference on Parallel Processing (ICPP 04), Montreal, Canada, Aug. 2004. IEEE CS Press.

[14] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund. Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems. Journal of Parallel and Distributed Computing (JPDC), 59:107-131, Nov. 1999.

[15] H. Mohamed and D. Epema. An evaluation of the close-to-files processor and data co-allocation policy in multiclusters. In Proceedings of the 2004 IEEE International Conference on Cluster Computing, San Diego, CA, USA, Sept. 2004. IEEE CS Press.

[16] K. Park, G. Kim, and M. Crovella. On the relationship between file sizes, transport protocols, and self-similar network traffic. In Proceedings of the 1996 International Conference on Network Protocols (ICNP 96), Atlanta, GA, USA, 1996. IEEE CS Press.

[17] S.-M. Park and J.-H. Kim. Chameleon: A Resource Scheduler in a Data Grid Environment. In Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), Tokyo, Japan, May 2003. IEEE CS Press.

[18] T. Phan, K. Ranganathan, and R. Sion. Evolving toward the perfect schedule: Co-scheduling job assignments and data replication in wide-area systems using a genetic algorithm. In Proceedings of the 11th Workshop on Job Scheduling Strategies for Parallel Processing, Cambridge, MA, June 2005. Springer-Verlag.

[19] A. Rajasekar, M. Wan, and R. Moore. MySRB & SRB: Components of a Data Grid. In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, UK, 2002. IEEE CS Press.

[20] K. Ranganathan and I. Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. In Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC), Edinburgh, Scotland, July 2002. IEEE CS Press.

[21] J. R. Santos, R. R. Muntz, and B. Ribeiro-Neto. Comparing random data allocation and data striping in multimedia servers. In Proceedings of the 2000 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2000), pages 44-55, 2000. ACM Press.

[22] A. Sulistio, U. Cibej, B. Robic, and R. Buyya. A tool for modelling and simulation of data grids with integration of data storage, replication and analysis. Tech. Rep. GRIDS-TR-2005-13, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Australia, Nov. 2005.

[23] S. Venugopal and R. Buyya. A Deadline and Budget Constrained Scheduling Algorithm for e-Science Applications on Data Grids. In Proceedings of the 6th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP-2005), volume 3719 of Lecture Notes in Computer Science, Melbourne, Australia, Oct. 2005. Springer-Verlag.

[24] S. Venugopal, R. Buyya, and L. Winton. A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids. In Proceedings of the 2nd Workshop on Middleware in Grid Computing (MGC 2004), Toronto, Canada, Oct. 2004. ACM Press.

[25] J.-J. Wu and P. Liu. Distributed Scheduling of Parallel I/O in the Presence of Data Replication. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, CO, USA, April 2005. IEEE CS Press.


More information

Nowadays data-intensive applications play a

Nowadays data-intensive applications play a Journal of Advances in Computer Engineering and Technology, 3(2) 2017 Data Replication-Based Scheduling in Cloud Computing Environment Bahareh Rahmati 1, Amir Masoud Rahmani 2 Received (2016-02-02) Accepted

More information

An Experimental Cloud Resource Broker System for Virtual Application Control with VM Allocation Scheme

An Experimental Cloud Resource Broker System for Virtual Application Control with VM Allocation Scheme An Experimental Cloud Resource Broker System for Virtual Application Control with VM Allocation Scheme Seong-Hwan Kim 1, Dong-Ki Kang 1, Ye Ren 1, Yong-Sung Park 1, Kyung-No Joo 1, Chan-Hyun Youn 1, YongSuk

More information

LOW AND HIGH LEVEL HYBRIDIZATION OF ANT COLONY SYSTEM AND GENETIC ALGORITHM FOR JOB SCHEDULING IN GRID COMPUTING

LOW AND HIGH LEVEL HYBRIDIZATION OF ANT COLONY SYSTEM AND GENETIC ALGORITHM FOR JOB SCHEDULING IN GRID COMPUTING LOW AND HIGH LEVEL HYBRIDIZATION OF ANT COLONY SYSTEM AND GENETIC ALGORITHM FOR JOB SCHEDULING IN GRID COMPUTING Mustafa Muwafak Alobaedy 1, and Ku Ruhana Ku-Mahamud 2 2 Universiti Utara Malaysia), Malaysia,

More information

Resolving Load Balancing Issue of Grid Computing through Dynamic Approach

Resolving Load Balancing Issue of Grid Computing through Dynamic Approach Resolving Load Balancing Issue of Grid Computing through Dynamic Er. Roma Soni M-Tech Student Dr. Kamal Sharma Prof. & Director of E.C.E. Deptt. EMGOI, Badhauli. Er. Sharad Chauhan Asst. Prof. in C.S.E.

More information

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER Akhil Kumar and Michael Stonebraker EECS Department University of California Berkeley, Ca., 94720 Abstract A heuristic query optimizer must choose

More information

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme Yue Zhang, Yunxia Pei To cite this version: Yue Zhang, Yunxia Pei. A Resource Discovery Algorithm in Mobile Grid Computing

More information

Chapter 5. Minimization of Average Completion Time and Waiting Time in Cloud Computing Environment

Chapter 5. Minimization of Average Completion Time and Waiting Time in Cloud Computing Environment Chapter 5 Minimization of Average Completion Time and Waiting Time in Cloud Computing Cloud computing is the use of the Internet for the tasks the users performing on their computer. Cloud computing, also

More information

Report Seminar Algorithm Engineering

Report Seminar Algorithm Engineering Report Seminar Algorithm Engineering G. S. Brodal, R. Fagerberg, K. Vinther: Engineering a Cache-Oblivious Sorting Algorithm Iftikhar Ahmad Chair of Algorithm and Complexity Department of Computer Science

More information

QUT Digital Repository:

QUT Digital Repository: QUT Digital Repository: http://eprints.qut.edu.au/ This is the accepted version of this conference paper. To be published as: Ai, Lifeng and Tang, Maolin and Fidge, Colin J. (2010) QoS-oriented sesource

More information

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs Jin Liu 1, Hongmin Ren 1, Jun Wang 2, Jin Wang 2 1 College of Information Engineering, Shanghai Maritime University,

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

Random Neural Networks for the Adaptive Control of Packet Networks

Random Neural Networks for the Adaptive Control of Packet Networks Random Neural Networks for the Adaptive Control of Packet Networks Michael Gellman and Peixiang Liu Dept. of Electrical & Electronic Eng., Imperial College London {m.gellman,p.liu}@imperial.ac.uk Abstract.

More information

Multi-objective Heuristic for Workflow Scheduling on Grids

Multi-objective Heuristic for Workflow Scheduling on Grids Multi-objective Heuristic for Workflow Scheduling on Grids Vahid Khajehvand 1, Hossein Pedram 2, and Mostafa Zandieh 3 1 Department of Computer Engineering and Information Technology, Qazvin Branch, Islamic

More information

Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce

Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Shiori KURAZUMI, Tomoaki TSUMURA, Shoichi SAITO and Hiroshi MATSUO Nagoya Institute of Technology Gokiso, Showa, Nagoya, Aichi,

More information

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Irene Ntoutsi, Yannis Theodoridis Database Group, Information Systems Laboratory Department of Informatics, University of Piraeus, Greece

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm

Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm Thomas Phan IBM Almaden Research Center phantom@us.ibm.com Radu Sion

More information

An Improved Heft Algorithm Using Multi- Criterian Resource Factors

An Improved Heft Algorithm Using Multi- Criterian Resource Factors An Improved Heft Algorithm Using Multi- Criterian Resource Factors Renu Bala M Tech Scholar, Dept. Of CSE, Chandigarh Engineering College, Landran, Mohali, Punajb Gagandeep Singh Assistant Professor, Dept.

More information

Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web

Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web TR020701 April 2002 Erbil Yilmaz Department of Computer Science The Florida State University Tallahassee, FL 32306

More information

Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm

Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm Thomas Phan 1, Kavitha Ranganathan 2, and Radu Sion 3 1 IBM Almaden

More information

Scheduling in Multiprocessor System Using Genetic Algorithms

Scheduling in Multiprocessor System Using Genetic Algorithms Scheduling in Multiprocessor System Using Genetic Algorithms Keshav Dahal 1, Alamgir Hossain 1, Benzy Varghese 1, Ajith Abraham 2, Fatos Xhafa 3, Atanasi Daradoumis 4 1 University of Bradford, UK, {k.p.dahal;

More information

A Component Framework for HPC Applications

A Component Framework for HPC Applications A Component Framework for HPC Applications Nathalie Furmento, Anthony Mayer, Stephen McGough, Steven Newhouse, and John Darlington Parallel Software Group, Department of Computing, Imperial College of

More information

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm Henan Zhao and Rizos Sakellariou Department of Computer Science, University of Manchester,

More information

Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading

Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading Mario Almeida, Liang Wang*, Jeremy Blackburn, Konstantina Papagiannaki, Jon Crowcroft* Telefonica

More information

1. Performance Comparison of Interdependent and Isolated Systems

1. Performance Comparison of Interdependent and Isolated Systems Supplementary Information for: Fu, G., Dawson, R., Khoury, M., & Bullock, S. (2014) Interdependent networks: Vulnerability analysis and strategies to limit cascading failure, European Physical Journal

More information

Efficient Task Scheduling Algorithms for Cloud Computing Environment

Efficient Task Scheduling Algorithms for Cloud Computing Environment Efficient Task Scheduling Algorithms for Cloud Computing Environment S. Sindhu 1 and Saswati Mukherjee 2 1 Research Scholar, Department of Information Science and Technology sindhu.nss@gmail.com 2 Professor

More information

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract.

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract. Grid-JQA : Grid Java based Quality of service management by Active database L. Mohammad Khanli M. Analoui Ph.D. student C.E. Dept. IUST Tehran, Iran Khanli@iust.ac.ir Assistant professor C.E. Dept. IUST

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Identifying Dynamic Replication Strategies for a High- Performance Data Grid

Identifying Dynamic Replication Strategies for a High- Performance Data Grid Identifying Dynamic Replication Strategies for a High- Performance Data Grid Kavitha Ranganathan and Ian Foster Department of Computer Science, The University of Chicago 1100 E 58 th Street, Chicago, IL

More information

A Hybrid Recursive Multi-Way Number Partitioning Algorithm

A Hybrid Recursive Multi-Way Number Partitioning Algorithm Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence A Hybrid Recursive Multi-Way Number Partitioning Algorithm Richard E. Korf Computer Science Department University

More information

Multi-path based Algorithms for Data Transfer in the Grid Environment

Multi-path based Algorithms for Data Transfer in the Grid Environment New Generation Computing, 28(2010)129-136 Ohmsha, Ltd. and Springer Multi-path based Algorithms for Data Transfer in the Grid Environment Muzhou XIONG 1,2, Dan CHEN 2,3, Hai JIN 1 and Song WU 1 1 School

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

Source Routing Algorithms for Networks with Advance Reservations

Source Routing Algorithms for Networks with Advance Reservations Source Routing Algorithms for Networks with Advance Reservations Lars-Olof Burchard Communication and Operating Systems Technische Universitaet Berlin ISSN 1436-9915 No. 2003-3 February, 2003 Abstract

More information

Cache Management for Shared Sequential Data Access

Cache Management for Shared Sequential Data Access in: Proc. ACM SIGMETRICS Conf., June 1992 Cache Management for Shared Sequential Data Access Erhard Rahm University of Kaiserslautern Dept. of Computer Science 6750 Kaiserslautern, Germany Donald Ferguson

More information