Distributed Computing with Hierarchical Master-worker Paradigm for Parallel Branch and Bound Algorithm
Kento Aida, Tokyo Institute of Technology / PRESTO, JST, aida@dis.titech.ac.jp
Yoshiaki Futakata, nihou@alab.dis.titech.ac.jp
Wataru Natsume, Tokyo Institute of Technology, natsume@alab.dis.titech.ac.jp

Abstract

This paper discusses the impact of the hierarchical master-worker paradigm on the performance of an application program that solves an optimization problem by a parallel branch and bound algorithm on a distributed computing system. The application program addressed in this paper solves the BMI Eigenvalue Problem, an optimization problem that minimizes the greatest eigenvalue of a bilinear matrix function. This paper proposes a parallel branch and bound algorithm to solve the BMI Eigenvalue Problem with the hierarchical master-worker paradigm. The experimental results show that the conventional algorithm with the master-worker paradigm significantly degraded performance on a Grid test bed, where computing resources were distributed over WAN behind a firewall, whereas the hierarchical master-worker paradigm sustained good performance.

1 Introduction

Progress in Grid computing technology significantly reduces the cost of high-performance computing and can extend the user community of high-performance computing. Grid computing enables user communities that have large-scale problems but no high-end supercomputers to perform computation with huge computational power. The research community working on optimization problems is one of these communities: while they have many NP-hard problems, which require huge computational power even to obtain semi-optimal solutions, they need to scale down the problems they solve because they lack sufficient computational power. Thus, Grid computing has the potential to scale up the size of solvable optimization problems in this community.
Parallel applications that solve optimization problems on a PC cluster or on a Grid have been studied [1, 6, 10, 13]. These applications use the master-worker paradigm, where a single master process dispatches a subset of computation, or a task, to multiple worker processes and gathers computed results from the worker processes. The master-worker paradigm is successfully used in many parallel applications on PC clusters and on Grids [1, 2, 6, 7, 10, 12] as a common framework for implementing parallel applications. However, the performance of an application with the master-worker paradigm is affected by many factors. For instance, communication overhead between the master process and the worker processes can degrade performance. The degradation can be significant on a Grid in particular, because communication overhead among computing resources connected via WAN is high. Also, the master process can become a bottleneck of application performance if it controls too many worker processes, because the master process frequently communicates with all of them. The hierarchical master-worker paradigm is one of the solutions that avoid performance degradation in the master-worker paradigm. In the hierarchical master-worker paradigm, a single supervisor process controls multiple process sets, each of which is composed of a single master process and multiple worker processes. Distribution of tasks is performed in two phases: distribution from the supervisor process to the master processes, and distribution from a master process to its worker processes. Collection of computed results is performed in the reverse direction. The hierarchical master-worker paradigm has two advantages over the conventional master-worker paradigm. The first is reduced communication overhead, obtained by placing each set of a master process and its worker processes, which communicate frequently with each other, on tightly coupled computing resources. The second is that distributing work among multiple master processes prevents a single master process from becoming a performance bottleneck.
This paper discusses the impact of the hierarchical master-worker paradigm on the performance of an application program that solves an optimization problem by a parallel branch and bound algorithm on a distributed computing system. The application program addressed in this paper solves the BMI Eigenvalue Problem, an optimization problem that minimizes the greatest eigenvalue of a bilinear matrix function. This paper proposes a parallel branch and bound algorithm to solve the BMI Eigenvalue Problem with the hierarchical master-worker paradigm. The experimental results show that the conventional algorithm with the master-worker paradigm significantly degraded performance on a Grid test bed, where computing resources were distributed over WAN behind a firewall, whereas the hierarchical master-worker paradigm sustained good performance. The rest of the paper is organized as follows: Section 2 describes the BMI Eigenvalue Problem and a parallel branch and bound algorithm to solve the problem with the conventional master-worker paradigm. Section 3 describes the proposed parallel branch and bound algorithm with the hierarchical master-worker paradigm, and Section 4 presents its experimental results on a Grid test bed. Section 5 describes related work, and finally, Section 6 presents conclusions and future work.

2 BMI Eigenvalue Problem

This section gives an overview of the BMI Eigenvalue Problem and a parallel branch and bound algorithm to solve this problem with the conventional master-worker paradigm.

2.1 BMI Eigenvalue Problem

The BMI Eigenvalue Problem is to find the solution, x and y, that minimizes the greatest eigenvalue of F(x, y), defined as follows. Let F : R^{n_x} × R^{n_y} → R^{m×m} be the biaffine function given by (1), where symmetric matrices F_{ij} ∈ R^{m×m} (i = 0, ..., n_x; j = 0, ..., n_y) are given, x := (x_1, ..., x_{n_x})^T, and y := (y_1, ..., y_{n_y})^T.
F(x, y) := F_{00} + \sum_{i=1}^{n_x} x_i F_{i0} + \sum_{j=1}^{n_y} y_j F_{0j} + \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} x_i y_j F_{ij},   F_{ij} = F_{ij}^T   (1)

The BMI Eigenvalue Problem is recognized as a general framework for the analysis and synthesis of control systems in a variety of industrial applications, such as position control of a helicopter and control of robot arms. Thus, speedup of the computation is expected by the control theory community in order to enable analysis and synthesis of large-scale control systems. Also, in the operations research community, it is an academic grand challenge to solve large-scale instances that have never been solved.

2.2 Parallel Branch and Bound Algorithm

The BMI Eigenvalue Problem is an NP-hard problem; thus, practical branch and bound algorithms that compute the ε-optimal solution, where the error of the optimal value is less than ε, have been proposed [4, 5]. However, these algorithms still require huge computation time to solve a large-scale problem such as a control problem for a real industrial application, which restricts solvable control problems to small sizes. A parallel branch and bound algorithm that computes the ε-optimal solution of the BMI Eigenvalue Problem has been proposed [1]. The algorithm performs a parallel branch and bound method with the master-worker paradigm on a PC cluster or on a Grid. In this algorithm, a single master process maintains a search tree. It dispatches subproblems, which correspond to leaf nodes of the search tree, to multiple worker processes and receives computed results from the worker processes. Here, the computed results contain the best upper bound of the objective function, the best solution of the objective function, and the subproblems that have been generated by branching and have not been pruned on a worker process. A worker process that receives a subproblem from the master process performs branching; that is, it decomposes the subproblem into multiple subproblems and generates a subset of the search tree.
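The quantity manipulated throughout this algorithm, both in the objective and in the bounds, is the greatest eigenvalue of the matrix in (1). As a concrete illustration, the sketch below (ours, not from the paper; it assumes NumPy and invents a tiny random instance rather than using the paper's benchmarks) evaluates F(x, y) and its greatest eigenvalue:

```python
import numpy as np

def biaffine_F(F, x, y):
    """Evaluate F(x, y) from Eq. (1).

    F is an (nx+1) x (ny+1) array of symmetric m x m matrices, where
    F[i][j] corresponds to F_ij (index 0 carries the constant and
    linear terms: F_00, F_i0, F_0j).
    """
    nx, ny = len(x), len(y)
    # Pad x and y with a leading 1 so that the four sums in Eq. (1)
    # collapse into one double sum: sum_ij xe[i] * ye[j] * F_ij.
    xe = np.concatenate(([1.0], x))
    ye = np.concatenate(([1.0], y))
    m = F[0][0].shape[0]
    out = np.zeros((m, m))
    for i in range(nx + 1):
        for j in range(ny + 1):
            out += xe[i] * ye[j] * F[i][j]
    return out

def greatest_eigenvalue(F, x, y):
    # F(x, y) is symmetric, so eigvalsh applies; the BMI Eigenvalue
    # Problem minimizes this value over (x, y).
    return np.linalg.eigvalsh(biaffine_F(F, x, y))[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nx, ny, m = 3, 2, 4                      # toy sizes, not the benchmark sizes
    A = rng.standard_normal((nx + 1, ny + 1, m, m))
    F = (A + A.transpose(0, 1, 3, 2)) / 2    # symmetrize each F_ij
    print(greatest_eigenvalue(F, np.zeros(nx), np.zeros(ny)))
```

The branch and bound algorithms cited above work by bounding this eigenvalue over boxes in (x, y) space; the evaluation itself is the cheap inner kernel of each task.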
Next, it computes the lower/upper bound for each subproblem on the tree and performs bounding; that is, it prunes unnecessary subproblems, those whose lower bound exceeds the current best upper bound. Finally, the worker process returns the computed results to the master process. The master and worker processes repeat these procedures until the gap between the lowest lower bound and the best upper bound converges within ε.

3 Hierarchical Master-Worker Paradigm

This section describes the proposed parallel branch and bound algorithm to solve the BMI Eigenvalue Problem with the hierarchical master-worker paradigm.

3.1 Drawbacks in the Master-worker Paradigm

While the master-worker paradigm is successfully used in many parallel applications as a common framework for implementing parallel applications, it has drawbacks when used on a Grid, where a large number of computing resources are connected via WAN.

3.1.1 Communication Overhead

Communication overhead between the master process and the worker processes significantly affects the performance of an application. Communication occurs when the master process dispatches a task to a worker process and when a worker process returns computed results to the master process. Performance degradation occurs when communication overhead is large relative to the execution time of a single task. For the application addressed in this paper, the granularity of a single task tends to be small; e.g., the execution time of a single task is a few seconds or less for a real problem. Thus, the impact of communication overhead on performance can be significant. Furthermore, the impact of communication overhead can be even more significant on a Grid, where computing resources are connected via WAN with high latency and low throughput. For instance, suppose that a master process running on a local computer dispatches tasks to worker processes running on a remote PC cluster. If the local computer and the remote PC cluster are connected via WAN with high latency and low throughput, the high communication overhead degrades the performance of the application significantly. Furthermore, in many cases, a remote PC cluster is installed in a private network behind a firewall. Thus, a local user needs to run the application with a mechanism for communicating through the firewall, e.g., ssh tunneling; this setting further increases communication overhead.

3.1.2 Bottleneck on a Single Master Process

The performance of the master process can become a bottleneck of application performance if the master process controls too many worker processes. The master process continuously communicates with all worker processes to find idle workers, to dispatch new tasks, and to receive computed results.
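The effect of fine task granularity can be seen with a back-of-the-envelope model: if each synchronous dispatch costs one round trip, a worker spends t_task / (t_task + t_comm) of its time computing. The sketch below plugs in illustrative values in the range reported later in the paper (a 0.03 [sec] task and a WAN ping latency of roughly 0.15 [sec]); the exact per-call overheads are our assumptions, not measurements:

```python
def worker_efficiency(t_task, t_comm):
    """Fraction of time a worker computes when every task costs one
    synchronous dispatch round trip (a deliberately simple model)."""
    return t_task / (t_task + t_comm)

if __name__ == "__main__":
    t_task = 0.03  # fine-grain task, as in the helicopter control problem
    for label, t_comm in [("LAN (~0.1 ms)", 0.0001),
                          ("WAN (~150 ms ping)", 0.15)]:
        print(f"{label}: efficiency = {worker_efficiency(t_task, t_comm):.2%}")
```

With these numbers, the LAN case stays above 99% efficiency while the WAN case drops below 20%, which matches the qualitative gap between the LAN and WAN+ssh results in Section 4.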
The master process must perform these procedures very frequently, because task granularity in the application addressed in this paper tends to be small. Thus, if the master process controls too many worker processes, computation and I/O on the master process degrade performance.

3.2 Proposed Parallel B&B Algorithm

The proposed algorithm performs a parallel branch and bound method to solve the BMI Eigenvalue Problem with the hierarchical master-worker paradigm, where a supervisor process controls multiple process sets, each of which is composed of a master process and worker processes.

[Figure 1. The hierarchical master-worker paradigm]

A set of a master and worker processes performs a parallel branch and bound method for a subset of the search tree; that is, the master process dispatches subproblems to multiple worker processes and receives computed results from the worker processes. The supervisor process performs load balancing among master processes by migrating subproblems among them. Also, the supervisor process and the master processes gather the best upper bound of the objective function, which is computed on each worker process, and update the current best upper bound on all worker processes hierarchically. Updating the current best upper bound is crucial for the performance of the application, because it accelerates bounding [1]. Figure 1 shows an overview of the proposed algorithm. In the figure, Z_Wi, Z_Mj, and Z denote the current best upper bound of the objective function stored on a worker process Wi, a master process Mj, and the supervisor process, respectively. Here, i = 1, ..., the number of worker processes in a set of a master and worker processes, and j = 1, ..., the number of master processes.
The rest of this section describes the detailed procedures of the worker, master, and supervisor processes in the proposed algorithm.

3.2.1 Worker Process

In a set of a master and worker processes, Si, a worker process, Wj, performs the following steps whenever a subproblem is dispatched to it by the master process, Mi.

(1) Wj receives a subproblem and the current best upper bound of the objective function stored on Mi, or Z_Mi, from Mi.

(2) Wj branches the subproblem and generates a tree of subproblems.
(3) Wj computes the lower/upper bound of the objective function for the subproblems on the tree. For each subproblem on the tree, if the computed upper bound is less than the current best upper bound stored on Wj, or Z_Wj, Wj updates Z_Wj to the lower value.

(4) Wj prunes unnecessary subproblems, those whose lower bounds exceed Z_Wj.

(5) Wj returns the computed results, namely the subproblems that have not been pruned, Z_Wj, and the solution of the objective function, to Mi.

3.2.2 Master Process

A master process has two roles: to perform a parallel branch and bound method with its worker processes, and to achieve load balancing in cooperation with the supervisor process. A master process, Mi, repeats the following procedure until it receives a request to terminate the computation from the supervisor process.

(1) Mi examines a request from the supervisor process. If Mi received a request, it performs one of the following actions:

- If the request is to query the current computed results stored on Mi, Mi sends the results to the supervisor process. The results contain the number of subproblems assigned to Mi, Z_Mi, the current lowest lower bound of the objective function, and the solution of the objective function.
- If the request is to steal subproblems from Mi, Mi sends subproblems to the supervisor process.
- If the request is to assign new subproblems to Mi, Mi receives subproblems.
- If the request is to update the current best upper bound on Mi, Mi compares the best upper bound stored on the supervisor process, or Z, with the current Z_Mi. If Z is less than Z_Mi, Mi updates Z_Mi to the value of Z.

(2) Mi probes worker processes to find an idle worker process. If Mi finds an idle worker process, Wj, it performs the following steps.

(a) Mi receives the computed results, which contain the subproblems generated on Wj, Z_Wj, and the solution of the objective function, from Wj.
In the initial phase of the execution, a worker process has never been dispatched a subproblem; in this case, this step is skipped.

(b) Mi prunes unnecessary subproblems, those whose lower bounds exceed Z_Mi.

(c) Mi compares the current Z_Mi and Z_Wj. If Z_Wj is less than Z_Mi, Mi updates Z_Mi to the value of Z_Wj.

(d) Mi dispatches a new subproblem and sends Z_Mi to Wj.

3.2.3 Supervisor Process

The supervisor process performs the following steps to achieve load balancing and to share the best upper bound of the objective function among all processes.

(1) The supervisor process queries a master process, Mi, about the computed results and receives the results.

(2) The supervisor process computes Nmg_i by the following formula:

Nmg_i = N_i - (1/m) \sum_{k=1}^{m} N_k   (2)

Here, N_i denotes the number of subproblems assigned to Mi (note 1), and m denotes the total number of master processes. Nmg_i represents the number of subproblems that should migrate from/to Mi to achieve load balancing. The supervisor process performs migration of subproblems among master processes as follows:

- If Nmg_i > 0, the supervisor process requests Mi to send back Nmg_i subproblems and receives the subproblems.
- If Nmg_i < 0, the supervisor process requests Mi to receive |Nmg_i| subproblems and assigns the subproblems.

(3) The supervisor process compares Z_Mi with the current best upper bound stored on itself, or Z. If Z_Mi is less than Z, the supervisor process updates Z to the value of Z_Mi and requests all master processes to update the current best upper bound on the master processes with the updated Z.

(4) The supervisor process computes the lowest lower bound of the objective function, L, by comparing the lowest lower bounds computed on the master processes, and checks whether the condition (Z - L)/|L| < ε is satisfied. If it is satisfied, the supervisor process requests all master processes to terminate the computation.
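The supervisor's load-balancing and termination logic follows directly from steps (2) and (4) above. The code below is a minimal model of that logic (ours, with process communication abstracted away and illustrative data):

```python
def migration_counts(loads):
    """Step (2): Nmg_i = N_i - (1/m) * sum_k N_k for each master.
    Positive entries are subproblems to steal from that master;
    negative entries are subproblems to assign to it."""
    m = len(loads)
    avg = sum(loads) / m
    return [n - avg for n in loads]

def should_terminate(best_upper, lowest_lower, eps):
    """Step (4): stop when (Z - L) / |L| < eps."""
    return (best_upper - lowest_lower) / abs(lowest_lower) < eps

if __name__ == "__main__":
    loads = [120, 40, 80]               # subproblems held by M1..M3
    print(migration_counts(loads))      # M1 gives up 40, M2 receives 40
    print(should_terminate(best_upper=-1.0, lowest_lower=-1.001, eps=1e-2))
```

Note that the migration counts sum to zero by construction, so the supervisor's steal and assign requests always balance out.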
The load balancing policy presented in this section distributes an equal number of subproblems to the master processes. The supervisor process could apply more aggressive strategies; e.g., it could estimate the cost of executing a task and assign tasks to master processes in proportion to their computational power. However, the proposed algorithm applies the conservative strategy. The reason is that the characteristics of the target application make it difficult to estimate the cost of a task, because the subproblems computed on a worker process are generated and pruned dynamically by branching and bounding during the execution. We performed a preliminary experiment with the load balancing algorithm presented in this section. We solved benchmark problems with the proposed algorithm on a test bed where five master processes and 64 worker processes ran on PC clusters connected to a LAN. In the result, no idle time was observed on the master processes; that is, load balancing was achieved well. However, the load balancing policy remains a research issue for performance improvement; further discussion of this issue is future work.

Note 1: N_i includes the number of subproblems computed on worker processes in S_i.

Table 1. Computing resources on the test bed

name | processor, memory, NIC                               | OS    | location
PC1  | Pentium II 400MHz, 256MB, 100Base-T                  | Linux | UCSD
PC2  | Pentium III 700MHz, 256MB, 100Base-T                 | Linux | TITECH
PC3  | Pentium 4 1.9GHz, 512MB, 100Base-T                   | Linux | TITECH
PCC1 | dual Pentium III 1.4GHz, 512MB x 18 nodes, 100Base-T | Linux | TITECH
PCC2 | dual Pentium III 1.4GHz, 512MB x 12 nodes, 100Base-T | Linux | TITECH

3.3 Implementation on a Grid test bed

We implemented the proposed algorithm on a Grid test bed using the Grid RPC middleware Ninf [9, 11]. Ninf provides remote procedure call facilities that are designed to offer a programming interface similar to conventional function calls and to enable users to build Grid-enabled applications. A client computer in Ninf, the Ninf client, can request a remote computer, the Ninf server, to execute computing routines, Ninf libraries, installed on the Ninf server through the Ninf client API, Ninf_call().
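The Ninf_call() pattern, a blocking remote invocation of a library routine, can be emulated for experimentation without any Grid middleware. The sketch below stands in a thread pool for the Ninf servers; the names dispatch-side (submit) and worker-side (solve_subproblem) are ours, not part of the Ninf API, and the "subproblem" is a toy integer rather than a BMI search-tree node:

```python
from concurrent.futures import ThreadPoolExecutor

def solve_subproblem(subproblem, z_master):
    """Stand-in for the worker-side Ninf library: branch, bound, and
    return surviving child subproblems plus the worker's best upper
    bound (here just a toy computation on integers)."""
    children = [subproblem * 2, subproblem * 2 + 1]
    upper = min(z_master, subproblem)
    return children, upper

# The master plays the role of a Ninf client: each submit() is the
# analogue of dispatching one Ninf_call() to an idle worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    z = float("inf")
    futures = [pool.submit(solve_subproblem, sp, z) for sp in [5, 9, 3]]
    for f in futures:
        children, upper = f.result()
        z = min(z, upper)   # gather the best upper bound, as in Sec. 3.2
    print("best upper bound:", z)
```

The essential property this mimics is that the worker routine looks like an ordinary function call from the master's side, which is exactly what Grid RPC provides across machines.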
In the implementation of the hierarchical master-worker paradigm, a worker process is implemented as a Ninf library, and a master process invokes the Ninf library through Ninf_call() to dispatch a subproblem. A master process is implemented as a set of multiple programs: a Ninf client program that invokes worker processes to perform a parallel branch and bound method, and Ninf libraries that perform load balancing in cooperation with the supervisor process. The supervisor process is implemented as a Ninf client program that invokes the Ninf libraries in a master process through Ninf_call() to perform load balancing.

4 Experimental Results

This section presents experimental results of the proposed parallel branch and bound algorithm to solve the BMI Eigenvalue Problem with the hierarchical master-worker paradigm. The experiment was performed on a Grid test bed constructed from computing resources installed at the University of California, San Diego (UCSD) and the Tokyo Institute of Technology (TITECH). Table 1 shows the computing resources used on the test bed. The measured ping latency between a computer at UCSD and one at TITECH is 152.7 [ms]. The benchmark problems for the experiment are the helicopter control problem [8] and the synthetic problem [1]. The helicopter control problem is a real application problem, which finds optimal parameters for a controller that controls the position of a helicopter. Its problem size is n_x = 10, n_y = 2, and m = 8, where m denotes the matrix size of F_ij. The synthetic problem is created by generating the elements of the matrices F_ij randomly. Its problem size is larger than that of the helicopter control problem: n_x = 6, n_y = 6, and m = 24. Sequential execution time for the benchmark problems on PC3 is 1087 [sec] and [sec] for the helicopter control problem and the synthetic problem, respectively.
4.1 Comparison of Master-worker Paradigms

First, this section shows a performance comparison between the master-worker paradigm and the hierarchical master-worker paradigm on the Grid test bed. Figures 2 and 3 present the execution time to solve the benchmark problems with the master-worker paradigm (mw) and with the hierarchical master-worker paradigm (hmw).

[Figure 2. Execution time of the helicopter control problem on a Grid test bed]
[Figure 3. Execution time of the synthetic problem on a Grid test bed]
[Table 2. Overhead of Ninf call, in [sec], for the helicopter and synthetic problems under LAN, LAN+ssh, and WAN+ssh]

In the figures, LAN indicates the execution time where the supervisor process, the master process, and the worker processes run on computers connected to the LAN at TITECH. Here, the master process runs on PC2 and 16 worker processes run on eight nodes (16 CPUs) of PCC1 for mw; the supervisor process runs on PC2, the master process runs on a single node of PCC1, and 16 worker processes run on eight nodes of PCC1 for hmw. Next, LAN+ssh indicates the execution time where all processes run on computers at TITECH but a master/supervisor process needs to communicate with a worker/master process through the firewall by ssh tunneling. This represents the situation where users use a PC cluster in their department but the PC cluster has its own firewall. Here, the allocation of processes is the same as in LAN, except that PC2 needs to communicate with the other computers through a firewall by ssh tunneling. Finally, WAN+ssh indicates the execution time where a master/supervisor process runs on a computer at UCSD and the others run on computers at TITECH. All computers at TITECH are installed behind a firewall. This last case shows the situation where a user uses computing resources at a remote site. Here, a master/supervisor process runs on PC1, the other processes are allocated in the same way as in LAN, and PC1 communicates with the computers at TITECH by ssh tunneling. The results in Figures 2 and 3 show that the master-worker paradigm degrades performance in LAN+ssh and WAN+ssh, while the hierarchical master-worker paradigm sustains almost the same performance in all cases. In particular, the performance degradation of mw in WAN+ssh is significant: execution of the helicopter control problem and the synthetic problem did not finish within 10 minutes and within one hour, respectively. The reason for the performance degradation is high communication overhead, caused by high communication latency on the WAN and by the overhead of ssh tunneling. Table 2 presents the overhead of invoking a single Ninf_call() from a master process to a worker process.
The results in the table show high communication overhead in LAN+ssh and WAN+ssh. As described in Section 3.1.1, the execution time of a single task in this application is small. In the experiment, the measured mean execution time of a single task is 0.03 [sec] and 0.52 [sec] for the helicopter control problem and the synthetic problem, respectively. Thus, the communication overhead is relatively large compared to the task execution time. In particular, the communication overhead is significantly higher than the task execution time in WAN+ssh. For a breakdown of the communication overhead, the high overhead caused by ssh tunneling can be seen in the gap between LAN and LAN+ssh in Table 2. Also, the amount of data transferred between a master process and a worker process for the execution of a single task is small in this application. The size of the data transferred from a master process to a worker process is 3829 [Bytes] and [Bytes] for the helicopter control problem and the synthetic problem, respectively. The size transferred from a worker process to a master process is 720 [Bytes] for both (note 2). This suggests that ssh tunneling yields significant communication overhead even for an application with small transferred data.

Note 2: The amount of data transferred from a worker process to a master process depends on how many subproblems are pruned on the worker processes. The result in Table 2 shows the overhead in the worst case, that is, when no subproblems are pruned.

4.2 Evaluation for Scalability

Next, this section presents results on the performance scalability of the hierarchical master-worker paradigm. Figures 4 and 5 present the performance scalability of the master-worker paradigm and the hierarchical master-worker paradigm.

[Figure 4. Performance scalability for the helicopter control problem]
[Figure 5. Performance scalability for the synthetic problem]

In the figures, mw indicates the execution time of the benchmark problems where the master process runs on PC3 and the worker processes run on nodes of PCC1 and PCC2. Also, hmw indicates the execution time where the supervisor process runs on PC3 and the master/worker processes run on nodes of PCC1 and PCC2. The value of m in parentheses indicates the number of master processes in the hierarchical master-worker paradigm, where worker processes are divided equally among master processes. All computing resources are installed at TITECH, and processes can communicate with each other without ssh tunneling. The results in Figures 4 and 5 show that the master-worker paradigm degrades performance with a large number of worker processes, while the hierarchical master-worker paradigm improves performance. The performance gap between mw and hmw(m=1) for the helicopter control problem is caused by communication overhead. The master process and the worker processes running on nodes of PCC1 communicate through a single network switch in hmw(m=1), while the master process and the worker processes communicate via multiple network switches in mw. The measured ping latency between computing nodes is 0.1 [msec] in the former case and 0.2 [msec] in the latter. The performance of the master-worker paradigm for the helicopter control problem is significantly affected even by this small communication overhead, because the task granularity is small. Performance for the synthetic problem is not affected by the small communication overhead, because its task granularity is sufficiently large. Regarding the number of master processes in the hierarchical master-worker paradigm, adding a master process improves performance in several cases, e.g., hmw(m=2) and hmw(m=3) for the helicopter control problem and hmw(m=2) for the synthetic problem on 48 worker processes.
This result means that adding master processes is effective in eliminating the performance bottleneck on a master process. However, adding a master process does not improve performance in the runs with 32 worker processes; that is, adding a master process does not help when a single master process still has enough capacity to control its worker processes. Also, for the synthetic problem on 48 worker processes, hmw(m=3) shows worse performance than hmw(m=2). The reason is that two master processes are enough to control 48 worker processes for the synthetic problem, because its task granularity is larger than that of the helicopter control problem.

5 Related Work

The master-worker paradigm on a Grid has been discussed in much of the literature. The AppLeS project discusses the problem of how to place a master process and worker processes on computing resources [12]. The work presented in [7] discusses the problem of deciding the number of worker processes to allocate to a master-worker application and proposes a strategy to adjust the number of worker processes allocated to an application adaptively during its execution. Parallel branch and bound algorithms with the master-worker paradigm are addressed in [6, 10, 13]. MW [6] and Javelin 3 [10] provide software frameworks for implementing applications with the master-worker paradigm, and parallel branch and bound applications have been implemented on these frameworks. The work presented in [13] presents experimental results of a parallel branch and bound algorithm that solves the knapsack problem using the Grid RPC middleware Ninf. The hierarchical master-worker paradigm is supported in ATLAS [3], Satin [14], and AMWAT [2]. The work presented in [14] shows a comparison of load balancing algorithms in
a hierarchical master-worker setting. AMWAT provides a software template for implementing a parallel application with the (hierarchical) master-worker paradigm. However, the detailed discussion of the impact of the hierarchical master-worker paradigm on performance, which this paper presented, has not been reported in these works.

6 Conclusions

This paper proposed a parallel branch and bound algorithm to solve an optimization problem, namely the BMI Eigenvalue Problem, with the hierarchical master-worker paradigm on a distributed computing system, and compared its performance with the conventional master-worker paradigm on a Grid test bed. The results show that computation with the conventional master-worker paradigm is not suitable for efficiently solving an optimization problem with fine-grain tasks in the WAN setting, because the communication overhead is too high compared with the cost of the tasks. Also, a performance bottleneck on a single master process degrades performance in the master-worker paradigm even in the LAN setting. The hierarchical master-worker paradigm avoids the performance degradation caused by high communication overhead by placing the frequent communication between a master process and its worker processes within tightly coupled computing resources. It also eliminates the performance bottleneck on a master process and improves performance scalability by distributing work among multiple master processes. The hierarchical master-worker paradigm is necessary to achieve satisfactory performance for an application that solves an optimization problem with fine-grain tasks, such as the BMI Eigenvalue Problem, in the WAN setting, where multiple PC clusters are connected via WAN through firewalls. Even in the LAN setting, the hierarchical master-worker paradigm improves performance scalability, although appropriate parameters, e.g., the number of master/worker processes, must be chosen to achieve the best performance.
The application evaluated in this paper performs a parallel branch and bound algorithm for a problem with fine-grain tasks. Thus, we believe that the results in this paper are applicable to other parallel branch and bound applications with fine-grain tasks. For performance improvement, we need more sophisticated algorithms to set some parameters of the proposed algorithm, e.g., an algorithm to determine the number of master/worker processes and a load balancing algorithm among masters. Development of these algorithms to achieve the best performance is our future work.

Acknowledgments

We would like to sincerely thank Dr. Henri Casanova and the Global Scientific Information and Computing Center at the Tokyo Institute of Technology for allowing us to use their computing resources for our experiments. We also thank Prof. Shinji Hara and the staff of the Ninf project for their insightful comments.

References

[1] K. Aida, Y. Futakata, and S. Hara. High-performance parallel and distributed computing for the BMI eigenvalue problem. In Proc. of the 16th IEEE International Parallel and Distributed Processing Symposium.
[2] AppLeS Master Worker Application Template (AMWAT).
[3] J. E. Baldeschwieler, R. D. Blumofe, and E. A. Brewer. ATLAS: An infrastructure for global computing. In Proc. of the 1996 SIGOPS European Workshop.
[4] H. Fujioka and K. Hoshijima. Bounds for the BMI eigenvalue problem. Trans. of the Society of Instrument and Control Engineers, 33(7).
[5] M. Fukuda and M. Kojima. Branch-and-cut algorithms for the bilinear matrix inequality eigenvalue problem. Computational Optimization and Applications, 19(1):79-105.
[6] J. Goux, S. Kulkarni, J. Linderoth, and M. Yoder. An enabling framework for master-worker applications on the computational grid. In Proc. of the 9th IEEE Symposium on High Performance Distributed Computing (HPDC9).
[7] E. Heymann, M. A. Senar, E. Luque, and M. Livny. Adaptive scheduling for master-worker applications on the computational grid. In Proc. of the 1st IEEE/ACM International Workshop on Grid Computing (Grid2000).
[8] L. H. Keel, S. P. Bhattacharyya, and J. W. Howze. Robust control with structured perturbations. IEEE Trans. on Automatic Control, 33(1):68-78.
[9] S. Matsuoka, H. Nakada, M. Sato, and S. Sekiguchi. Design issues of Network Enabled Server systems for the Grid. In Grid Computing - GRID 2000, Lecture Notes in Computer Science 1971. Springer-Verlag.
[10] M. O. Neary and P. Cappello. Advanced eager scheduling for Java-based adaptively parallel computing. In Proc. of the 2002 Joint ACM-ISCOPE Conference on Java Grande.
[11] Ninf: A Global Computing Infrastructure.
[12] G. Shao, F. Berman, and R. Wolski. Master/slave computing on the grid. In Proc. of the Heterogeneous Computing Workshop.
[13] Y. Tanaka, M. Sato, M. Hirano, H. Nakada, and S. Sekiguchi. Performance evaluation of a firewall-compliant Globus-based wide-area cluster system. In Proc. of the 9th IEEE Symposium on High-Performance Distributed Computing.
[14] R. V. van Nieuwpoort, T. Kielmann, and H. E. Bal. Efficient load balancing for wide-area divide-and-conquer applications. In Proc. of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, pages 34-43.