The load balancing problem in OTIS-Hypercube interconnection networks

J Supercomput (2008) 46

The load balancing problem in OTIS-Hypercube interconnection networks

Basel A. Mahafzah · Bashira A. Jaradat

Published online: 8 March 2008
© Springer Science+Business Media, LLC 2008

Abstract An interconnection network architecture that promises to be an interesting option for future-generation parallel processing systems is the OTIS (Optical Transpose Interconnection System) optoelectronic architecture. Consequently, all performance aspects of such a promising architecture need to be investigated, one of which is the load balancing technique. This paper focuses on devising an efficient algorithm for load balancing on the promising OTIS-Hypercube interconnection networks. The proposed algorithm is called the Clusters Dimension Exchange Method (CDEM). Both the analytical model and the experimental evaluation demonstrate the advantage of OTIS-Hypercube over Hypercube in terms of various parameters, including execution time, load balancing accuracy, number of communication steps, and speed.

Keywords Load balancing · OTIS · OTIS-Hypercube · Hypercube · Interconnection networks

B.A. Mahafzah
Department of Computer Science, King Abdullah II School for Information Technology, The University of Jordan, Amman 11942, Jordan
b.mahafzah@ju.edu.jo

B.A. Jaradat
Department of Computer Science, School of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan
basheera@just.edu.jo

1 Introduction

A potential optoelectronic architecture, known as the Optical Transpose Interconnection System (OTIS), was first proposed by Marsden et al. [1]. Recently, the OTIS

architecture has gained considerable attention, and significant efforts have been devoted to studying and improving several aspects of OTIS networks. Many results exist in the literature regarding the OTIS optoelectronic architecture [2-8], but only a few have addressed the performance of such interconnection networks [2, 5]. In addition to the existing studies, complementary efforts must be made to attain the best possible performance; one way is the effective utilization of available resources, which can be carried out through load balancing. The significance of load balancing lies in its effect on the speedup in processing time, a major objective in all parallel processing systems. Based on this fact, this research is dedicated to studying and solving the load balancing problem on OTIS-Hypercube interconnection networks, with the aspiration of devising an efficient solution that contributes to the enhancement of OTIS-Hypercube performance. Another motivation for this research comes from the following observation by leading scientists: "With optical elements...light does magic" [9]. Owing to the power of light in achieving high-speed communication, hopes are pinned on achieving high-speed parallel processing on the promising OTIS architecture. Therefore, all aspects of performance improvement on this promising architecture should be studied and evaluated, one of which is the load balancing problem, which is studied on OTIS-Hypercube systems in this paper. The proposed method is called the Clusters Dimension Exchange Method (CDEM), and it is based on the well-known Dimension Exchange Method (DEM) for load balancing on Hypercube interconnection networks.
The efficiency of the proposed algorithm is shown, and the superiority of the OTIS architecture is demonstrated. The rest of this paper is organized as follows: Sect. 2 introduces OTIS systems and presents the load balancing techniques applied on both OTIS and Hypercube interconnection networks. The proposed load balancing methodology (CDEM) on OTIS-Hypercube is illustrated in Sect. 3, which also describes the DEM on Hypercube, on which CDEM is based. In Sect. 4, analytical models of both CDEM on OTIS-Hypercube and DEM on Hypercube are presented. The analysis involves the estimation of several performance metrics: the worst-case time complexity, the load balancing accuracy, the maximum number of communication steps, and the speed at which the load balancing process occurs. The analytical estimation is validated in Sect. 5 through experimental work that measures the metrics studied in Sect. 4, and a comparison is conducted between the results of CDEM on OTIS-Hypercube and DEM on Hypercube. Section 6 concludes the paper and proposes future work.

2 Background and related work

This section introduces OTIS interconnection networks and presents the research work related to load balancing on Hypercube, which is the factor network from which the OTIS-Hypercube under study is constructed.

2.1 OTIS interconnection networks

In an OTIS system, processors are clustered into groups. Processors within the same group are connected by electronic intra-group links forming an interconnection topology known as the factor network, whereas inter-group processors are interconnected by transposing processor and group addresses, such that processor p of group g is connected to processor g of group p. The latter interconnection is achieved optically using free-space optical technology [1]. The factor network can be any of the traditional interconnection networks, such as the Hypercube, in which P processors are organized in log2 P dimensions, with exactly two nodes connected along each dimension. Two nodes in a Hypercube are connected if the Hamming distance between the binary representations of their processor numbers is one. Each of the groups in Fig. 1 is a 2-dimensional Hypercube. Krishnamoorthy et al. have shown that the bandwidth and the power consumption in OTIS are optimized when the number of groups is equal to the number of processors in each group [10]. This means that an optimal N^2-processor OTIS system consists of N groups, each of which contains N processors. For each known topology, an OTIS network can be constructed. For example, an N^2-processor OTIS-Hypercube network can be formed from N copies of an N-processor Hypercube. An instance of the OTIS-Hypercube interconnection network is the 16-processor (4 groups with 4 processors in each group) OTIS-Hypercube shown in Fig. 1, where an optical inter-group link (distinguished by a dashed link) connects processor p of group g to processor g of group p, and processors within the same group are connected by electronic intra-group links forming the Hypercube factor network. Each processor is marked by a two-parameter label, where

Fig. 1 A 16-processor OTIS-Hypercube

the first parameter indicates the group to which the processor belongs, and the second parameter represents the processor's position within the group. In terms of hardware implementation, OTIS promises to provide large-scale systems that are not possible with traditional electronic technology, which is limited in its ability to support higher dimensions. In an OTIS system, the same number of processors can be arranged in fewer dimensions. In terms of performance, the OTIS interconnection network architecture is desirable due to its recursive structure consisting of multiple similar networks, which provides better support for several features, such as modularity, load balancing, fault tolerance, and robustness [5]. Research on OTIS has revealed its ability to achieve terabit throughput at a reasonable cost. Based on this, several research efforts have been directed toward studying OTIS and investigating its usefulness for real-life applications [1-8]. Researchers have followed distinct directions in exploring performance issues of OTIS interconnection networks. An important work evaluating the performance of OTIS was recently published in an M.S. thesis by Najaf-Abadi [2]; it is valuable because it concentrated on the performance evaluation and modeling of OTIS networks under important parameters, such as network bandwidth and message latency. Various algorithms have been developed on OTIS. For instance, Wang and Sahni presented matrix multiplication on OTIS-Mesh [6] and BPC permutations on OTIS-Hypercube [7]. Rajasekaran and Sahni introduced randomized routing, selection, and sorting on OTIS-Mesh [8]. Several other algorithm development efforts have been accomplished on other OTIS instances [3, 4].
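The two connectivity rules of Sect. 2.1 (electronic intra-group links along Hypercube dimensions, and the optical transpose link between groups) can be sketched as follows; the function names are illustrative, not from the paper:

```cpp
#include <utility>

// Electronic intra-group link: within group g, the neighbor of processor p
// along dimension d differs from p in bit d (Hamming distance 1).
std::pair<int, int> hypercubeNeighbor(int g, int p, int d) {
    return {g, p ^ (1 << d)};
}

// Optical inter-group link (the OTIS transpose rule): processor p of
// group g is connected to processor g of group p.
std::pair<int, int> otisNeighbor(int g, int p) {
    return {p, g};
}
```

For the 16-processor network of Fig. 1, processor (2, 0) is optically linked to (0, 2), which matches the transfer path used in the worked example of Sect. 3.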
2.2 The load balancing problem

One of the most important problems in a parallel processing system is load balancing. The load balancing problem has been studied using different approaches on various networks. Recently, Zhao, Xiao, and Qin proposed hybrid schemes of diffusion and dimension exchange, called DED-X, for load balancing on OTIS networks [11]. The core of DED-X is to divide the load balancing process into three stages of Diffusion-Exchange-Diffusion, where a traditional diffusion scheme X (such as the First Order Scheme (FOS), the Second Order Scheme (SOS), or the Optimal scheme (OPT)) is applied at various stages to achieve load balancing on the OTIS factor networks [11]. Simulation results of the proposed schemes showed significant improvement in efficiency and stability [11]. In another work, the same authors generalized the DED-X schemes for load balancing on homogeneous OTIS networks into the generalized Diffusion-Exchange-Diffusion schemes, GDED-X, which achieve load balancing on heterogeneous OTIS networks [12]. The proposed schemes were shown, theoretically and experimentally, to be better than the traditional X schemes for load balancing on heterogeneous OTIS networks [12]. Ranka, Won, and Sahni [13] introduced the Dimension Exchange Method (DEM) on Hypercube interconnection networks. It is a simple heuristic method that is based

on averaging the loads of directly connected processors: for each dimension d, every two processors connected along the dth dimension exchange their load sizes and, according to the average, the processor with excess load transfers the extra load to its neighbor. The advantage of DEM is that every processor can redistribute tasks to its neighbors without information about the global distribution of tasks. However, the worst-case error of this method is log2 P on a P-processor Hypercube, where the error is defined as the difference between the maximum and the minimum number of tasks assigned to processors. Error reduction was the objective of several subsequent studies. Better results were achieved by Rim et al. [14, 15], who adapted DEM to perform efficient dynamic load balancing on Hypercube interconnection networks by proposing a new method, the odd-even method, which reduces the nonuniformity to no more than (1/2) log2 P. Additional advantages are achieved by introducing new techniques for hiding the communication overheads involved in load balancing [15]. Jan and Hwang suggested an efficient algorithm for perfect load balancing on Hypercube multiprocessors, based on the well-known DEM [16].

3 The clusters dimension exchange method for load balancing on OTIS-Hypercube

The proposed Clusters Dimension Exchange Method for load balancing on OTIS-Hypercube is based on the well-known Dimension Exchange Method (DEM) for load balancing on Hypercube. DEM balances the processors' loads in log2 P phases for a P-processor Hypercube organized in log2 P dimensions. This is accomplished by going through all dimensions and balancing processors' loads by redistributing the tasks among directly connected processors in each dimension. Figure 2 illustrates the DEM steps for load balancing on a 4-D Hypercube of 16 processors.
The processors along the first dimension exchange their load sizes, and the processor with the higher load transfers the excess to the lower-loaded processor; the transfer direction is shown by arrows between processors, as Fig. 2b shows. The same steps are performed between processors connected along the second, third, and fourth dimensions, as shown in Figs. 2c, d, and e, respectively. The DEM is used here to devise a new method for load balancing on OTIS-Hypercube interconnection networks, called the Clusters Dimension Exchange Method (CDEM). The dynamic load balancing problem on OTIS-Hypercube can be stated as follows: given an OTIS-Hypercube of P processors, clustered into sqrt(P) groups, obtain an exactly or approximately equal load distribution among the OTIS-Hypercube's processors. For a P-processor OTIS-Hypercube, the proposed strategy balances the processors' loads in log2 P phases. The main concept of the proposed method is to first obtain an equal load distribution among the groups by redistributing the loads so that all groups have exactly or approximately the same total load. Then each group balances its processors' loads so that all processors' loads are equal or approximately equal. Figure 3 presents the proposed CDEM algorithm for load balancing on a P-processor OTIS-Hypercube.
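The DEM phases described above can be sketched as a sequential simulation of the parallel exchanges, using the floor of the average as in the paper:

```cpp
#include <cmath>
#include <vector>

// Sequential sketch of DEM on a P-processor Hypercube (P a power of two).
// In each of the log2(P) phases, every pair of processors connected along
// dimension d averages its loads; the heavier processor keeps the floor of
// the average and sends the excess to its neighbor.
void dem(std::vector<int>& load) {
    int P = static_cast<int>(load.size());
    int dims = static_cast<int>(std::log2(P));
    for (int d = 0; d < dims; ++d) {
        for (int i = 0; i < P; ++i) {
            int j = i ^ (1 << d);          // neighbor along dimension d
            if (i < j) {                   // handle each pair once
                int sum = load[i] + load[j];
                int avg = sum / 2;         // floor of the average
                if (load[i] >= load[j]) { load[i] = avg; load[j] = sum - avg; }
                else                    { load[j] = avg; load[i] = sum - avg; }
            }
        }
    }
}
```

On an example like Fig. 2, this reduces the spread between the heaviest and lightest processors to at most log2 P units, the DEM error bound quoted in Sect. 2.2.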

Fig. 2 A running example of DEM on a 4-D Hypercube of 16 processors: a initial state; b load balancing phase 1; c load balancing phase 2; d load balancing phase 3; e load balancing phase 4

CDEM performs load balancing in a number of phases equal to the factor network's dimension (Fig. 3, line 1). All pairs of groups whose numbers differ in the dth bit position (line 2) perform the following steps in parallel:

Step 1: Exchange of the groups' total load sizes. The groups whose numbers differ in the dth bit position exchange their total load sizes through the optical interconnection, as indicated by line 4 in the algorithm (Fig. 3).

Fig. 3 The CDEM algorithm for load balancing on OTIS-Hypercube

Step 2: Calculation of the groups' average total load. Each pair of groups from Step 1 computes the average of their total loads as the floor of the sum of the two groups' total loads divided by two (line 5).

Step 3: Redistribution of the groups' total loads. Each group compares its total load to the average load (the average of the group's total load and its neighbor group's total load). If the group's total load is greater than the average (Fig. 3, line 6), the processor interconnecting the two communicating groups is checked to determine whether it holds a sufficient amount of excess load to transfer, computed as the difference between the group's total load and the average of the communicating groups' total loads. If it holds the required amount (line 7), it sends it to the neighbor group (line 8), its load is decremented by the transferred amount (line 9), and the group's total load is set equal to the average (line 10). If, on the other hand, that processor does not hold a sufficient amount of load to transfer (line 11), it requests the additional required load from its neighbors (line 12) and adds it to its own load (line 13). If the group's total load is less than the average (Fig. 3, line 17), the group receives its neighbor group's excess load (line 18), the load of the group's processor interconnecting the two groups is incremented by the transferred amount (line 20), and the group's total load is increased by the transferred amount (line 21). Once all the groups hold the same amount of workload units, balancing the processors' loads within each group, using the DEM presented at the beginning of this section, produces a completely balanced network, with all processors holding the same or approximately the same amount of workload. The load balancing procedure iterates through each of the Hypercube's dimensions (Fig. 3, line 22). All directly connected processors along the first dimension are balanced in the first phase.
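The group-level Steps 1-3 above, followed by per-group DEM, can be sketched as below. This is a simplified sequential simulation: it moves the excess directly between the two bridging processors (g, h) and (h, g) and ignores the gathering from neighbors on lines 11-13 of Fig. 3, so it assumes the bridging processor already holds enough load.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// DEM within one group (Sect. 2.2), repeated here for self-containment.
void dem(std::vector<int>& load) {
    int P = static_cast<int>(load.size());
    int dims = static_cast<int>(std::log2(P));
    for (int d = 0; d < dims; ++d)
        for (int i = 0; i < P; ++i) {
            int j = i ^ (1 << d);
            if (i < j) {
                int sum = load[i] + load[j], avg = sum / 2;
                if (load[i] >= load[j]) { load[i] = avg; load[j] = sum - avg; }
                else                    { load[j] = avg; load[i] = sum - avg; }
            }
        }
}

// Sequential sketch of CDEM on a P-processor OTIS-Hypercube with sqrt(P)
// groups of sqrt(P) processors; load[g][p] is the load of processor p in
// group g.
void cdem(std::vector<std::vector<int>>& load) {
    int n = static_cast<int>(load.size());       // n = sqrt(P) groups
    int dims = static_cast<int>(std::log2(n));
    // Stage 1: balance total loads between groups over the optical links.
    for (int d = 0; d < dims; ++d)
        for (int g = 0; g < n; ++g) {
            int h = g ^ (1 << d);
            if (g < h) {
                int tg = std::accumulate(load[g].begin(), load[g].end(), 0);
                int th = std::accumulate(load[h].begin(), load[h].end(), 0);
                int avg = (tg + th) / 2;          // floor of the average
                if (tg > avg) {                   // excess goes (g,h) -> (h,g)
                    load[g][h] -= tg - avg; load[h][g] += tg - avg;
                } else if (th > avg) {            // excess goes (h,g) -> (g,h)
                    load[h][g] -= th - avg; load[g][h] += th - avg;
                }
            }
        }
    // Stage 2: plain DEM inside each group.
    for (auto& group : load) dem(group);
}
```

With n = 2 groups of 2 processors and loads {{1, 5}, {2, 0}}, the group stage moves 2 units from processor (0, 1) to (1, 0) over the transpose link, after which per-group DEM leaves every processor with 2 units.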
All pairs of processors whose binary representations differ in the dth bit (lines 23-25) perform load balancing within each group through the following steps in parallel:

Step 1: Exchange of the processors' load sizes. All directly connected processors along the dth dimension exchange their load sizes (Fig. 3, line 26).

Step 2: Calculation of the processors' average load. The average load is computed for each pair of processors directly connected along the dth dimension as the floor of the sum of the two processors' loads divided by two (line 27).

Step 3: Redistribution of the processors' loads. Each processor compares its load to the average load (the average of the processor's load and its neighbor's load). If the processor's load is greater than the average (line 28), the processor sends the excess (its load minus the average) along the dth dimension (lines 29 and 30), and its load is set equal to the average (line 31). Otherwise, the processor receives its neighbor's excess load along the dth dimension (lines 32-35), and its load is incremented by that amount.

The proposed CDEM method is illustrated through an example of load balancing on a 16-processor OTIS-Hypercube, shown in Fig. 4, where the 16 processors are clustered into 4 groups, each consisting of 4 processors. The processors are identified by a two-parameter label, where the first parameter indicates the group to which the processor belongs, and the second parameter indicates the processor's position within the group. Each processor operates on an assigned load; the number above

Fig. 4 A 16-processor OTIS-Hypercube (initial state)

Fig. 5 A 16-processor OTIS-Hypercube (groups load balancing phase 1)

each processor indicates the processor's current load. Intra-group processors are connected by electrical links, whereas inter-group processors are interconnected through optical links, shown as dashed lines to distinguish them from electrical links. Next, a complete example, shown in Figs. 4 to 8, illustrates the method's phases while bringing the OTIS-Hypercube to a balanced state. In the first phase, the groups whose binary representations differ in the first bit position exchange

Fig. 6 A 16-processor OTIS-Hypercube (groups load balancing phase 2)

Fig. 7 A 16-processor OTIS-Hypercube (processors load balancing phase 1)

their total load sizes, and the excess load is transferred from the higher-loaded group to the lower-loaded group. In the given example, group 0, whose total load is 28, exchanges its total load size with group 2, whose load is 38. Each of the groups computes the average group load and decides that both groups 0 and 2 should hold an average of 33 workload units. As Fig. 5 shows, group 2 sends the extra 5 workload units through processor (2, 0), which sends 5 units of its load to group 0; group 0 receives the new load units through processor (0, 2), increasing its 10 load units to 15

Fig. 8 A 16-processor OTIS-Hypercube (processors load balancing phase 2)

and decreasing the load of processor (2, 0) from 13 to 8 load units. Simultaneously, groups 1 and 3 exchange their total load sizes, which are 43 and 27, respectively, so the loads need to be redistributed to reach an average of 35 load units per group. Therefore, group 1 sends its extra 8 workload units through processor (1, 3), which sends 8 units of its load to group 3; group 3 receives the new load units through processor (3, 1), increasing the 5 load units of processor (3, 1) to 13 and decreasing the 14 load units of processor (1, 3) to 6, as shown in Fig. 5. In the same way, group total load balancing proceeds between the groups whose binary representations differ in the second bit, as Fig. 6 indicates: the total load sizes are exchanged, and the excess load is transferred from the higher-loaded to the lower-loaded group. At the end of phase 2, all groups hold the same or approximately the same total load. To arrive at a fully balanced state, the processors within each group then perform load balancing using DEM. Figure 7 shows the result of load balancing among the processors connected along the first dimension in each group. A completely balanced state is achieved by performing load balancing along the second dimension, as demonstrated in Fig. 8.

4 Analytical modeling

This section presents the most important parameters used to evaluate the performance of a parallel processing system when the proposed load balancing procedure is applied: execution time, load balancing accuracy, number of communication steps, and speed.

4.1 Execution time

The execution time metric measures the time required to perform the load balancing steps. The worst-case time complexity of DEM for load balancing on Hypercube is O(M log2 P), where M is the maximum load assigned to each processor in a P-processor Hypercube [16]. The worst-case time complexity of CDEM for load balancing on OTIS-Hypercube is given in Theorem 1.

Theorem 1 The worst-case time complexity of CDEM for load balancing on OTIS-Hypercube is O(sqrt(P) * M * log2 P).

Proof If each processor holds at most M workload units, then each group holds at most sqrt(P) * M workload units. Thus, during the balancing of the total loads among groups, at most sqrt(P) * M / 2 workload units are transferred between each pair of groups in each phase. Since the balancing of the groups' total loads is performed in log2 sqrt(P) phases, the time contributed by this stage is O((sqrt(P) * M / 2) * log2 sqrt(P)) = O((sqrt(P) * M / 4) * log2 P), which is O(sqrt(P) * M * log2 P). In addition, during the processor load balancing within each group, at most M/2 workload units need to be transferred between every two processors connected along dimension d. Since the processors are organized in log2 sqrt(P) dimensions, the time contributed by this stage is O((M/2) * log2 sqrt(P)) = O((M/4) * log2 P), which is O(M log2 P). Thus, the time complexity of the whole algorithm is O((sqrt(P) * M / 4) * log2 P + (M/4) * log2 P) = O(sqrt(P) * M * log2 P).

4.2 Load balancing accuracy

The load balancing accuracy is determined by the error with which the processors' loads are balanced, where the error is defined as the difference between the maximum number of workload units held by any processor and the minimum number of workload units held by any other processor.
The significance of the error as an evaluation parameter stems from the fact that a larger error increases the processing time, since the processing time of a parallel processing system is determined by the processor holding the most tasks, all tasks being of the same size. Therefore, reducing the error is the objective of all load balancing algorithms, since such a reduction lets all processors hold approximately the same number of tasks and thus finish execution at approximately the same time. The maximum resulting error using DEM on a Hypercube is e <= log2 P [11]. CDEM, on the other hand, balances the OTIS-Hypercube's processor loads with an error e bounded by log2 sqrt(P). Theorem 2 gives the load balancing accuracy of CDEM on OTIS-Hypercube.

Theorem 2 The maximum resulting error e using CDEM on OTIS-Hypercube is e <= log2 sqrt(P).

Proof Applying CDEM to balance the total loads among groups yields a maximum error e = log2 sqrt(P): whenever the sum of the loads of two groups is odd, one group ends up with one more unit than the other, and since CDEM balances the groups' loads in log2 sqrt(P) phases, this difference can accumulate to log2 sqrt(P). Such a maximum difference is then redistributed among the processors of the same group, leading to an error of e = log2 sqrt(P) in the worst case when balancing the processors' loads within each group.

4.3 Number of communication steps

The number of communication steps represents the number of steps required by the processors to communicate in order to achieve load balancing. This number depends on the method used and the architecture of the interconnection network, and it may be affected by the initial load distribution. The significance of this metric stems from its effect on the processing time; less processing time can be achieved with fewer communication steps. Therefore, the objective of any load balancing method is to perform efficient load balancing in the least possible number of communication steps. The maximum number of communication steps required by DEM for load balancing on a P-processor Hypercube is 3 log2 P [17]. Approximately the same number of communication steps is required for a P-processor OTIS-Hypercube, as shown in Theorem 3.

Theorem 3 The number of communication steps required by CDEM for load balancing on OTIS-Hypercube is 3 log2 P.

Proof Balancing the total loads among the groups takes three communication steps per phase: two steps for exchanging load sizes and one step for transferring the excess load between the two groups whose binary representations differ in the bit corresponding to the load balancing phase.
Since the network has P processors, the sqrt(P) groups are balanced in log2 sqrt(P) phases, so this stage takes 3 log2 sqrt(P) communication steps. Likewise, load balancing among the processors within each group takes three steps per dimension: two steps for exchanging load sizes and one step for transferring the excess load between the two communicating processors along that dimension. Since the sqrt(P) processors in each group are organized in log2 sqrt(P) dimensions, this stage also takes 3 log2 sqrt(P) communication steps. Thus, 3 log2 sqrt(P) steps for load balancing among groups plus 3 log2 sqrt(P) steps for load balancing among processors give a total of 6 log2 sqrt(P) = 3 log2 P communication steps for the whole network.

4.4 Speed

The speed at which the load balancing process occurs plays a significant role in improving the system's performance and reducing the processing time. This metric is affected by several factors, the most important of which are the load balancing method applied and the interconnection technology used.

The speed at which DEM performs load balancing on Hypercube, assuming that the speed of the electrical technology used is 250 Mb/s [18], can be expressed as the aggregate speed of the links used during communication:

3 log2 P * (speed of electrical links) = 3 log2 P * 250 Mb/s = 750 log2 P Mb/s

The speed at which CDEM performs load balancing on OTIS-Hypercube is expressed in Theorem 4.

Theorem 4 The speed at which CDEM performs load balancing on OTIS-Hypercube is:

3 log2 sqrt(P) * (speed of electrical links) + 3 log2 sqrt(P) * (speed of optical links)
= 3 log2 sqrt(P) * 250 Mb/s + 3 log2 sqrt(P) * 2.5 Gb/s
= (3/2) log2 P * 250 Mb/s + (3/2) log2 P * 2.5 Gb/s

Proof Since the OTIS-Hypercube interconnection network combines two link technologies, the speed of each must be taken into account. The speed of the whole system is the aggregate speed of the electrical links traversed plus that of the optical links traversed, giving 3 log2 sqrt(P) * 250 Mb/s + 3 log2 sqrt(P) * 2.5 Gb/s = (3/2) log2 P * 250 Mb/s + (3/2) log2 P * 2.5 Gb/s, assuming that the speed of the electrical technology is 250 Mb/s [18] and the speed of the optical interconnection technology is 2.5 Gb/s [19].

4.5 Summary of analytical results

The preceding analysis shows that, compared to Hypercube interconnection networks, OTIS-Hypercube excels in most performance metrics. This advantage is mainly due to the attractive OTIS architecture, in which the same number of processors can be organized in fewer dimensions, and to the use of the most suitable interconnection technology in each position: close processors are connected with electrical technology, while processors at larger distances are interconnected optically, achieving higher speed and lower power consumption, among several other advantageous features.
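Under the assumptions of Sects. 4.3 and 4.4 (three steps per phase, 250 Mb/s electrical links [18], 2.5 Gb/s optical links [19]), the step counts and aggregate speeds can be computed as below; the function names are illustrative:

```cpp
#include <cmath>

// Communication steps (Theorem 3): 3 steps per phase over log2 sqrt(P)
// group phases plus log2 sqrt(P) processor phases, i.e. 3 log2 P in total.
int cdemCommSteps(int P) {
    int phases = static_cast<int>(std::log2(std::sqrt(static_cast<double>(P))));
    return 3 * phases + 3 * phases;
}

// Aggregate speeds in Mb/s, per Sect. 4.4's model of summing link speeds.
double demSpeed(int P)  { return 3.0 * std::log2(P) * 250.0; }
double cdemSpeed(int P) {
    double phases = 1.5 * std::log2(P);        // 3 log2 sqrt(P) = (3/2) log2 P
    return phases * 250.0 + phases * 2500.0;   // electrical + optical links
}
```

For P = 16 this gives 12 communication steps, a DEM aggregate speed of 3000 Mb/s, and a CDEM aggregate speed of 16500 Mb/s, matching the formulas of Theorems 3 and 4.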
Table 1 summarizes the evaluated metrics for both DEM on Hypercube and CDEM on OTIS-Hypercube.

5 Experimental results and evaluation

A significant complementary effort that supports the work accomplished so far is to conduct various experiments to evaluate, compare, and analyze distinct performance

Table 1 Analytical comparison

Metric                 DEM on Hypercube    CDEM on OTIS-Hypercube
Execution time         O(M log2 P)         O(sqrt(P) * M * log2 P)
Error                  e <= log2 P         e <= log2 sqrt(P)
Communication steps    3 log2 P            3 log2 P
Speed                  750 log2 P Mb/s     (3/2) log2 P * 250 Mb/s + (3/2) log2 P * 2.5 Gb/s

metrics when the proposed load balancing method, the Clusters Dimension Exchange Method (CDEM), is applied on an OTIS-Hypercube. CDEM has been implemented and applied to various sizes of simulated OTIS-Hypercube interconnection networks. Several experiments were then conducted to evaluate various performance metrics and compare them to the results of applying DEM on Hypercubes of equivalent sizes. The experimental runs of the load balancing methods under study were performed on a Dual-Core Intel Xeon processor (3.2 GHz) with Hyper-Threading Technology, 2 GB RAM, and 2 MB L2 cache per CPU, under the SUSE Linux 10 operating system. The interconnection network simulation was developed for both OTIS-Hypercube and Hypercube using an object-oriented approach in the C++ programming language. The following major classes have been defined:

The interconnection network class, which constructs the interconnection network of the given size.
The processor class, which sets the properties and methods of processors.
The link class, which connects processors according to the interconnection network architecture.

The simulation starts by constructing the desired interconnection network according to the number of processors specified by the user. The load balancing process is implemented using a multithreaded approach to support parallel execution of the load balancing steps. The Pthread library routines are used to create and manage a dynamic number of threads that perform the load balancing steps simultaneously.
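A minimal skeleton of the three classes described above might look as follows; all class and member names here are illustrative assumptions, not the authors' actual code:

```cpp
#include <vector>

// Processor class: per-processor state used by the simulation
// (names are hypothetical, sketched from the class list above).
struct Processor {
    int id = 0;
    int load = 0;   // current number of workload units
};

// Link class: connects two processors; `optical` marks an OTIS
// transpose link as opposed to an electronic intra-group link.
struct Link {
    int from = 0, to = 0;
    bool optical = false;
};

// Interconnection network class: constructs the network of a given size.
class Network {
public:
    explicit Network(int P) : processors(P) {
        for (int i = 0; i < P; ++i) processors[i].id = i;
    }
    std::vector<Processor> processors;
    std::vector<Link> links;   // filled in according to the chosen topology
};
```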
The main functions implemented to achieve load balancing include load computation, load size exchange, average load calculation, and excess load transfer. This section presents the experimental results for various metrics, including execution time, load balancing accuracy, number of communication steps, and speed.

5.1 Execution time

Several experiments have been conducted to compute the time required to execute the proposed load balancing method on 16-, 64-, 256-, and 1024-processor OTIS-Hypercubes. In order to study the effect of the load size on the method's execution

time, the experiments have been performed with variable average load sizes assigned to the processors. The load sizes used are 10, 50, 100, and 500 workload units, on average, per processor. The same experiments have been carried out for 16-, 64-, 256-, and 1024-processor Hypercubes.

Fig. 9 Average execution time using DEM on Hypercube and CDEM on OTIS-Hypercube (maximum load size is 500 workload units per processor)

An intuitive result is that the execution time increases with the number of processors. As Fig. 9 shows, the average execution time is a few milliseconds for a small number of processors, such as a 16-processor Hypercube or OTIS-Hypercube, while the increase becomes more noticeable for larger sizes (256 processors and more). For instance, when the maximum load size is 500 workload units per processor, DEM on a 1024-processor Hypercube takes about 292 seconds, whereas CDEM on a 1024-processor OTIS-Hypercube consumes around 45 seconds.

The previous discussion dealt with the effect of the network's size on the execution time. Consider now the role played by the number of workload units assigned to each processor. The experiments revealed that the number of workload units assigned to the processors greatly affects the execution time for a large number of processors. In addition, the execution-time difference between Hypercube and OTIS-Hypercube becomes more pronounced as the number of processors grows. Figures 10 and 11 illustrate these conclusions.

Figure 10 compares the average execution time taken by DEM to balance a 64-processor Hypercube's load with the time required by CDEM to achieve load balancing on a 64-processor OTIS-Hypercube. A careful examination of Fig. 10 shows that the greatest execution-time difference appears where the maximum load assigned to each of the networks' processors is 500 workload units.
Figure 11 clarifies two facts. The first, intuitive, fact is that when the number of processors is increased to 1,024, the average execution-time difference between CDEM on OTIS-Hypercube and DEM on Hypercube stays nearly constant at about 200 seconds across different numbers of workload units. The second clear fact

is the supremacy of the proposed CDEM methodology on an OTIS-Hypercube compared to DEM used for load balancing on a Hypercube. It is evident from Fig. 11 that less execution time is required to perform load balancing on a 1024-processor OTIS-Hypercube than on an equivalent Hypercube. For instance, assuming the maximum load is 10 workload units per processor, a 1024-processor Hypercube executes its load balancing method in about 289 seconds. On the other hand, a 1024-processor OTIS-Hypercube requires around 43 seconds to perform load balancing, which is a remarkable contribution to performance improvement.

Fig. 10 Average execution time using DEM on 64-processor Hypercube and CDEM on 64-processor OTIS-Hypercube

Fig. 11 Average execution time using DEM on 1024-processor Hypercube and CDEM on 1024-processor OTIS-Hypercube

5.2 Load balancing accuracy

The error, which determines the load balancing accuracy, is defined as the difference between the maximum number of workload units held by any processor and the minimum number of workload units held by any other processor. The conducted experiments estimated the average error incurred by applying the Hypercube's DEM and the OTIS-Hypercube's CDEM.

Fig. 12 Average accuracy using DEM on Hypercube and CDEM on OTIS-Hypercube

The average error results from applying DEM and CDEM on Hypercube and OTIS-Hypercube, respectively, are depicted in Fig. 12. It is apparent that better accuracy can be achieved on an OTIS-Hypercube. For the set of experiments performed, the error was reduced on 16-, 64-, and 256-processor OTIS-Hypercubes compared to equivalent Hypercubes, while the average error values coincided for the 1024-processor Hypercube and OTIS-Hypercube. This can be explained by the fact that in an OTIS-Hypercube a processor holding the maximum load may be in a group distinct from that of a processor holding the minimum load, which decreases the probability of the two meeting in any of the load balancing phases.

5.3 Number of communication steps

The average number of communication steps was computed over several runs of DEM on Hypercubes of 16, 64, 256, and 1,024 processors. The experimental results of load balancing using CDEM on an OTIS-Hypercube revealed that the average number of communication steps converges to the number required on Hypercubes of equivalent sizes. Figure 13 shows the average number of communication steps required by DEM and CDEM to achieve load balancing on Hypercube and OTIS-Hypercube, respectively.

Fig. 13 Average no. of communication steps using DEM on Hypercube and CDEM on OTIS-Hypercube

5.4 Speed

The speed at which the load balancing process occurs is an important metric to evaluate, since it reveals the role of the attractive technologies used in OTIS and their contribution to performance improvement and processing-time reduction. Several experiments were conducted to estimate the speeds at which the proposed CDEM operates on 16-, 64-, 256-, and 1024-processor OTIS-Hypercubes. These experiments were followed by experimental work to evaluate the speeds at which DEM is performed on Hypercubes of equivalent sizes. The experiments were performed under the assumption that the speed of the electrical interconnection technology is 250 Mb/s [18] and the speed of the optical interconnection technology is 2.5 Gb/s [19].

Remarkable results have been obtained experimentally. Figure 14 shows the speed achieved on OTIS-Hypercube in comparison with Hypercube. The obtained results show that CDEM performs load balancing on OTIS-Hypercube at a rate roughly five times faster than DEM on Hypercube. For instance, CDEM performs load balancing on a 1024-processor OTIS-Hypercube at a speed of about 40 Gb/s, while the speed at which DEM performs load balancing on a 1024-processor Hypercube is around 7.5 Gb/s.

A careful examination of the obtained results revealed a match between the analytical and experimental evaluations for both methods, CDEM and DEM, on OTIS-Hypercube and Hypercube, respectively. An example of such a comparison is shown in Fig. 15, which shows that the empirical speeds at which the load balancing process is accomplished using CDEM on various OTIS-Hypercube sizes are very close to the speed analysis results.

Fig. 14 Speed of load balancing using DEM on Hypercube and CDEM on OTIS-Hypercube

Fig. 15 Analytical vs. experimental speed of CDEM on an OTIS-Hypercube

6 Conclusions and future work

A load balancing methodology for OTIS-Hypercube interconnection networks, called the Clusters Dimension Exchange Method (CDEM), has been introduced. The performed analysis and the conducted experiments have been presented and compared to the analytical and experimental results obtained from applying the Dimension Exchange Method (DEM) on Hypercube interconnection networks. These results demonstrate the effectiveness of the proposed CDEM methodology and the superiority of OTIS-Hypercube in terms of several performance metrics.

Reduced execution time was achieved, with higher accuracy and at higher speed, by applying the proposed load balancing method, CDEM, on OTIS-Hypercube, in contrast with the time required to execute load balancing on Hypercube using DEM. It was clear from the empirical and the analytical results that the numbers of communication steps required by the two load balancing methodologies, CDEM and DEM, are approximately the same.

This research work is intended to be extended by applying the proposed CDEM on Extended OTIS-Hypercube interconnection networks, in which groups of processors are interconnected with wraparound links. In traditional OTIS-Hypercube interconnection networks, there is only one interconnection between every two groups, along which excess load can be transferred. The extra wraparound interconnection is expected to allow excess load to be transferred along two interconnections, thus reducing the load transferred along each connection and reducing the amount of a processor's local load that must be transferred, since more than one processor will participate in the excess load transfer.

Acknowledgements The authors would like to express their deep gratitude to the anonymous referees for their valuable comments and suggestions, which improved the paper.

References

1. Marsden G, Marchand P, Harvey P, Esener S (1993) Optical transpose interconnection system architectures. Opt Lett 18(13):
2. Najaf-Abadi H (2004) Performance modeling and analysis of OTIS networks. Master's thesis, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
3. Wang C, Sahni S (1998) Basic operations on the OTIS-mesh optoelectronic computer. IEEE Trans Parallel Distrib Syst 9(12):
4. Sahni S, Wang C (1997) BPC permutations on the OTIS-mesh optoelectronic computer. In: IEEE conference on massively parallel programming with optical interconnect (MPPOI 97)
5. Parhami B (2005) The Hamiltonicity of swapped (OTIS) networks built of Hamiltonian component networks. Inf Process Lett 95:
6. Wang C, Sahni S (2001) Matrix multiplication on the OTIS-mesh optoelectronic computer. IEEE Trans Comput 50(7):
7. Wang C, Sahni S (1998) BPC permutations on the OTIS-Hypercube optoelectronic computer. Informatica 22:
8. Rajasekaran S, Sahni S (1998) Randomized routing, selection, and sorting on the OTIS-mesh. IEEE Trans Parallel Distrib Syst 9(9):
9. Zewail A (2002) Light and life. Ninth Rajiv Gandhi Science and Technology Lecture, Bangalore, India
10. Krishnamoorthy A, Marchand P, Kiamilev F, Esener S (1992) Grain-size considerations for optoelectronic multistage interconnection networks. Appl Opt 31(26):
11. Zhao C, Xiao W, Qin Y (2007) Hybrid diffusion schemes for load balancing on OTIS networks. In: ICA3PP, pp
12. Qin Y, Xiao W, Zhao C (2007) GDED-X schemes for load balancing on heterogeneous OTIS networks. In: ICA3PP, pp
13. Ranka S, Won Y, Sahni S (1988) Programming a hypercube multicomputer. IEEE Softw 5(5):
14. Rim H, Jang J, Kim S (1999) An efficient dynamic load balancing using the dimension exchange method for balancing of quantized loads on hypercube multiprocessors. In: Proc of the second merged symposium (IPPS/SPDP 1999), 13th international parallel processing symposium and 10th symposium on parallel and distributed processing, pp
15. Rim H, Jang J, Kim S (2003) A simple reduction of non-uniformity in dynamic load balancing of quantized loads on hypercube multiprocessors and hiding balancing overheads. J Comput Syst Sci 67:1–25

16. Jan G, Hwang Y (2003) An efficient algorithm for perfect load balancing on hypercube multiprocessors. J Supercomput 25:
17. Willebeek-LeMair M, Reeves A (1993) Strategies for dynamic load balancing on highly parallel computers. IEEE Trans Parallel Distrib Syst 4(9):
18. Kibar O, Marchand P, Esener S (1998) High speed CMOS switch designs for free-space optoelectronic MINs. IEEE Trans Very Large Scale Integr (VLSI) Syst 6(3):
19. Esener S, Marchand P (2000) Present and future needs of free-space optical interconnects. In: IPDPS 2000 workshop on parallel and distributed processing, pp

Basel A. Mahafzah is an Assistant Professor of Computer Science at the University of Jordan, Jordan. He received his B.Sc. degree in Computer Science in 1991 from Mu'tah University, Jordan. He also earned a B.S.E. degree in Computer Engineering from the University of Alabama in Huntsville, USA, and obtained his M.S. degree in Computer Science and Ph.D. degree in Computer Engineering from the same university, in 1994 and 1999, respectively. During his graduate studies he held a fellowship from the Jordan University of Science and Technology. After obtaining his Ph.D. and before joining the University of Jordan, he joined the Department of Computer Science at Jordan University of Science and Technology, where Dr. Mahafzah held several positions: Assistant Dean, Vice Dean, and Chief Information Officer at King Abdullah University Hospital. His research interests include Performance Evaluation, Parallel and Distributed Computing, Interconnection Networks, Artificial Intelligence, Data Mining, and e-Learning. He has received more than one million U.S. dollars in research and project grants. Moreover, Dr. Mahafzah has supervised Master's students and developed several graduate and undergraduate programs in various fields of Information Technology. His teaching experience extends to eight years.

Bashira A. Jaradat is a Teaching and Research Assistant at the Computer Science Department of the Hashemite University, Jordan. She received her B.Sc. and M.S. degrees in Computer Science from Jordan University of Science and Technology, Jordan, in 2004 and 2007, respectively. Her research interests include Parallel and Distributed Computing, Artificial Intelligence, Spatial Data Mining, and Mobile Databases.


More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD

More information

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems 1 Presented by Hadeel Alabandi Introduction and Motivation 2 A serious issue to the effective utilization

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

Information Cloaking Technique with Tree Based Similarity

Information Cloaking Technique with Tree Based Similarity Information Cloaking Technique with Tree Based Similarity C.Bharathipriya [1], K.Lakshminarayanan [2] 1 Final Year, Computer Science and Engineering, Mailam Engineering College, 2 Assistant Professor,

More information

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃLPHÃIRUÃDÃSDFHLPH $GDSWLYHÃURFHVVLQJÃ$OJRULWKPÃRQÃDÃDUDOOHOÃ(PEHGGHG \VWHP Jack M. West and John K. Antonio Department of Computer Science, P.O. Box, Texas Tech University,

More information

CHAPTER 5 ANT-FUZZY META HEURISTIC GENETIC SENSOR NETWORK SYSTEM FOR MULTI - SINK AGGREGATED DATA TRANSMISSION

CHAPTER 5 ANT-FUZZY META HEURISTIC GENETIC SENSOR NETWORK SYSTEM FOR MULTI - SINK AGGREGATED DATA TRANSMISSION CHAPTER 5 ANT-FUZZY META HEURISTIC GENETIC SENSOR NETWORK SYSTEM FOR MULTI - SINK AGGREGATED DATA TRANSMISSION 5.1 INTRODUCTION Generally, deployment of Wireless Sensor Network (WSN) is based on a many

More information

Design of memory efficient FIFO-based merge sorter

Design of memory efficient FIFO-based merge sorter LETTER IEICE Electronics Express, Vol.15, No.5, 1 11 Design of memory efficient FIFO-based merge sorter Youngil Kim a), Seungdo Choi, and Yong Ho Song Department of Electronics and Computer Engineering,

More information

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

Fast Fuzzy Clustering of Infrared Images. 2. brfcm Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.

More information

Chapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348

Chapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348 Chapter 1 Introduction: Part I Jens Saak Scientific Computing II 7/348 Why Parallel Computing? 1. Problem size exceeds desktop capabilities. Jens Saak Scientific Computing II 8/348 Why Parallel Computing?

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

7 Solutions. Solution 7.1. Solution 7.2

7 Solutions. Solution 7.1. Solution 7.2 7 Solutions Solution 7.1 There is no single right answer for this question. The purpose is to get students to think about parallelism present in their daily lives. The answer should have at least 10 activities

More information

A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer

A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer Orieb AbuAlghanam, Mohammad Qatawneh Computer Science Department University of Jordan Hussein A. al Ofeishat

More information

Parallel Implementation of a Random Search Procedure: An Experimental Study

Parallel Implementation of a Random Search Procedure: An Experimental Study Parallel Implementation of a Random Search Procedure: An Experimental Study NIKOLAI K. KRIVULIN Faculty of Mathematics and Mechanics St. Petersburg State University 28 University Ave., St. Petersburg,

More information

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch.

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch. High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch. Martin Collier School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland. email address: collierm@eeng.dcu.ie

More information

The Postal Network: A Versatile Interconnection Topology

The Postal Network: A Versatile Interconnection Topology The Postal Network: A Versatile Interconnection Topology Jie Wu Yuanyuan Yang Dept. of Computer Sci. and Eng. Dept. of Computer Science Florida Atlantic University University of Vermont Boca Raton, FL

More information

Early Measurements of a Cluster-based Architecture for P2P Systems

Early Measurements of a Cluster-based Architecture for P2P Systems Early Measurements of a Cluster-based Architecture for P2P Systems Balachander Krishnamurthy, Jia Wang, Yinglian Xie I. INTRODUCTION Peer-to-peer applications such as Napster [4], Freenet [1], and Gnutella

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 6, JUNE 1998 535 Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms Rajendra V. Boppana, Member, IEEE, Suresh

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer

More information

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree Accreditation & Quality Assurance Center Curriculum for Doctorate Degree 1. Faculty King Abdullah II School for Information Technology 2. Department Computer Science الدكتوراة في علم الحاسوب (Arabic).3

More information

Engineering shortest-path algorithms for dynamic networks

Engineering shortest-path algorithms for dynamic networks Engineering shortest-path algorithms for dynamic networks Mattia D Emidio and Daniele Frigioni Department of Information Engineering, Computer Science and Mathematics, University of L Aquila, Via Gronchi

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes 1 U.Rahila Begum, 2 V. Padmajothi 1 PG Student, 2 Assistant Professor 1 Department Of

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

BİL 542 Parallel Computing

BİL 542 Parallel Computing BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,

More information

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris School of Electrical & Computer Engineering National

More information

Integrating MRPSOC with multigrain parallelism for improvement of performance

Integrating MRPSOC with multigrain parallelism for improvement of performance Integrating MRPSOC with multigrain parallelism for improvement of performance 1 Swathi S T, 2 Kavitha V 1 PG Student [VLSI], Dept. of ECE, CMRIT, Bangalore, Karnataka, India 2 Ph.D Scholar, Jain University,

More information

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and Parallel Processing Letters c World Scientific Publishing Company A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS DANNY KRIZANC Department of Computer Science, University of Rochester

More information

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments 1 A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments E. M. Karanikolaou and M. P. Bekakos Laboratory of Digital Systems, Department of Electrical and Computer Engineering,

More information

LED holographic imaging by spatial-domain diffraction computation of. textured models

LED holographic imaging by spatial-domain diffraction computation of. textured models LED holographic imaging by spatial-domain diffraction computation of textured models Ding-Chen Chen, Xiao-Ning Pang, Yi-Cong Ding, Yi-Gui Chen, and Jian-Wen Dong* School of Physics and Engineering, and

More information

On Veracious Search In Unsystematic Networks

On Veracious Search In Unsystematic Networks On Veracious Search In Unsystematic Networks K.Thushara #1, P.Venkata Narayana#2 #1 Student Of M.Tech(S.E) And Department Of Computer Science And Engineering, # 2 Department Of Computer Science And Engineering,

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

184 J. Comput. Sci. & Technol., Mar. 2004, Vol.19, No.2 On the other hand, however, the probability of the above situations is very small: it should b

184 J. Comput. Sci. & Technol., Mar. 2004, Vol.19, No.2 On the other hand, however, the probability of the above situations is very small: it should b Mar. 2004, Vol.19, No.2, pp.183 190 J. Comput. Sci. & Technol. On Fault Tolerance of 3-Dimensional Mesh Networks Gao-Cai Wang 1;2, Jian-Er Chen 1;3, and Guo-Jun Wang 1 1 College of Information Science

More information

Automatic Scaling Iterative Computations. Aug. 7 th, 2012

Automatic Scaling Iterative Computations. Aug. 7 th, 2012 Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics

More information

Evaluation of Power Consumption of Modified Bubble, Quick and Radix Sort, Algorithm on the Dual Processor

Evaluation of Power Consumption of Modified Bubble, Quick and Radix Sort, Algorithm on the Dual Processor Evaluation of Power Consumption of Modified Bubble, Quick and, Algorithm on the Dual Processor Ahmed M. Aliyu *1 Dr. P. B. Zirra *2 1 Post Graduate Student *1,2, Computer Science Department, Adamawa State

More information

An In-place Algorithm for Irregular All-to-All Communication with Limited Memory

An In-place Algorithm for Irregular All-to-All Communication with Limited Memory An In-place Algorithm for Irregular All-to-All Communication with Limited Memory Michael Hofmann and Gudula Rünger Department of Computer Science Chemnitz University of Technology, Germany {mhofma,ruenger}@cs.tu-chemnitz.de

More information

PARTICLE Swarm Optimization (PSO), an algorithm by

PARTICLE Swarm Optimization (PSO), an algorithm by , March 12-14, 2014, Hong Kong Cluster-based Particle Swarm Algorithm for Solving the Mastermind Problem Dan Partynski Abstract In this paper we present a metaheuristic algorithm that is inspired by Particle

More information

Chapter 6 Solutions S-3

Chapter 6 Solutions S-3 6 Solutions Chapter 6 Solutions S-3 6.1 There is no single right answer for this question. The purpose is to get students to think about parallelism present in their daily lives. The answer should have

More information

ADAPTIVE SORTING WITH AVL TREES

ADAPTIVE SORTING WITH AVL TREES ADAPTIVE SORTING WITH AVL TREES Amr Elmasry Computer Science Department Alexandria University Alexandria, Egypt elmasry@alexeng.edu.eg Abstract A new adaptive sorting algorithm is introduced. The new implementation

More information