Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing


Selvarajah Thuseethan (1), Shanmuganathan Vasanthapriyan (2)
(1, 2) Department of Computing and Information Systems, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka
(1) thuseethan@gmail.com, (2) svpriyan@gmail.com

Abstract: This paper is concerned with static interconnection networks and their topological properties and metrics, for both existing topologies and a newly proposed one. The interconnection network topology is a key factor in determining the characteristics of a parallel computer; a suitable topology increases efficiency when performing tasks. Numerous topologies are now available, but many of their characteristics still need improvement. In this research we analyzed existing static interconnection topologies and developed a novel topology that reduces some of the factors that degrade their topological properties. The resulting Spider-web topology shows a considerable advantage over the existing topologies. A further aim of this work is a comparative study of existing static interconnection networks against the novel topology, based on their properties and metrics. Both the theoretical and the experimental comparisons conducted here show that the proposed topology performs better than the existing topologies.

Keywords: Interconnection Network; Topology; Parallel Computing.

I. INTRODUCTION
Many factors affect the performance of a parallel system, and processor architecture is one of them. High-performance processor architectures are moving towards designs that feature a single chip with multiple processing cores [Rakesh Kumar et al., 2005]. Since device characteristics are reaching their physical limits, parallel and distributed computing has become widely recognized as a promising approach for building high-performance computing systems that handle very large tasks.
A multi-processor system has three important kinds of components: multiple processing elements, I/O modules, and memory modules. Every memory module and I/O unit in a parallel architecture can be accessed by any processor through a well-designed interconnection network. In this sense the interconnection network is the heart of a parallel architecture [Tse-Yun Feng and Chuan-Lin, 1984]. It is ultimately responsible for fast and reliable communication among the processing elements of any parallel or distributed computer, and is therefore essential for exchanging data between processing elements within a network of nodes. The communication network is thus the most critical component of a concurrent computer. To exploit parallelism efficiently and reliably, the system must be designed to considerably reduce the communication overhead between processing elements. To achieve this, the interconnection network must be reliable and efficient and, at the same time, cost-effective.

Interconnection networks are also known as communication subnets or communication subsystems. The performance of a multi-processor system relies heavily on the speed and efficiency of its interconnection network, that is, on how well the network supports the data exchanges between processors. A multiprocessor system may have a single global shared memory, and each processor may also have its own local memory. Further, the physical organization of a multiprocessor depends on the interconnection network used in it. Different types of interconnection networks have different hardware features and show different system performance. In this section, we will look into these differences quantitatively. In particular, we will compare the hardware cost and system performance of three interconnection networks.
Topology-based interconnection networks fall into two broad categories: static and dynamic. A static network establishes all its connections when the system is designed rather than when a specific program requires them, and messages or data must be routed along these established edges. Even in the age of dynamic topologies, static topologies remain worth studying because of their simplicity and convenience. Moreover, static topologies are efficient in certain situations and are still in use. For example, a static topology suits problems whose communication patterns are regular and can be predicted reasonably well, as well as problems in which data exchanges occur mostly between neighbouring processing elements. Here, we propose one such static interconnection network.

In this paper, we model and implement the Spider-web interconnection topology and compare it with other existing static topologies. This study has two primary goals: (1) to show that the proposed topology is theoretically efficient based on its primary properties, and (2) to show experimentally that it is more efficient than existing topologies. The primary properties of interconnection networks are compared theoretically, and the novel topology shows a considerable improvement over existing topologies. The novel topology is then evaluated experimentally against existing topologies. In the experimental evaluation, we apply the proposed topology to the problem of sorting numbers with various numbers of processing elements to find the throughput.


II. LITERATURE REVIEW
Most large-scale parallel processing computers are classified into two general categories based on the number of concurrently active instruction streams within the computational engine or problem. Parallel processing systems that execute a single thread of control are called Single Instruction Multiple Data (SIMD) machines. Multiple Instruction Multiple Data (MIMD) machines can execute many separate threads of control at a time. Early SIMD machines required the simultaneous transfer of data from each network input to each output for a relatively small set of communication configurations or permutations, whereas present-day SIMD and MIMD machines need to support varied patterns of synchronous and asynchronous traffic, respectively [Siegel and Craig, 1996]. Both types of message passing are achieved by interconnection networks.

Interconnection networks are built from switching elements (switches), which are devices that contain multiple input and output ports with a crossbar interconnection between them [Siegel and Craig, 1996]. The interconnection network is positioned between the various devices in the multi-processor network. Processing units are responsible for data processing, and the interconnection network is responsible for data transfer between processing units and memory banks [Ananth et al., 2003]. Since the interconnection network is an essential part of any parallel computer, an ideal parallel system can be developed only if fast and reliable communication exists over the network. Various interconnection networks have been used in the past, each with its own advantages and disadvantages. Like ordinary network systems, interconnection networks consist of nodes and links (edges).

A. Interconnection Networks Taxonomy
Many different interconnection networks have been proposed to provide efficient and fast communication at a reasonable cost. Even so, no single network is generally considered the best, because the cost-effectiveness of an interconnection network design varies with the computational tasks for which it will be used and the amount of data it handles. Interconnection networks impose particularly high demands in terms of bandwidth, delay, and delivery over short distances.

An interconnection network is either static or dynamic, depending on its topology. A static network contains fixed edges, whereas a dynamic network establishes the required connections on the fly as they are needed. Based on the interconnection pattern, static networks are classified as one-dimensional (1-D), two-dimensional (2-D), and hypercube (HC). Dynamic networks are classified by their interconnection scheme as bus-based or switch-based. Bus-based networks are divided into single-bus and multiple-bus networks. Similarly, switch-based dynamic networks are classified as single-stage, multi-stage, and crossbar. This classification follows the structure of the interconnection network. Figure 1 illustrates this taxonomy.

Figure 1: Taxonomy of Interconnection Networks.

Two kinds of static networks can be identified based on connectivity: completely connected networks (CCNs) and limited connection networks (LCNs). Since we propose a novel static interconnection network, we concentrate on these two types. In a completely connected network, each processing element is connected to every other processing element in the network. Since every node is connected to every other node, routing messages between nodes is straightforward.
Therefore, it guarantees fast delivery of messages from any source processing element to any destination processing element, because only one edge has to be traversed when passing a message between two nodes.

In limited connection networks (LCNs) there is no direct edge from every node to every other node. Communication between two nodes may have to be routed through other nodes in the network, and the length of the path between two nodes is measured as the number of edges that have to be traversed. Because a node is normally not directly connected to all other nodes in the parallel computer, a message transfer from a source node may require several steps through intermediate nodes to reach its destination node [Leighton, 1992]. Two requirements are imposed on limited connection networks to provide interconnectivity: (1) the existence of a pattern among the connected nodes, and (2) a mechanism or procedure for routing messages between nodes. Several limited connection networks are available, such as linear arrays, ring networks, two-dimensional arrays, tree networks, and cube networks.

B. Network Properties
Interconnection networks impose particularly high demands in terms of bandwidth, delay, and delivery over short distances [Lysne et al., 2008]. Meeting these demands depends on several significant properties of the network.

1) Topology
One major characteristic of a network is its topology. The network topology is defined as the abstract representation of the connections in the network [Feng, 1981]. It indicates how the nodes in a network are organized, that is, the layout of the edges and processing elements that establish the interconnections.

2) Network Diameter


The network diameter is the shortest-path distance between the two farthest nodes in a network. It is measured as the number of hops between the source and destination nodes.

3) Node Degree
The number of edges connected to a node is called the node degree. In a unidirectional interconnection network, the number of edges that carry messages out of a node is its out-degree, and the number of edges that carry data into the node is its in-degree.

4) Bisection Bandwidth
The minimum number of edges that must be cut to split a network into two halves is called the bisection bandwidth.

5) Latency
Latency is the delay in transferring a message between source and destination.

6) Connectivity
Connectivity is the minimum number of arcs that must be removed to break the network into two disconnected networks. A larger value is better.

7) Cost
The cost of a network is the number of edges it employs. A smaller value is better.

Of these seven properties, we consider diameter, cost, and connectivity, since these three strongly affect the performance of any topology.

III. DESCRIPTION OF TOPOLOGY
An undirected graph is often adopted to model an interconnection network, with vertices corresponding to processing elements and edges to bidirectional links [Keqiu Li et al., 2013]. The Spider-web topology, the static topology proposed here, is modelled as an undirected graph with bidirectional edges between processing elements.

Figure 2: Spider topology with three levels.

A. Structural Description
Figure 2 shows the Spider-web topology with three distinct levels, numbered 0 to 2. The nodes are labelled with the numbers 1, 2, 3, and so on. In this topology, nodes are arranged in levels from 0 to n. The level 0 processing element is shown in black, level 1 processing elements in green, and level 2 processing elements in blue.
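The undirected-graph model and the properties defined in Section II.B can be made concrete on small, well-known topologies. The following Python sketch is an illustration only: it represents a network as an adjacency dictionary and computes the diameter, maximum node degree, and cost of a ring and a star. It does not reproduce the Spider-web wiring, whose exact edge list is given only in Figure 2, and all function names (`ring`, `star`, `hops_from`, `diameter`, `max_degree`, `cost`) are hypothetical helpers introduced here.

```python
from collections import deque


def ring(p):
    """Undirected ring of p nodes as an adjacency dict (illustrative helper)."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def star(p):
    """Undirected star: node 0 is the hub, nodes 1..p-1 are leaves."""
    adj = {0: set(range(1, p))}
    for i in range(1, p):
        adj[i] = {0}
    return adj


def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path hop count over all node pairs.

    Assumes the graph is connected.
    """
    return max(max(hops_from(adj, s).values()) for s in adj)


def max_degree(adj):
    """Largest number of edges attached to any single node."""
    return max(len(neighbours) for neighbours in adj.values())


def cost(adj):
    """Number of undirected edges; each edge appears in two adjacency sets."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


if __name__ == "__main__":
    for name, net in (("ring", ring(8)), ("star", star(8))):
        print(name, diameter(net), max_degree(net), cost(net))
```

Running one breadth-first search per node is adequate for networks of the size used in this comparison; the same functions would apply unchanged to any other topology once its adjacency dictionary is written down.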
The interconnecting edges, drawn as black lines, show the connections between processing elements. Each node in level L-1 is connected to exactly five adjacent nodes, and each node in level L has edges to three adjacent nodes, where L ranges from 1 to n. Nodes in level 0 and level 1 are connected in a triangular pattern. Nodes in levels greater than 1 are interconnected in triangular and rectangular patterns.

B. Characteristics
The numbers of nodes and edges at each level are the basic characteristics of an interconnection topology. Table 1 gives the growth in nodes and edges as the level increases.

TABLE 1: NODES AND EDGES IN EACH LEVEL.

C. Message Passing
Parallel programming environments offer the user a convenient way to express parallel computation and communication [Bruck et al., 1995]. When a parallel program executes on a multi-computer system, the processing elements have to exchange information, a process called routing [Kotsis, 1992]. Depending on the number of partners, there are three types of routing [Valiant & Brebner, 1981]. In point-to-point routing, one node sends a message to a neighbouring node. In broadcasting, one node (the originator) distributes a message to all its neighbours. In gossiping, each node sends a message to all others while simultaneously receiving messages from them. The Spider-web topology supports all three routing techniques in different circumstances. The choice depends on the problem and on how messages need to be distributed among the processing elements.

IV. COMPARISON AND EVALUATION
In this section, we compare the proposed topology with existing one-dimensional topologies, both theoretically and experimentally. The theoretical comparison shows the advantages of the Spider-web topology in terms of structural abstraction and


basic network properties. The experimental work, in turn, verifies the efficiency gain of the proposed topology when executing a specific practical problem.

A. Theoretical Comparison
The theoretical comparison analyzes three major metrics of the interconnection network topologies. For the Spider-web topology, n denotes the level; for the other topologies, p denotes the number of processing elements. Table 2 compares the existing topologies with the proposed topology.

TABLE 2: COMPARISON WITH OTHER STATIC TOPOLOGIES.

Our study is based on the most commonly used criteria for evaluating interconnection networks. The selected metrics (diameter, connectivity, and cost) strongly influence the efficiency of an interconnection network, which makes the theoretical study a meaningful indicator of efficiency. The Spider-web topology has a smaller diameter than the others, especially as the number of nodes increases, where its diameter becomes very small by comparison. In this respect it is more efficient than any of the other compared static topologies. In terms of connectivity, the Spider-web topology has a larger value than the linear, ring, and star topologies, so it is more efficient than these three, but not more efficient than the mesh topology. The number of edges required by a given network is an important factor that affects its implementation cost [Abuelrub, 2008]. For large numbers of nodes, the cost of the Spider-web topology is much lower than that of the other topologies, which is a further advantage.

B. Experimental Work
To evaluate efficiency experimentally, we used the problem of sorting numbers. We ran two sorting experiments at two different scales. The first sorts a small input of 100 numbers. The second sorts a large input of 100 thousand random elements. In both cases we used up to 30 processors because of limitations in the lab setup. In our implementation of this problem, routing uses all three existing routing techniques [Valiant & Brebner, 1981].

Figure 3 shows the routing setup used in the Spider-web topology for this problem. Red edges indicate the first-level broadcast, in which node 1 distributes the task to its immediate neighbours. Since node 1 has 5 immediate neighbours, the task is divided into 5 equal-sized subtasks using the divide-and-conquer technique. These 5 nodes then distribute the task to their immediate neighbours in the next level. During this distribution, the blue edges are allocated for broadcasting the messages. All other edges, shown in yellow, are used for auxiliary point-to-point message passing when required. This point-to-point message passing occurs only when one node finishes its task while the two adjacent nodes in the same level still have tasks to complete.

Figure 3: Spider-web topology: Routing

According to our results, when sorting a small number of elements the proposed topology did not show a noticeable efficiency improvement over existing topologies, although it did achieve at least a small improvement. When processing a large amount of data, on the other hand, it shows a considerable improvement. The efficiency improvement increases further with the number of processors used in the topology, which shows that efficiency improves as levels are added to the proposed topology. Further, the experimental results show that the proposed topology is three times better than the other compared topologies in performing tasks. Figure 4 shows the experimental comparison of the topologies; the x-axis indicates the number of processors and the y-axis the execution time in milliseconds.

Figure 4: Spider-web topology with three levels.

V. CONCLUSION
In this paper, we proposed a new interconnection network, the Spider-web topology, and showed that it has a considerable advantage over other topologies and that tasks can be performed efficiently on it. The proposed Spider-web topology has tremendous potential as an interconnection network for very large-scale

parallel computers, since it can connect hundreds of millions of nodes with up to 5 edges per node and keeps some desirable properties of other existing topologies that are useful for efficient communication among processing elements. Since much of the community has moved towards lower-dimensional topologies such as meshes and tori, the proposed topology could be a useful one. The comparative theoretical study shows that the Spider-web topology is more efficient than other topologies in terms of diameter, connectivity, and cost. Furthermore, it shows significant efficiency in the number-sorting experiment. The proposed topology is far more efficient than the others for large tasks such as sorting long series of numbers. Open issues concerning the Spider-web topology are (1) developing fault-tolerant routing algorithms for the Spider-web topology with faulty nodes, and (2) embedding other frequently used topologies in the Spider-web topology.

ACKNOWLEDGEMENT
We wish to thank all supporters for their comments and suggestions. We also extend our thanks to the Sabaragamuwa University of Sri Lanka society for giving us the confidence to do this research.

REFERENCES
[1] Abuelrub, E. (2008). A Comparative Study on the Topological Properties of the Hyper-Mesh Interconnection Network. Proceedings of the World Congress on Engineering.
[2] Bruck, J., Dolev, D., Ho, C. T., Rosu, M. C., & Strong, R. (1995, July). Efficient message passing interface (MPI) for parallel computing on clusters of workstations. In Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures. ACM.
[3] Feng, T. Y. (1981). A survey of interconnection networks. Computer, 14(12).
[4] Grama, A., Gupta, A., Karypis, G., & Kumar, V. (2003). Introduction to Parallel Computing, Second edition. Addison-Wesley.
[5] Kotsis, G. (1992). Interconnection Topologies and Routing for Parallel Processing Systems. Technical Report Series, ACPC.
[6] Kumar, R., Zyuban, V., & Tullsen, D. M. (2005, June). Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling. In Computer Architecture, ISCA'05. Proceedings. 32nd International Symposium on. IEEE.
[7] Leighton, F. T. (1992). Introduction to parallel algorithms and architectures. San Francisco: Morgan Kaufmann.
[8] Li, K., Mu, Y., Li, K., & Min, G. (2013). Exchanged crossed cube: a novel interconnection network for parallel computation. Parallel and Distributed Systems, IEEE Transactions on, 24(11).
[9] Lysne, O., Reinemo, S. A., Skeie, T., Solheim, Å. G., Sødring, T., Huse, L. P., & Johnsen, B. D. (2008). Interconnection Networks: Architectural Challenges for Utility Computing Data Centers. IEEE Computer, 41(9).
[10] Valiant, L. G., & Brebner, G. J. (1981, May). Universal schemes for parallel communication. In Proceedings of the thirteenth annual ACM symposium on Theory of computing. ACM.
[11] Wu, C. L., & Feng, T. Y. (1984). Interconnection networks for parallel and distributed processing. IEEE Computer Society Press.
