Latest Trends in Applied Informatics and Computing
Parallel Simulation and Communication Performance Evaluation of a Multistage BBN Butterfly Interconnection Network for High-Performance Computer Clusters

PLAMENKA BOROVSKA, DESISLAVA IVANOVA, PAVEL TSVETANSKI
Computer Systems Department
Technical University of Sofia
8 Kliment Ohridski Boul., 1000 Sofia
BULGARIA
pborovska@tu-sofia.bg, d_ivanova@tu-sofia.bg, pavel_tsvetanski@tu-sofia.bg

Abstract: The communication performance of multistage interconnection networks is a crucial factor in the parallel performance of high-performance computer clusters. In this paper we propose a methodology for parallelizing a sequential OMNeT++ model. We designed a parallel model of a multistage BBN interconnection network topology to meet the demands of efficient and fast communication on high-performance computer systems. Communication performance is evaluated on the basis of parallel simulation models using the simulation framework OMNeT++ (MPI), run on an IBM HS22 Blade Center at the High-Performance and GRID Computing Laboratory located at the Computer Systems Department, Technical University of Sofia. The parallel simulation results are analyzed.

Key-Words: High-Speed Interconnection Networks, BBN Network Architecture, OMNET++, Null Message Protocol, Parallel Simulations, Communication Performance Evaluation, Performance Analysis

1 Introduction

Interconnection network architecture designs are influenced by next-generation high-performance computer clusters and supercomputer technology. The path to next-generation Tier-0 computer systems is increasingly dependent on designing computer clusters with hundreds and thousands of processors. The interconnection topology design of a parallel computer system is a critical factor in determining the computer's performance. [1-4] Interconnection network designs vary with respect to their communication parameters (throughput and latency) and cost.
Communication network performance determines computer cluster performance for many applications. Therefore, the choice of network architecture has a significant impact on computer performance and affects the usability of a parallel computer cluster. Interconnection networks are composed of a set of shared switch nodes and links, and the network topology refers to the arrangement of these nodes and links. Selecting the network topology is the first and a very important step in designing a network, because the flow control and the routing algorithm depend heavily on the network topology.

The goal of this paper is to propose a methodology for parallelizing a sequential OMNeT++ model and to evaluate the communication performance of a multistage BBN network design on the basis of a program implementation on the IBM HS22 Blade Center located at the High-Performance and GRID Computing Laboratory, Technical University of Sofia. The communication performance of the BBN multistage topology is evaluated by means of network simulations using OMNeT++.

2 OMNET++ Platform and Parallel Simulations

OMNeT++ is essentially a set of software tools and libraries that supports the development of simulation models. Most often OMNeT++ is used to develop models of computer networks and protocols. It is a simulation environment comprising a simulation framework and library, and its models are built up of individual components called modules. Its main purpose is building network simulations of ad-hoc networks, wireless networks, communication networks and others. OMNeT++ includes an Eclipse-based graphical development environment (IDE) and some
additional tools to facilitate the work of developers. [5]

OMNeT++ also provides support for parallel simulation execution. Very large simulations may benefit from the parallel distributed simulation (PDES) feature, either by gaining speedup or by distributing memory requirements. [8]

2.1 Null Message Protocol

OMNeT++ provides a Null Message protocol, which implements the Null Message conservative synchronization algorithm in a class called cNullMessageProtocol. The implementation of the Null Message Protocol in OMNeT++ is based on the terminology defined in [8, 9].

Let $LP_p$ be the logical processes that a given parallel simulation model is composed of, where $p$ is in the range $[0, N-1]$ and $N$ is the count of logical processes. Let $r$ be a moment in the physical time of a given simulation execution. For a given $LP_p$ and $r$, several quantities can be identified:

Earliest Input Time (EIT): $EIT_p(r)$ = the lower bound on the timestamp (measured in units of simulation time) of a message that the logical process $LP_p$ can receive in the physical time interval $(r, \infty)$;

Earliest Output Time (EOT): $EOT_p(r)$ = the lower bound on the timestamp (measured in units of simulation time) of a message that the logical process $LP_p$ can send in the physical time interval $(r, \infty)$;

Earliest Conditional Output Time (ECOT): $ECOT_p(r)$ = the lower bound on the timestamp (measured in units of simulation time) of a message that the logical process $LP_p$ can send in the physical time interval $(r, \infty)$, under the assumption that $LP_p$ will receive no messages in that interval;

Lookahead: $la_p(r)$ = the lower bound on the time after which $LP_p$ will send a message to another logical process.

The most common method used to advance EIT (i.e. to synchronize) is the use of Null messages, via the Null Message algorithm.
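A minimal sketch of how one logical process might keep track of these quantities, assuming (as the Null Message algorithm prescribes) that an LP's EIT is the minimum of the most recent EOT values announced by the LPs it can receive from. The class and method names are hypothetical and are not part of the OMNeT++ API; lookahead and ECOT are left out for brevity.

```python
import math


class LogicalProcess:
    """Toy bookkeeping for one LP under a conservative null-message scheme.

    Hypothetical illustration only; this is not OMNeT++ code.
    """

    def __init__(self, sources):
        # Latest EOT announced by each LP in the source set. Nothing is known
        # at start-up, so every source begins at simulation time 0.
        self.latest_eot = {src: 0.0 for src in sources}

    def on_null_message(self, src, eot):
        # A null message carries the sender's current EOT. Timestamps on one
        # link are assumed to be non-decreasing, so keep the larger value.
        self.latest_eot[src] = max(self.latest_eot[src], eot)

    def eit(self):
        # EIT is the minimum of the most recent EOT values of all sources.
        return min(self.latest_eot.values(), default=math.inf)

    def safe_events(self, pending_timestamps):
        # Events strictly earlier than EIT can no longer be preceded by a
        # message that has yet to arrive, so they may be processed.
        limit = self.eit()
        return [t for t in pending_timestamps if t < limit]


lp = LogicalProcess(sources=[1, 2, 3])
lp.on_null_message(1, 5.0)
lp.on_null_message(2, 3.0)
lp.on_null_message(3, 7.0)
current_eit = lp.eit()
processable = lp.safe_events([1.0, 2.5, 3.0, 4.0])
```

In this run the slowest source (LP 2) holds EIT at 3.0, so only the events at 1.0 and 2.5 are released; the LP has to wait for a further null message from LP 2 before it can go on.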
For $EIT_p$ to be increased it is sufficient that the respective $LP_p$ sends a Null message to every other LP in its destination set, destset (a vector array of the logical processes that $LP_p$ can send messages to), on every change in its EOT. Every logical process calculates its own EIT as the minimum of the most recent EOT values received from its source set (a vector array of the logical processes that $LP_p$ can receive messages from).

3 BBN Butterfly OMNeT++ Simulation Model and Result Analysis

3.1 Sequential model

A network simulation model for sequential execution is implemented in OMNeT++ with the BBN Butterfly network topology. The routing algorithm used is destination tag routing (DTR). DTR determines the port to which a switch has to forward a received packet using only the destination address. This algorithm is typical for omega, butterfly and other multistage networks. It is highly dependent on the network topology in which it operates: the nodes must be addressed in a specific way. The address of a host is divided into n equal parts, where n is the number of levels; each part corresponds to one level and has to be wide enough to represent k/2 in binary. If the address is not long enough, it is padded with zeros in the most significant part.

Three traffic patterns are simulated: Uniform, Bit reversal and Transpose.

Uniform traffic pattern: the destination of a packet from a given node is chosen at random. The probability of sending the packet to each node (excluding the sender itself) is equal.

Bit-reversal traffic pattern: the destination of a packet depends on the sender's own address; each node sends only to the address that is the bit reversal of the sender's address.

Matrix transpose traffic pattern: each node sends messages only to the destination whose address is the sender's own address with its upper and lower halves swapped.
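The three traffic patterns can be sketched as destination functions. This is an illustration rather than the authors' OMNeT++ code: the function names are hypothetical, and the 6-bit address width is an assumption that corresponds to a network of 64 hosts.

```python
import random

ADDRESS_BITS = 6  # assumption: 64 hosts, so every address fits in 6 bits


def bit_reversal(src, bits=ADDRESS_BITS):
    """Destination whose address is the sender's address with bits reversed."""
    dst = 0
    for i in range(bits):
        if src & (1 << i):
            dst |= 1 << (bits - 1 - i)
    return dst


def transpose(src, bits=ADDRESS_BITS):
    """Destination with the upper and lower halves of the address swapped."""
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


def uniform(src, num_hosts=1 << ADDRESS_BITS, rng=random):
    """Destination drawn with equal probability from all hosts but the sender."""
    dst = rng.randrange(num_hosts - 1)
    # Shift the upper part of the range by one so that src is never returned.
    return dst if dst < src else dst + 1
```

For example, host 0b000001 sends to 0b100000 under bit reversal, and host 0b000111 sends to 0b111000 under transpose.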
Fig.1: BBN OMNET++ Sequential Model

Simulation is executed for three different values of packet size: 32 flits, 64 flits and 128 flits. Flit size is 16 bits. Ten values of offered traffic (in per cent of capacity) are simulated, from 10% to 100% in 10% increments. Every host sends 1000 packets.

The topology itself is a 4-ary (3+1)-fly, consisting of 4 stages, one of which is an extra stage. [1] The extra stage helps to increase the performance of the interconnection network under traffic patterns that cause competition for a given channel, by providing 4 different paths from every sending host (source) to every receiving host (destination). The topology consists of 128 nodes, 64 of which are radix-4 crossbar switches and 64 of which are terminals (hosts). The network as a whole acts as a switch that connects all inputs to all outputs; topologically it consists of a number of overlapping trees. [3] Nodes in the simulated network are connected by unidirectional communication channels with bandwidth b = 1 Gbps and a delay of 3.3 ns.

Radix-4 switches are buffered and the flow control is credit-based. Credit-based flow control allows switches to avoid rejecting incoming flits because of a full buffer, thus optimizing the performance of the network. Switches inform every node connected to one of their input ports about the availability of buffer space for the respective port by sending credits. Every credit sent informs the node connected to a given input port of the switch that 1 flit of buffer space is available for that port. The feedback that node A receives from switch B, in the form of the number of credits received via the control communication channel, prevents A from sending to B more flits than B's buffer can accept for the respective port. Sending credits from a switch to the nodes connected to its input ports is implemented using control communication channels out_credit[0..4], each paired with the corresponding data channel in[0..4].

3.2 Parallel model

Parallelization of the BBN Butterfly topology model is the transformation of that model from a sequential execution implementation to a parallel execution implementation, in which a simulation can be run on many nodes of a given computer cluster by means of message-passing and node-synchronization algorithms. [10]

Four IBM Blade Center nodes are used for running simulations. All nodes have OMNeT++ Version 4.2.1, Build id: e2a29 and MPICH2 Version installed. The parallel programming interface MPI, in the implementation by Argonne National Lab (MPICH2 for Windows), is used by OMNeT++ via the cMPICommunications class as the mechanism for passing messages between cluster nodes. The conservative synchronization protocol Null Message Protocol (the cNullMessageProtocol OMNeT++ class) is used for message synchronization.

The sequential simulation model is the foundation on which the parallel model is built. This is achieved by creating several new components and modifying existing ones.

Fig.2: BBN OMNET++ Parallel Model
Fig.3: BBN Butterfly OMNET++ Sequence Charts End Event Logs

Parallel simulation requires that message processing in the network defined in the sequential model be divided into 4 partitions. To achieve loose coupling between partitions, the network is bisected repeatedly until it is divided into 4 (Fig.2). Every partition has the same component structure and component interconnection as the first 16 switches and first 16 hosts of the network, with a few differences. In other words, one partition is a network description component (Network Description File) named SubNet. That component contains 16 elements of type host and 16 elements of type switch, or ¼ of the total number of switches and hosts in the network (64 switches and 64 hosts). The interconnection of these elements is analogous to that between the first 16 switches and first 16 hosts of the network, with a few key differences:

a. Switch and host addresses in a given partition are a function of the partition's index, which guarantees uniqueness and full coverage of the address ranges of those components in the network (addresses from 0 to 63 are given both to switches and to hosts). Component addresses must be distinguished from component indexes: the former range from 0 to 63 and the latter range from 0 to 15 within every partition. Taking the partition's index into consideration, the switch address is calculated using the formula:

self_address = (index % 4) + (index - (index % 4)) * 4 + 4 * ownindex

where self_address is the switch address, index is the switch's index in the vector array of 16 switch components for the partition, and ownindex is the partition's index. The host address is calculated using the formula:

self_address = index + ownindex * numhosts

where self_address is the host's address, index is the host's index in the vector array of 16 host components for the partition, and ownindex is the partition's index.

b. Part of the interconnection between switches from stage 1 and stage 2 of the BBN Butterfly topology is made outside of the SubNet component, because some connections between stage 1 and stage 2 switches cross partition boundaries and are needed to fully implement the network. Switch interconnections are classified as internal or external depending on whether they are defined inside or outside the partition component SubNet. Internal to the partition are those connections between stage 1 and stage 2 switches for which the sending switch and the receiving switch are both present as components, with regard to their addresses, in the same partition. External to the partition are all other connections between stage 1 and stage 2 switches. There are always 8 internal connections, but the switch port index through which they are made is a function of the partition's index: the port index (0, 1, 2 or 3) is equal to the partition's index. In contrast, the external connections of a given partition are made through the 3 of 4 ports of every stage 1 and stage 2 switch whose port index does not equal the partition's index.

The dependency of the external connection's port index on the partition index is modelled in the SubNet component using conditional links (A[switch output port index] --> B[gate output index] if partition's index = x). The link between A and B is only created if the condition is true. This enables the SubNet component to be used for all partitions, despite the dependency of port index values on partition index values. A gate denotes a link to a component that is external to the current component (in this case the partition SubNet). A higher-level component defines the connections between gates, thus implementing the inter-partition connections and realizing the whole parallel model.
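The address formulas from item a can be checked with a short sketch. The function names are hypothetical, and numhosts = 16 (the number of hosts per partition) is an assumption inferred from the requirement that host addresses cover 0 to 63 across 4 partitions.

```python
NUM_PARTITIONS = 4
COMPONENTS_PER_PARTITION = 16  # 16 switches and 16 hosts in every SubNet
NUM_HOSTS = 16                 # assumed value of numhosts (hosts per partition)


def switch_address(index, ownindex):
    """Switch address from the index inside the partition and the partition index."""
    return (index % 4) + (index - (index % 4)) * 4 + 4 * ownindex


def host_address(index, ownindex, numhosts=NUM_HOSTS):
    """Host address from the index inside the partition and the partition index."""
    return index + ownindex * numhosts


# Collect the addresses produced for every partition.
switch_addresses = {
    part: [switch_address(i, part) for i in range(COMPONENTS_PER_PARTITION)]
    for part in range(NUM_PARTITIONS)
}
host_addresses = {
    part: [host_address(i, part) for i in range(COMPONENTS_PER_PARTITION)]
    for part in range(NUM_PARTITIONS)
}

# Every address from 0 to 63 must be assigned exactly once.
all_switches = sorted(a for addrs in switch_addresses.values() for a in addrs)
all_hosts = sorted(a for addrs in host_addresses.values() for a in addrs)
```

Under these assumptions each partition receives four consecutive switch addresses out of every block of 16 (partition 1, for instance, gets 4 to 7, 20 to 23, 36 to 39 and 52 to 55), while host addresses are assigned in contiguous blocks of 16 per partition.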
In MPI communications there are no global variables that partitions could use to communicate with each other. Thus, for each partition to read the total number of packets received in the network (reaching a predefined number of sent/received packets is the condition for successful simulation termination), a monitoring system must be implemented. That system is called PartSync. It monitors the total number of received packets in the
network and terminates the simulation successfully when that number reaches a preconfigured value. PartSync uses message passing, in line with the MPI constraints, and implements a ring topology to send a message from a partition that has just received a new packet to all other partitions, thereby informing them of the increase in the total number of received packets. PartSync synchronization messages use a custom OMNeT++ .msg component, PartSyncMsg, created with the opp_msgc utility. PartSync messages are analogous to Null messages, with the difference that the former carry data about the sum of received packets in the sending partition, while the latter are used for time synchronization between partitions.

3.3 Result Analysis

The simulation experiments are designed to evaluate the parallel performance of the OMNeT++ model of the BBN Butterfly Interconnection Network for High-Performance Computer Clusters. The simulation models, implemented in C++, run on the following configuration:

Software platform: OMNeT++ running on Windows Server R2 64-bit OS using the GCC for OMNeT++ tool chain;

Hardware platform: IBM Blade Center - HS22 Blade Servers; High-Performance and GRID Computing Laboratory located at the Computer Systems Department, Technical University of Sofia.

The experiments cover three different traffic patterns (Uniform, Bit reversal and Transpose) and three different values of packet size (32 flits, 64 flits and 128 flits). Thus, nine configurations of parallel execution are run for five different offered traffic levels (20% to 100% in 20% increments), Fig.4. The experimental data indicate that the parallel execution speedup increases as the offered load (per cent of capacity) increases. Also, the speedup for the uniform traffic pattern is greater than for the other traffic patterns.
The experiments yield a maximum speedup of 2.76 for the parallel discrete event simulation of the BBN Butterfly Interconnection Network, reached when the offered traffic is 100% and the traffic pattern is uniform.

Fig.4: Speedup Results of Parallel Execution

4 Conclusion

In this paper we have presented an evaluation of the parallel performance of the BBN Butterfly Interconnection Network. The parallel performance is evaluated on the basis of parallel simulation models run on an IBM HS22 Blade Center, for case studies of several of the most popular communication patterns (Uniform, Bit reversal and Transpose) and for three different values of packet size (32 flits, 64 flits and 128 flits).

This paper described an approach for designing parallel models in OMNeT++. The suggested BBN Butterfly interconnection network simulation model was designed to work in a parallel discrete event simulation (PDES) environment. Any network composed in this way can be simulated in a completely parallel manner, exploiting the computational resources needed to simulate more complex network designs that connect a large number of nodes.
This approach can be used as a methodology to develop more complex designs. Empirical simulation data confirm features of the OMNeT++ parallel performance described in [5].

ACKNOWLEDGEMENT

The results reported in this paper are part of a research project DRNF 02/9-2009, supported by the National Science Fund, Bulgarian Ministry of Education and Science.

References:

[1] Dally W. J., Towles B.: Principles and Practices of Interconnection Networks, Morgan Kaufmann, ISBN-13: , 2004

[2] James Milano, Gary L. Mullen-Schultz, Gary Lakner: BlueGene-red book: Blue Gene: Hardware Overview and Planning

[3] P. Borovska. Computer systems. Sofia; Bulgaria: Ciela, ISBN (in Bulgarian),

[4] Duato, J., Yalamanchili, S., Ni, L. M. Interconnection networks: An engineering approach, Morgan Kaufmann Publishers, ISBN ,

[5] OMNET++ Discrete Event Simulation Environment:

[6] Pl. Borovska, O. Nakov, D. Ivanova, K. Ivanov, G. Georgiev: Communication Performance Evaluation and Analysis of a Mesh System Area Network for High Performance Computers. 12-th WSEAS International Conference on Mathematical Methods, Computational Techniques and Intelligence Systems (MAMECTIS 10), Kantaoui, Sousse, Tunisia, May 3-6, 2010, ISBN: , pp

[7] Plamenka Borovska, Desislava Ivanova, Venelina Ianakieva, Vladislav Mitov, Halil Alkaf: Comparative Analysis of Communication Performance Evaluation for Butterfly Bidirectional Multistage Interconnection Network Topology with Routing Table and Destination Tag Routing, Sixth International Scientific Conference Computer Science 2011, Ohrid, Macedonia, pp , September 2011

[8] D. Wu, E. Wu, J. Lai, A. Varga, Y. A. Sekercioglu, G. K. Egan, Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework, Proceedings 14th European Simulation Symposium, A. Verbraeck, W. Krug, eds. (c) SCS Europe BVBA, 2002

[9] R. L. Bagrodia, M. Takai, V. Jha, Performance evaluation of conservative algorithms in parallel simulation languages, Parallel and Distributed Systems, IEEE Transactions on, pages , Apr

[10] D. Wu, E. Wu, J. Lai, A. Varga, Y. A. Sekercioglu, G. K. Egan, Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework, Proceedings 14th European Simulation Symposium, A. Verbraeck, W. Krug, eds. (c) SCS Europe BVBA,
Performance Evaluation of TOFU System Area Network Design for High- Performance Computer Systems
Performance Evaluation of TOFU System Area Network Design for High- Performance Computer Systems P. BOROVSKA, O. NAKOV, S. MARKOV, D. IVANOVA, F. FILIPOV Computer System Department Technical University
More informationWP2: Multiprocessors communication networks for PetaFLOPS supercomputers 1. Main activities and results Task 2.1: System Area Network Topology
WP2: Multiprocessors communication networks for PetaFLOPS supercomputers 1. Main activities and results Task 2.1: System Area Network Topology Design: network simulation and communication performance evaluation.
More informationTopologies. Maurizio Palesi. Maurizio Palesi 1
Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and
More informationDistributed simulation with MPI in ns-3. Josh Pelkey Dr. George Riley
Distributed simulation with MPI in ns-3 Josh Pelkey Dr. George Riley Overview Parallel and distributed discrete event simulation [1] Allows single simulation program to run on multiple interconnected processors
More informationBasic Low Level Concepts
Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock
More informationEnabling Distributed Simulation of OMNeT++ INET Models
Enabling Distributed Simulation of OMNeT++ INET Models Mirko Stoffers, Ralf Bettermann, James Gross, Klaus Wehrle Communication and Distributed Systems, RWTH Aachen University School of Electrical Engineering,
More informationHomework Assignment #1: Topology Kelly Shaw
EE482 Advanced Computer Organization Spring 2001 Professor W. J. Dally Homework Assignment #1: Topology Kelly Shaw As we have not discussed routing or flow control yet, throughout this problem set assume
More informationChapter 4 : Butterfly Networks
1 Chapter 4 : Butterfly Networks Structure of a butterfly network Isomorphism Channel load and throughput Optimization Path diversity Case study: BBN network 2 Structure of a butterfly network A K-ary
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More informationInterconnection Networks: Topology. Prof. Natalie Enright Jerger
Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design
More informationSolving the Travelling Salesman Problem in Parallel by Genetic Algorithm on Multicomputer Cluster
Solving the Travelling Salesman Problem in Parallel by Genetic Algorithm on Multicomputer Cluster Plamenka Borovska Abstract: The paper investigates the efficiency of the parallel computation of the travelling
More information4. Networks. in parallel computers. Advances in Computer Architecture
4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors
More informationMeasuring the Efficiency of Parallel Discrete Event Simulation in Heterogeneous Execution Environments
Acta Technica Jaurinensis Vol. X, No. Y, pp. xx-yy, 20xy DOI: 10.14513/actatechjaur.vX.nY.000 Available online at acta.sze.hu Measuring the Efficiency of Parallel Discrete Event Simulation in Heterogeneous
More informationInterconnection Network
Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network
More informationArchitecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting
Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Natawut Nupairoj and Lionel M. Ni Department of Computer Science Michigan State University East Lansing,
More informationLecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel
More informationECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts
ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running
More informationINTERCONNECTION NETWORKS LECTURE 4
INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source
More informationCS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2
Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationUnder the Hood, Part 1: Implementing Message Passing
Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Today s Theme 2 Message passing model (abstraction) Threads
More informationTopology basics. Constraints and measures. Butterfly networks.
EE48: Advanced Computer Organization Lecture # Interconnection Networks Architecture and Design Stanford University Topology basics. Constraints and measures. Butterfly networks. Lecture #: Monday, 7 April
More informationDistributed Simulation of Large Computer Systems
Distributed Simulation of Large Computer Systems Moreno Marzolla Univ. di Venezia Ca Foscari Dept. of Computer Science and INFN Padova Email: marzolla@dsi.unive.it Web: www.dsi.unive.it/ marzolla Moreno
More informationARCS: A SIMULATOR FOR DISTRIBUTED SENSOR NETWORKS
ARCS: A SIMULATOR FOR DISTRIBUTED SENSOR NETWORKS Zdravko Georgiev Karakehayov University of Southern Denmark, Mads Clausen Institute, Grundtvigs Alle 150, DK-6400, Sønderborg, Denmark, phone: +45 6550
More informationImplementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework
Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework David Wu Eric Wu Johnny Lai Andràs Varga Y. Ahmet Şekercioğlu Gregory K. Egan Centre for Telecommunication
More informationLecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)
Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew
More informationLecture 3: Topology - II
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and
More informationCS575 Parallel Processing
CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationButterfly vs. Unidirectional Fat-Trees for Networks-on-Chip: not a Mere Permutation of Outputs
Butterfly vs. Unidirectional Fat-Trees for Networks-on-Chip: not a Mere Permutation of Outputs D. Ludovici, F. Gilabert, C. Gómez, M.E. Gómez, P. López, G.N. Gaydadjiev, and J. Duato Dept. of Computer
More informationInterconnection Networks
Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact
More informationLookahead Accumulation in Conservative Parallel Discrete Event Simulation.
Lookahead Accumulation in Conservative Parallel Discrete Event Simulation. Jan Lemeire, Wouter Brissinck, Erik Dirkx Parallel Systems lab, Vrije Universiteit Brussel (VUB) Brussels, Belgium {jlemeire,
More informationCS4961 Parallel Programming. Lecture 4: Memory Systems and Interconnects 9/1/11. Administrative. Mary Hall September 1, Homework 2, cont.
CS4961 Parallel Programming Lecture 4: Memory Systems and Interconnects Administrative Nikhil office hours: - Monday, 2-3PM - Lab hours on Tuesday afternoons during programming assignments First homework
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:
More informationOFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management
Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly
More informationLecture 2: Topology - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and
More informationParallel Combinatorial Search on Computer Cluster: Sam Loyd s Puzzle
Parallel Combinatorial Search on Computer Cluster: Sam Loyd s Puzzle Plamenka Borovska Abstract: The paper investigates the efficiency of parallel branch-and-bound search on multicomputer cluster for the
More informationNetwork-on-chip (NOC) Topologies
Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance
More informationThe Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns
The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering
More information6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP
LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃLPHÃIRUÃDÃSDFHLPH $GDSWLYHÃURFHVVLQJÃ$OJRULWKPÃRQÃDÃDUDOOHOÃ(PEHGGHG \VWHP Jack M. West and John K. Antonio Department of Computer Science, P.O. Box, Texas Tech University,