Latest Trends in Applied Informatics and Computing

Parallel Simulation and Communication Performance Evaluation of a Multistage BBN Butterfly Interconnection Network for High-Performance Computer Clusters

PLAMENKA BOROVSKA, DESISLAVA IVANOVA, PAVEL TSVETANSKI
Computer Systems Department
Technical University of Sofia
8 Kliment Ohridski Boul., 1000 Sofia
BULGARIA
pborovska@tu-sofia.bg, d_ivanova@tu-sofia.bg, pavel_tsvetanski@tu-sofia.bg

Abstract: The communication performance of multistage interconnection networks is a crucial factor influencing the parallel performance of high-performance computer clusters. In this paper we propose a methodology for parallelizing a sequential OMNeT++ model. We designed a parallel model of a multistage BBN interconnection network topology to meet the demands of efficient and fast communication in high-performance computer systems. Communication performance is evaluated on the basis of parallel simulation models using the OMNeT++ simulation framework (with MPI), run on an IBM HS22 Blade Center at the High-Performance and GRID Computing Laboratory, Computer Systems Department, Technical University of Sofia. The parallel simulation results are analyzed.

Key-Words: High-Speed Interconnection Networks, BBN Network Architecture, OMNET++, Null Message Protocol, Parallel Simulations, Communication Performance Evaluation, Performance Analysis

1 Introduction

Interconnection network architecture designs are influenced by next generation high-performance computer clusters and supercomputer technology. The path to next generation Tier-0 computer systems increasingly depends on designing computer clusters with hundreds and thousands of processors. The interconnection topology of a parallel computer system is a critical factor in determining its performance [1-4]. Interconnection network designs vary with respect to their communication parameters: throughput, latency and cost.
Communication network performance determines computer cluster performance for many applications. Therefore, the choice of network architecture has a significant impact on computer performance and affects the usability of a parallel computer cluster. Interconnection networks are composed of a set of shared switch nodes and links, and the network topology refers to the arrangement of these nodes and links. Selecting the network topology is the first and a very important step in designing a network, because the flow-control and routing algorithms depend heavily on the topology.

The goal of this paper is to propose a methodology for parallelizing a sequential OMNeT++ model and to evaluate the communication performance of a multistage BBN network design on the basis of a program implementation on the IBM HS22 Blade Center located at the High-Performance and GRID Computing Laboratory, Technical University of Sofia. The communication performance of the BBN multistage topology is evaluated by means of network simulations using OMNeT++.

2 OMNET++ Platform and Parallel Simulations

OMNeT++ is essentially a set of software tools and libraries that supports the development of simulation models, most often models of computer networks and protocols. It comprises a simulation framework and library, and models are built from individual components called modules. Its main purpose is building network simulations of ad-hoc networks, wireless networks, communication networks and others. OMNeT++ includes an Eclipse-based graphical development environment (IDE) and some additional tools to facilitate the work of developers [5].

OMNeT++ also provides support for parallel simulation execution. Very large simulations may benefit from the parallel discrete event simulation (PDES) feature, either by getting speedup or by distributing memory requirements [8].

2.1 Null Message Protocol

OMNeT++ provides a Null Message protocol, which implements the Null Message conservative synchronization algorithm in a class called cNullMessageProtocol. The implementation of the Null Message Protocol in OMNeT++ is based on the terminology defined in [8, 9].

Let LP_p be the logical processes that a given parallel simulation model is composed of, where p is in the range [0, count of logical processes − 1]. Let r be a moment in the physical time of a given simulation execution. For a given LP_p and r, several quantities can be identified:

Earliest Input Time (EIT): EIT_p(r) = the lower bound on the timestamp (measured in units of simulation time) of any message that the logical process LP_p can receive in the physical time interval (r, ∞);

Earliest Output Time (EOT): EOT_p(r) = the lower bound on the timestamp (measured in units of simulation time) of any message that the logical process LP_p can send in the physical time interval (r, ∞);

Earliest Conditional Output Time (ECOT): ECOT_p(r) = the lower bound on the timestamp (measured in units of simulation time) of any message that the logical process LP_p can send in the physical time interval (r, ∞), under the assumption that LP_p will receive no messages in that interval;

Lookahead: la_p(r) = the lower bound on the time after which LP_p will send a message to another logical process.

The most common method used to advance EIT (i.e. to synchronize) is the use of Null messages, via the Null Message algorithm.
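The bookkeeping behind these quantities can be illustrated with a minimal Python sketch. It is not the OMNeT++ implementation: the class and method names are hypothetical, the lookahead is assumed to be constant, and the EOT estimate assumes an LP with an empty local event queue, so that nothing can be sent earlier than EIT plus lookahead. The EIT is taken as the minimum of the most recent EOT values received through Null messages, which is the usual rule of the Null Message algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Toy logical process (LP) illustrating Null Message bookkeeping."""
    name: str
    lookahead: float  # assumed constant lookahead, in units of simulation time
    # Most recent EOT value received from each sending LP.
    received_eot: dict = field(default_factory=dict)

    def on_null_message(self, sender: str, eot: float) -> None:
        """Record the EOT carried by a Null message from `sender`."""
        self.received_eot[sender] = eot

    def eit(self) -> float:
        """EIT: minimum of the most recent EOT values received so far."""
        return min(self.received_eot.values())

    def eot(self) -> float:
        """Assumed EOT estimate for an LP with no pending local events."""
        return self.eit() + self.lookahead


lp = LogicalProcess(name="LP0", lookahead=3.0)
lp.on_null_message("LP1", 10.0)
lp.on_null_message("LP2", 7.5)
lp.on_null_message("LP1", 12.0)  # a newer Null message replaces the older value
print(lp.eit(), lp.eot())
```

In this toy run the slowest sender (LP2) bounds how far LP0 may safely advance, which is why frequent Null messages from every sender matter for progress.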
For the EIT values to advance, it is sufficient that every LP_p sends a Null message to every other LP in its dest-set (a vector array of the logical processes that LP_p can send messages to) on every change in its EOT. Every logical process calculates its own EIT as the minimum of the most recent EOT values received from its source-set (a vector array of the logical processes that LP_p can receive messages from).

3 BBN Butterfly OMNeT++ Simulation Model and Result Analysis

3.1 Sequential model

A network simulation model for sequential execution is implemented in OMNeT++ with the BBN Butterfly network topology. The routing algorithm used is destination tag routing (DTR). DTR determines the port to which a switch has to forward a received packet using only the destination address. This algorithm is typical for omega, butterfly and other multistage networks. It is highly dependent on the network topology in which it operates, and the nodes must be addressed in a definite way. The address of a host is divided into n equal parts (n being the number of levels); each part corresponds to one level and has to be large enough to represent k/2 in binary. If the address is not large enough, it is padded with zeros on the most significant side.

Three traffic patterns are simulated: Uniform, Bit reversal and Transpose.

Uniform traffic pattern: the destination of a packet sent from a given node is chosen randomly. The probability of sending the packet to each node (excluding the sender itself) is equal.

Bit-reversal traffic pattern: the destination of a packet depends on the sender's own address. Each node sends only to the address that is the bit reversal of the sender's address.

Matrix transpose traffic pattern: each node sends messages only to the destination whose address is its own address with the upper and lower halves swapped.
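Destination tag routing and the three traffic patterns can be sketched in a few lines of Python. The sketch assumes radix-4 switches, 64 hosts with 6-bit addresses and one base-4 address digit per routing stage, most significant digit first. The function names are hypothetical, and the port choice made by an extra stage is not modelled.

```python
from __future__ import annotations

import random


def destination_tag_ports(dest: int, radix: int = 4, stages: int = 3) -> list[int]:
    """Split `dest` into one base-`radix` digit per stage, most significant first."""
    digits = []
    for _ in range(stages):
        digits.append(dest % radix)
        dest //= radix
    return digits[::-1]


def bit_reversal(src: int, bits: int = 6) -> int:
    """Destination whose address is the bit reversal of the sender's address."""
    out = 0
    for i in range(bits):
        if src & (1 << i):
            out |= 1 << (bits - 1 - i)
    return out


def transpose(src: int, bits: int = 6) -> int:
    """Destination with the upper and lower halves of the address swapped."""
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


def uniform(src: int, num_hosts: int = 64, rng: random.Random | None = None) -> int:
    """Destination drawn with equal probability from all hosts except `src`."""
    rng = rng or random
    dest = rng.randrange(num_hosts - 1)
    return dest if dest < src else dest + 1


print(destination_tag_ports(27), bit_reversal(1), transpose(0b000111))
```

Drawing from `num_hosts - 1` values and skipping over the sender keeps the uniform pattern free of self-addressed packets without rejection sampling.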

Fig.1: BBN OMNET++ Sequential Model

Simulation is executed for three different values of packet size: 32 flits, 64 flits and 128 flits. Flit size is 16 bits. Ten values of offered traffic (in percent of capacity) are simulated, from 10% to 100% in 10% increments. Every host sends 1000 packets.

The topology itself is a 4-ary (3+1)-fly consisting of 4 stages, one of which is an extra stage [1]. The extra stage helps to increase the performance of the interconnection network under traffic patterns that cause contention for a given channel, by providing 4 different paths from every sending host (source) to every receiving host (destination). The topology consists of 128 nodes: 64 radix-4 crossbar switches and 64 terminals (hosts). Functionally, the network acts as a single switch that connects all inputs to all outputs; topologically, it consists of a number of overlapping trees [3].

Nodes in the simulated network are connected by b = 1 Gbps unidirectional communication channels with a delay of 3.3 ns. The radix-4 switches are buffered and the flow control is credit-based. Credit-based flow control allows switches to avoid rejecting incoming flits because of a full buffer, thus optimizing the performance of the network. Switches inform every node connected to one of their input ports about the availability of buffer space for the respective port by sending credits. Every credit informs the node connected to a given input port of the switch that 1 flit of buffer space is available for that port. The feedback that node A receives from switch B, in the form of the number of credits received via the control communication channel, prevents A from sending to B more flits than B's buffer can accept for the respective port. Sending credits from a switch to the nodes connected to its input ports is implemented using the control communication channels out_credit[0..4], one for every data channel in[0..4].

3.2 Parallel model

Parallelization of the BBN Butterfly topology model is the transformation of that model from a sequential execution implementation to a parallel execution implementation, in which a simulation can be run on many nodes of a given computer cluster by means of message-passing and node-synchronization algorithms [10].

Four IBM Blade Center nodes are used for running the simulations. All nodes have OMNeT++ Version 4.2.1, Build id: e2a29 and MPICH2 installed. The parallel programming interface MPI, as implemented by Argonne National Lab (MPICH2 for Windows), is used by OMNeT++ via the cMPICommunications class as the mechanism for passing messages between cluster nodes. The conservative synchronization protocol Null Message Protocol (the cNullMessageProtocol OMNeT++ class) is used for message synchronization. The sequential simulation model is the foundation on which the parallel model is built; this is achieved by creating several new components and modifying existing ones.

Fig.2: BBN OMNET++ Parallel Model

Fig.3: BBN Butterfly OMNET++ Sequence Charts End Event Logs

Parallel simulation requires that the message processing in the network defined in the sequential model be divided into 4 partitions. To achieve loose coupling between partitions, the network is bisected repeatedly until it is divided into 4 parts (Fig.2). Every partition has the same component structure and component interconnection as the first 16 switches and first 16 hosts of the network, with a few differences. In other words, one partition is a network description component (Network Description File) named SubNet. That component contains 16 elements of type host and 16 elements of type switch, or ¼ of the total number of switches and hosts in the network (64 switches and 64 hosts). The interconnection of these elements is analogous to that between the first 16 switches and first 16 hosts of the network, with a few key differences:

a. Switch and host addresses in a given partition are a function of the partition's index, which ensures uniqueness and full coverage of the address ranges of those components in the network (addresses from 0 to 63 are given both to switches and to hosts). Component addresses must be distinguished from component indexes: the former range from 0 to 63, and the latter range from 0 to 15 within every partition. Taking the partition's index into consideration, the switch address is calculated using the formula:

self_address = ((index%4)+(index-(index%4))*4+4*ownindex),

where self_address is the switch address, index is the switch's index in the vector array of 16 switch components for the partition, and ownindex is the partition's index. The host address is calculated using the formula:

self_address = index + ownindex*numhosts,

where self_address is the host's address, index is the host's index in the vector array of 16 host components for the partition, and ownindex is the partition's index.

b. Part of the interconnection between switches from stage 1 and stage 2 of the BBN Butterfly topology is made outside of the SubNet component, because some connections between stage 1 and stage 2 switches cross partition boundaries and are needed to fully implement the network. Switch interconnections are internal or external depending on whether they are defined inside or outside the partition component SubNet. Internal to the partition are those connections between stage 1 and stage 2 switches for which the sending switch and the receiving switch are both present as components, with regard to their addresses, in the same partition. External to the partition are all remaining connections between stage 1 and stage 2 switches. There are always 8 internal connections, but the switch port indexes through which they are made are a function of the partition's index: the port index (0, 1, 2 or 3) is equal to the partition's index. In contrast, the external connections of a given partition are made through the 3 of the 4 ports of every stage 1 and stage 2 switch whose index does not equal the partition's index. The dependency of the external connection's port index on the partition index is modelled in the SubNet component using conditional links (A[switch output port index] --> B[gate output index] if partition's index = x). The link between A and B is only created if the condition is true. This enables the SubNet component to be used for all partitions, despite the dependency of port index values on partition index values. A gate denotes a link to a component that is external to the current component (in this case the partition SubNet). A higher level component is used to define connections between gates, thus implementing the inter-partition connections and realizing the whole parallel model.
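The two address formulas from item a can be checked with a short Python sketch. It assumes that numhosts is 16 (the number of hosts per partition stated above); the function names are hypothetical. The sketch enumerates all indexes over all 4 partitions and confirms that both formulas cover the addresses 0 to 63 exactly once.

```python
NUM_PARTITIONS = 4
SWITCHES_PER_PARTITION = 16
HOSTS_PER_PARTITION = 16  # assumed value of `numhosts`


def switch_address(index: int, ownindex: int) -> int:
    """Switch address from the switch index (0..15) and the partition index (0..3)."""
    return (index % 4) + (index - (index % 4)) * 4 + 4 * ownindex


def host_address(index: int, ownindex: int, numhosts: int = HOSTS_PER_PARTITION) -> int:
    """Host address from the host index (0..15) and the partition index (0..3)."""
    return index + ownindex * numhosts


switch_addresses = sorted(
    switch_address(i, p)
    for p in range(NUM_PARTITIONS)
    for i in range(SWITCHES_PER_PARTITION)
)
host_addresses = sorted(
    host_address(i, p)
    for p in range(NUM_PARTITIONS)
    for i in range(HOSTS_PER_PARTITION)
)

# Both address spaces should be exactly 0..63, with no gaps and no duplicates.
print(switch_addresses == list(range(64)), host_addresses == list(range(64)))
```

Note that host addresses are contiguous within a partition, while the switch addresses of one partition are interleaved with those of the others in groups of 4.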
With MPI, partitions run as separate processes, so no global variables can be used for communication between partitions. Thus, for each partition to know the total number of packets received in the network (reaching a predefined number of sent/received packets is the condition for successful simulation termination), a monitoring system must be implemented. That system is called PartSync. It monitors the total number of received packets in the network and terminates the simulation successfully when that number reaches a preconfigured value. PartSync uses message passing, in keeping with the MPI constraints, and implements a ring topology to send a message from a partition that has just received a new packet to all other partitions, thereby informing them of the increase in the total number of received packets. PartSync synchronization messages use a custom OMNeT++ .msg component, PartSyncMsg, created with the opp_msgc utility. PartSync messages are analogous to Null messages, with the difference that the former carry data about the sum of received packets in the sending partition, while the latter are used for time synchronization between partitions.

3.3 Result Analysis

The simulation experiments are designed to evaluate the parallel performance of the OMNeT++ model of the BBN Butterfly Interconnection Network for High-Performance Computer Clusters. The simulation models, implemented in C++, run on the following configuration:

Software platform: OMNeT++ running on Windows Server R2 64-bit OS using the GCC for OMNeT++ tool chain;

Hardware platform: IBM Blade Center - HS22 Blade Servers, High-Performance and GRID Computing Laboratory, Computer Systems Department, Technical University of Sofia.

The experiments cover three traffic patterns (Uniform, Bit reversal and Transpose) and three packet sizes (32 flits, 64 flits and 128 flits). Thus, nine configurations of parallel execution are run, each for five offered traffic levels (20% to 100% in 20% increments), Fig.4. The experimental data indicate that the parallel execution speedup increases with the offered load (percent of capacity). Also, the speedup for the uniform traffic pattern is greater than for the other traffic patterns.
The experiments yield a maximum speedup of 2.76 for the parallel discrete event simulation of the BBN Butterfly Interconnection Network, obtained at 100% offered traffic with the uniform traffic pattern.

Fig.4: Speedup Results of Parallel Execution

4 Conclusion

In this paper we have presented an evaluation of the parallel performance of the BBN Butterfly Interconnection Network. The parallel performance is evaluated on the basis of parallel simulation models run on an IBM HS22 Blade Center for several of the most popular communication patterns (Uniform, Bit reversal and Transpose) and for three packet sizes (32 flits, 64 flits and 128 flits).

This paper described an approach for designing parallel models in OMNeT++. The suggested BBN Butterfly interconnection network simulation model was designed to work in a parallel discrete event simulation (PDES) environment. Any network composed in this way can be simulated in a completely parallel manner, exploiting the computational resources needed to simulate more complex network designs that connect a large number of nodes.

This approach can be used as a methodology to develop more complex designs. The empirical simulation data confirm features of the OMNeT++ parallel performance described in [5].

ACKNOWLEDGEMENT

The results reported in this paper are part of a research project DRNF 02/9-2009, supported by the National Science Fund, Bulgarian Ministry of Education and Science.

References:

[1] Dally W. J., Towles B.: Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004

[2] James Milano, Gary L. Mullen-Schultz, Gary Lakner: BlueGene-red book: Blue Gene: Hardware Overview and Planning

[3] P. Borovska. Computer systems. Sofia, Bulgaria: Ciela (in Bulgarian)

[4] Duato, J., Yalamanchili, S., Ni, Lionel M. Interconnection networks: An engineering approach, Morgan Kaufmann Publishers

[5] OMNET++ Discrete Event Simulation Environment

[6] Pl. Borovska, O. Nakov, D. Ivanova, K. Ivanov, G. Georgiev: Communication Performance Evaluation and Analysis of a Mesh System Area Network for High Performance Computers. 12-th WSEAS International Conference on Mathematical Methods, Computational Techniques and Intelligence Systems (MAMECTIS 10), Kantaoui, Sousse, Tunisia, May 3-6, 2010

[7] Plamenka Borovska, Desislava Ivanova, Venelina Ianakieva, Vladislav Mitov, Halil Alkaf: Comparative Analysis of Communication Performance Evaluation for Butterfly Bidirectional Multistage Interconnection Network Topology with Routing Table and Destination Tag Routing, Sixth International Scientific Conference Computer Science 2011, Ohrid, Macedonia, September 2011

[8] D. Wu, E. Wu, J. Lai, A. Varga, Y. A. Sekercioglu, G. K. Egan, Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework, Proceedings 14th European Simulation Symposium, A. Verbraeck, W. Krug, eds., (c) SCS Europe BVBA, 2002

[9] R. L. Bagrodia, M. Takai, V. Jha, Performance evaluation of conservative algorithms in parallel simulation languages, Parallel and Distributed Systems, IEEE Transactions on, Apr.

[10] D. Wu, E. Wu, J. Lai, A. Varga, Y. A. Sekercioglu, G. K. Egan, Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework, Proceedings 14th European Simulation Symposium, A. Verbraeck, W. Krug, eds., (c) SCS Europe BVBA


The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP

6LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃ7LPHÃIRUÃDÃ6SDFH7LPH $GDSWLYHÃ3URFHVVLQJÃ$OJRULWKPÃRQÃDÃ3DUDOOHOÃ(PEHGGHG 6\VWHP LPXODWLRQÃRIÃWKHÃ&RPPXQLFDWLRQÃLPHÃIRUÃDÃSDFHLPH $GDSWLYHÃURFHVVLQJÃ$OJRULWKPÃRQÃDÃDUDOOHOÃ(PEHGGHG \VWHP Jack M. West and John K. Antonio Department of Computer Science, P.O. Box, Texas Tech University,

More information

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing 1 Selvarajah Thuseethan, 2 Shanmuganathan Vasanthapriyan 1,2 Department of Computing and Information Systems, Sabaragamuwa University

More information

Phastlane: A Rapid Transit Optical Routing Network

Phastlane: A Rapid Transit Optical Routing Network Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens

More information

Three basic multiprocessing issues

Three basic multiprocessing issues Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Interconnection Networks. Issues for Networks

Interconnection Networks. Issues for Networks Interconnection Networks Communications Among Processors Chris Nevison, Colgate University Issues for Networks Total Bandwidth amount of data which can be moved from somewhere to somewhere per unit time

More information

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks Xuan-Yi Lin, Yeh-Ching Chung, and Tai-Yi Huang Department of Computer Science National Tsing-Hua University, Hsinchu, Taiwan 00, ROC

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 9. Routing and Flow Control Intro What did we learn in the last lecture Topology metrics Including minimum diameter of directed

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

DEVELOPMENT OF PARAMETERIZED CELL OF SPIRAL INDUCTOR USING SKILL LANGUAGE

DEVELOPMENT OF PARAMETERIZED CELL OF SPIRAL INDUCTOR USING SKILL LANGUAGE DEVELOPMENT OF PARAMETERIZED CELL OF SPIRAL INDUCTOR USING SKILL LANGUAGE Vladimir Emilov Grozdanov 1, Diana Ivanova Pukneva 1, Marin Hristov Hristov 2 1 Smartcom, 7 th km, Tzarigradsko Chausee Blvd, 1784

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK SIMULATIONS

IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK SIMULATIONS IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK SIMULATIONS M. Thoppian, S. Venkatesan, H. Vu, R. Prakash, N. Mittal Department of Computer Science, The University of Texas at

More information

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research CHAPTER 1 Introduction This chapter provides the background knowledge about Multistage Interconnection Networks. Metrics used for measuring the performance of various multistage interconnection networks

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Network Topologies John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 14 28 February 2017 Topics for Today Taxonomy Metrics

More information

Parallel and Distributed VHDL Simulation

Parallel and Distributed VHDL Simulation Parallel and Distributed VHDL Simulation Dragos Lungeanu Deptartment of Computer Science University of Iowa C.J. chard Shi Department of Electrical Engineering University of Washington Abstract This paper

More information

Comparative Study of blocking mechanisms for Packet Switched Omega Networks

Comparative Study of blocking mechanisms for Packet Switched Omega Networks Proceedings of the 6th WSEAS Int. Conf. on Electronics, Hardware, Wireless and Optical Communications, Corfu Island, Greece, February 16-19, 2007 18 Comparative Study of blocking mechanisms for Packet

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Efficiency and Quality of Solution of Parallel Simulated Annealing

Efficiency and Quality of Solution of Parallel Simulated Annealing Proceedings of the 11th WSEAS International Conference on SYSTEMS, Agios Nikolaos, Crete Island, Greece, July 23-2, 27 177 Efficiency and Quality of Solution of Parallel Simulated Annealing PLAMENKA BOROVSKA,

More information

Chapter 3 : Topology basics

Chapter 3 : Topology basics 1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement

More information

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K. Fundamentals of Parallel Computing Sanjay Razdan Alpha Science International Ltd. Oxford, U.K. CONTENTS Preface Acknowledgements vii ix 1. Introduction to Parallel Computing 1.1-1.37 1.1 Parallel Computing

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS. Teacher: Jan Kwiatkowski, Office 201/15, D-2

Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS. Teacher: Jan Kwiatkowski, Office 201/15, D-2 Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS Teacher: Jan Kwiatkowski, Office 201/15, D-2 COMMUNICATION For questions, email to jan.kwiatkowski@pwr.edu.pl with 'Subject=your name.

More information

PARALLEL QUEUING NETWORK SIMULATION WITH LOOKBACK- BASED PROTOCOLS

PARALLEL QUEUING NETWORK SIMULATION WITH LOOKBACK- BASED PROTOCOLS PARALLEL QUEUING NETWORK SIMULATION WITH LOOKBACK- BASED PROTOCOLS Gilbert G. Chen and Boleslaw K. Szymanski Department of Computer Science, Rensselaer Polytechnic Institute 110 Eighth Street, Troy, NY

More information

CS Parallel Algorithms in Scientific Computing

CS Parallel Algorithms in Scientific Computing CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan

More information

Parallel Computing Interconnection Networks

Parallel Computing Interconnection Networks Parallel Computing Interconnection Networks Readings: Hager s book (4.5) Pacheco s book (chapter 2.3.3) http://pages.cs.wisc.edu/~tvrdik/5/html/section5.html#aaaaatre e-based topologies Slides credit:

More information

Dr e v prasad Dt

Dr e v prasad Dt Dr e v prasad Dt. 12.10.17 Contents Characteristics of Multiprocessors Interconnection Structures Inter Processor Arbitration Inter Processor communication and synchronization Cache Coherence Introduction

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

Boundary Recognition in Sensor Networks. Ng Ying Tat and Ooi Wei Tsang

Boundary Recognition in Sensor Networks. Ng Ying Tat and Ooi Wei Tsang Boundary Recognition in Sensor Networks Ng Ying Tat and Ooi Wei Tsang School of Computing, National University of Singapore ABSTRACT Boundary recognition for wireless sensor networks has many applications,

More information

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2 CS 770G - arallel Algorithms in Scientific Computing arallel Architectures May 7, 2001 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan Kaufmann

More information

Processor Architecture and Interconnect

Processor Architecture and Interconnect Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing

More information

Initial studies of SCI LAN topologies for local area clustering

Initial studies of SCI LAN topologies for local area clustering Prepared for the First International Workshop on SCI-Based Low-Cost/High-Performance Computing, Santa Clara University Initial studies of SCI LAN topologies for local area clustering Haakon Bryhni * and

More information

EVENT DRIVEN PACKET SIMULATOR

EVENT DRIVEN PACKET SIMULATOR EVENT DRIVEN PACKET SIMULATOR Nikolay Georgiev Chillev,Vassiliy Platonovitch Tchoumatchenko*,Tania Krumova Vassileva* Department of Computer Science,*Department of Electronics, Technical University of

More information

COMPARISON OF OCTAGON-CELL NETWORK WITH OTHER INTERCONNECTED NETWORK TOPOLOGIES AND ITS APPLICATIONS

COMPARISON OF OCTAGON-CELL NETWORK WITH OTHER INTERCONNECTED NETWORK TOPOLOGIES AND ITS APPLICATIONS International Journal of Computer Engineering and Applications, Volume VII, Issue II, Part II, COMPARISON OF OCTAGON-CELL NETWORK WITH OTHER INTERCONNECTED NETWORK TOPOLOGIES AND ITS APPLICATIONS Sanjukta

More information

SIMULATIONS. PACE Lab, Rockwell Collins, IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK

SIMULATIONS. PACE Lab, Rockwell Collins, IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK IMPROVING PERFORMANCE OF PARALLEL SIMULATION KERNEL FOR WIRELESS NETWORK SIMULATIONS M. Thoppian, S. Venkatesan, H. Vu, R. Prakash, N. Mittal Department of Computer Science, The University of Texas at

More information

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers.

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers. CS 612 Software Design for High-performance Architectures 1 computers. CS 412 is desirable but not high-performance essential. Course Organization Lecturer:Paul Stodghill, stodghil@cs.cornell.edu, Rhodes

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Virtual Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar

Virtual Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar Virtual Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar David Bueno, Adam Leko, Chris Conger, Ian Troxel, and Alan D. George HCS Research Laboratory College

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

PARALLEL SIMULATION MADE EASY WITH OMNeT++

PARALLEL SIMULATION MADE EASY WITH OMNeT++ PARALLEL SIMULATION MADE EASY WITH OMNeT++!"$#&%('% ) '(*,+- /.0'% Centre for Telecommunication and Information Engineering, Monash University, Melbourne, Australia Omnest Global Inc., Budapest, Hungary

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Bandwidth Aware Routing Algorithms for Networks-on-Chip 1 Bandwidth Aware Routing Algorithms for Networks-on-Chip G. Longo a, S. Signorino a, M. Palesi a,, R. Holsmark b, S. Kumar b, and V. Catania a a Department of Computer Science and Telecommunications Engineering

More information

Slim Fly: A Cost Effective Low-Diameter Network Topology

Slim Fly: A Cost Effective Low-Diameter Network Topology TORSTEN HOEFLER, MACIEJ BESTA Slim Fly: A Cost Effective Low-Diameter Network Topology Images belong to their creator! NETWORKS, LIMITS, AND DESIGN SPACE Networks cost 25-30% of a large supercomputer Hard

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD

More information

Performance Analysis of Interconnection Networks for Packet Delay using Source Routing

Performance Analysis of Interconnection Networks for Packet Delay using Source Routing pecial Issue of International Journal of Computer Applications (0975 8887) Performance Analysis of Interconnection Networks for Packet Delay using ource Routing Lalit Kishore Arora Ajay Kumar Garg Engg

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors Proceedings of the World Congress on Engineering 2018 ol I A Routing Algorithm for 3 Network-on-Chip in Chip Multi-Processors Rui Ben, Fen Ge, intian Tong, Ning Wu, ing hang, and Fang hou Abstract communication

More information

A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems

A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems Zhenjiang Dong, Jun Wang, George Riley, Sudhakar Yalamanchili School of Electrical and Computer Engineering Georgia Institute

More information

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

Parallel Implementation of 3D FMA using MPI

Parallel Implementation of 3D FMA using MPI Parallel Implementation of 3D FMA using MPI Eric Jui-Lin Lu y and Daniel I. Okunbor z Computer Science Department University of Missouri - Rolla Rolla, MO 65401 Abstract The simulation of N-body system

More information

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1 EE382C Lecture 1 Bill Dally 3/29/11 EE 382C - S11 - Lecture 1 1 Logistics Handouts Course policy sheet Course schedule Assignments Homework Research Paper Project Midterm EE 382C - S11 - Lecture 1 2 What

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information