PERFORMANCE AND IMPLEMENTATION OF 4x4 SWITCHING NODES IN AN INTERCONNECTION NETWORK FOR PASM

Robert J. McMillen, George B. Adams III, and Howard Jay Siegel
School of Electrical Engineering, Purdue University
West Lafayette, IN

Abstract

Design issues for the multistage Generalized Cube network are discussed in this paper. The merits of 2-input/2-output interchange boxes versus 4-input/4-output crossbars for interconnection network implementation are analyzed. The cost and performance of the network for each of the two switching node alternatives are examined. Discussion of the suitability of each approach for VLSI implementation is included. It is shown that in a packet switching environment, 4x4 crossbars outperform, and are less expensive to implement than, the four interchange boxes they replace.

This work was supported by the Ballistic Missile Defense Agency under grant number DASG60-80-C-0022 and the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under AFOSR. The United States Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other official documentation.

/81/0000/0229$ IEEE

I. INTRODUCTION

The choice of interconnection network is a central issue in the design of large-scale, multimicroprocessor-based distributed and parallel systems. The Ballistic Missile Defense (BMD) Agency is designing a test bed for evaluating such systems as they may apply to BMD tasks [8]. PASM is a multimicroprocessor system being designed at Purdue University for a variety of image processing and pattern recognition problems [16]. In both cases a highly flexible network is needed for communication among processors and memories.

The Generalized Cube network has a cube-type topology and is constructed from 2-input/2-output crossbars or interchange boxes [17]. A more general form of interchange box is an a-input/a-output (a x a) switching node. A relative of the Generalized Cube network can be constructed from a x a switching nodes using cube-type connections between stages. Many papers in the literature discuss using larger than 2x2 interchange boxes for implementing multistage cube-type networks [2, 7, 10, 11, 12, 18]. In the following, design options for 4x4 switching nodes are considered. The performances of two designs are evaluated and their implementation in discrete logic (e.g., TTL) and VLSI is considered. It will be shown that a 4x4 crossbar performs better and costs less than four 2x2 crossbars in a packet switching environment.

The logical structure of the Generalized Cube network is defined in Section II to provide a framework for discussing modifications. In Section III, the performance of two network implementations is compared. Implementation considerations are presented in Section IV. For further details of all this material see [14].

II. DEFINITIONS

A partitionable SIMD/MIMD system is a parallel processing system which can be structured as one or more independent SIMD and/or MIMD machines [4] of varying sizes. PASM is a partitionable SIMD/MIMD system for image processing and pattern recognition [16]. The BMD test bed should have the flexibility to perform as a partitionable SIMD/MIMD machine. The cube network described here can function efficiently in such an environment.

The Generalized Cube network (Fig. 1) is a multistage cube-type network topology which was introduced in [17]. It has been shown that this topology is equivalent to that used by the omega [7], indirect binary n-cube [11], STARAN [1], and SW-banyan (F=S=2) [6] networks [17, 20]. An N input/output Generalized Cube topology has $m = \log_2 N$ stages, where each stage consists of a set of N lines connected to N/2 interchange boxes. Each interchange box is a 2-input/2-output device. The labels of the input/output (I/O) lines entering the upper and lower inputs of an interchange box are used as the labels for the upper and lower outputs, respectively. The labels are the integers from 0 to N-1. Each interchange box can be set to one of four states as shown in Fig. 1.

The connections in this network are based on the cube interconnection functions [13]. Stage i of the Generalized Cube topology pairs I/O lines that differ only in the i-th bit position. The name cube network will be used to refer to the network consisting of the Generalized Cube topology and four-state interchange boxes. Each interchange box will be controlled independently through the use of routing tags [7, 15].

Figure 1: (a) Generalized Cube topology for N=8. (b) Four states of an interchange box.
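The pairing rule of the Generalized Cube topology (stage i pairs the I/O lines whose labels differ only in bit i) can be illustrated with a short Python sketch. The function names `num_stages` and `stage_pairs` are hypothetical, and taking the label with bit i equal to 0 as the upper input of a box is an assumption made for illustration; it is not stated in the text.

```python
def num_stages(n_lines: int) -> int:
    """Number of stages m = log2(N); N is assumed to be a power of two."""
    if n_lines < 2 or n_lines & (n_lines - 1):
        raise ValueError("N must be a power of two, N >= 2")
    return n_lines.bit_length() - 1


def stage_pairs(n_lines: int, stage: int) -> list[tuple[int, int]]:
    """Return the (upper, lower) line labels entering each box of one stage.

    Two lines share a box when their labels differ only in bit `stage`.
    Assumption: the label with that bit equal to 0 is the upper input.
    """
    if not 0 <= stage < num_stages(n_lines):
        raise ValueError("stage out of range")
    bit = 1 << stage
    return [(line, line | bit) for line in range(n_lines) if not line & bit]


if __name__ == "__main__":
    n = 8  # same size as the N=8 example of Fig. 1
    for stage in range(num_stages(n) - 1, -1, -1):
        print(f"stage {stage}: {stage_pairs(n, stage)}")
```

For N=8 this yields three stages of N/2 = 4 boxes each, matching the stage and box counts given in the definition.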

It is assumed that processors and memories are paired to form processing elements (PEs). The network is configured such that PE i is connected to input i and output i, $0 \le i < N$. The packet switching mode, in which packets move from stage to stage in the network as paths between stages become available, is assumed. Packets do not require that their entire path be established prior to entering the network. A packet consists of a routing tag and a number of data items. Packet switching in multistage networks has been discussed in [3, 19].

The primary goal here is to investigate the cost-effectiveness of constructing multistage cube networks from 4x4 crossbars versus 2x2 crossbars (interchange boxes). Since a single 2x2 interchange box is not functionally comparable to a 4x4 crossbar (i.e., it can only handle two items at a time instead of four), the 4x4 crossbar is compared with a 4x4 composition of four 2x2 interchange boxes. This configuration is called a composite node and is shown in Fig. 2. A network constructed from properly connected (to be specified later) composite nodes is identical to a cube network constructed from 2x2 interchange boxes. The external connections of the crossbar (Fig. 3) are identical to those of the composite node, so it can be directly substituted for a 4x4 composite node.

Many options for the implementation of 2x2 interchange boxes were discussed in [9]. To avoid repetition, one of the configurations discussed in that paper will be assumed here. It is assumed that packet switching is implemented and that an entire packet is transferred between adjacent stages during one network clock cycle. Furthermore, the size of each input queue in a switching node is assumed to be an integral multiple of the packet size. The packet size is thus not restricted to be any particular number of words.

III. PERFORMANCE ANALYSIS

The 4x4 crossbar node and composite node will be compared in their performance at both a local and a global level.
On a local level, blocking within a node is examined. On the global level, the permuting ability of two networks constructed from the respective 4x4 switching nodes is compared.

Consider the local level. Let level 1 of a composite node be the two interchange boxes connected to the inputs of the node and level 2 be the two interchange boxes connected to the outputs. The composite node can perform 16 permutation connections (each box either straight or exchange) and the crossbar node can perform all 4! = 24 possible permutation connections. For those permutations where there is no conflict in either node, the messages traverse the composite node in twice the time required by those in the crossbar node due to the two levels of interchange boxes. When conflicts occur in the crossbar node, the delay due to waiting diminishes the speedup achieved.

Consider situations where there are conflicts in a switching node. For this analysis it is assumed that the destination of any message is a uniformly distributed random variable. Also, it is assumed that each message has only one destination (i.e., no broadcasting). Both the composite node and the 4x4 crossbar node have four inputs and four outputs, so there are $4^4 = 256$ distinct patterns in which messages may need to be routed through the boxes. Since the destinations are assumed to be random and uniformly distributed, the distinct routing patterns are all equally likely. Assuming four simultaneous inputs is somewhat of a worst case, since in MIMD mode this would be considered heavy loading and in SIMD mode destinations are not random but structured and chosen to avoid conflicts. The node is assumed initially empty.

Figure 2: A 4x4 composite node constructed from four 2x2 interchange boxes.

Figure 3: A 4x4 crossbar node.

Consider the 4x4 crossbar node. Let r be the maximum number of messages desiring any given output of the 4x4 crossbar node. The total time required for all four messages to pass through the node is r. P(r=1) = 24/256, P(r=2) = 180/256, P(r=3) = 48/256, and P(r=4) = 4/256. The expected time to pass all four messages through the crossbar node is given by:

$\sum_{i=1}^{4} i \, P(r=i) = 544/256 = 2.125$ network clock cycles.

That is, given that four messages arrive at an empty crossbar node simultaneously, on the average it will take 2.125 network clock cycles for the node to empty.

Now consider the composite node. The following notation will be used in the ensuing equations, where i = 1 or 2:
P(iU) = P(no conflict level i, upper box) = 1/2;
P(iL) = P(no conflict level i, lower box) = 1/2;
P(iX) = 1/2, where X = U or L; and
P(i) = P(no conflict in level i) = 1/4.

Now consider the probabilities of different amounts of time, t, to pass four input messages through the composite node. The minimum time possible is 2 network clock cycles because there are two levels.
P(t=2) = P(1U) P(1L) P(2U) P(2L) = 1/16.

For a total time of 3 network clock cycles there are 5 cases to consider. First assume no conflicts occur in level 1.
P(t=3, case 1) = P(1) (1-P(2)) = 3/16.
Next, assume exactly one level 1 interchange box has a conflict.
P(t=3, case 2) = [(1-P(1U)) P(1L) + P(1U) (1-P(1L))] P(2X) = 1/4.
For case 3, there is one conflict at each level, but the maximum delay is 3 cycles.
P(t=3, case 3) = [(1-P(1U)) P(1L) + P(1U) (1-P(1L))] (1-P(2X)) (1/2) P(2X) = 1/16.
The first factor is the probability that exactly one box at level 1 has a conflict. The next factor is the probability that the first message from the level 1 box which had a conflict, call this message M, also has a conflict at level 2. The (1/2) is the probability that M will be chosen to pass through the level 2 box first.
The last factor is the probability that the two delayed messages do not conflict. Case 4 assumes that there is a conflict in both level 1 boxes and that both level 2 boxes receive messages (this happens half the time there are two conflicts in level 1).
P(t=3, case 4) = (1/2) (1-P(1U)) (1-P(1L)) = 1/8.
Finally, assume conflict in both level 1 boxes but only one level 2 box receives messages and there is no conflict for either pair that passes through:
P(t=3, case 5) = (1/2) (1-P(1U)) (1-P(1L)) P(2X) P(2X) = 1/32.
The probability that all messages pass through the composite node in 3 network clock cycles is
P(t=3) = 3/16 + 1/4 + 1/16 + 1/8 + 1/32 = 21/32.

For a time of 4, there are four cases to consider. The first case is where there is one conflict at each level. There are two ways to obtain a time of 4 from this situation: (1) the delayed message enters a non-empty queue in level 2 and (2) the delayed message enters an empty queue but conflicts with the other remaining message:
P(t=4, case 1) = [(1-P(1U)) P(1L) + P(1U) (1-P(1L))] [(1/2) (1-P(2X)) + (1/2) (1-P(2X)) (1-P(2X))] = 3/16.
Now assume conflict in both level 1 boxes and that only one level 2 box receives messages (this happens half the time there are two conflicts in level 1). Given this occurs, there are three ways (cases 2, 3, and 4) a time of 4 occurs. In case 2, the first two messages reaching the box in level 2 conflict, but there are no subsequent conflicts:
P(t=4, case 2) = (1/2) (1-P(1U)) (1-P(1L)) (1-P(2X)) P(2X) = 1/32.
In case 3, the first pair of messages do not conflict but the second pair do:
P(t=4, case 3) = (1/2) (1-P(1U)) (1-P(1L)) P(2X) (1-P(2X)) = 1/32.
In case 4, the first and second pair of messages conflict. When the second pair conflicts, one queue will contain two messages. For a time of 4 the queue with two items must be selected to resolve the second conflict and a third conflict must not occur.
P(t=4, case 4) = (1/2) (1-P(1U)) (1-P(1L)) (1-P(2X)) (1-P(2X)) (1/2) P(2X) = 1/128.
The probability of a time of 4 is:
P(t=4) = 3/16 + 1/32 + 1/32 + 1/128 = 33/128.
A time of 5 occurs when either of the two conditions of case 4 for a time of 4 is not met.
P(t=5) = (1/2) (1-P(1U)) (1-P(1L)) (1-P(2X)) [(1/2) (1-P(2X)) + (1/2) (1-P(2X)) (1-P(2X))] = 3/128.
The expected time for all four messages to pass through the composite node is:
2 (1/16) + 3 (21/32) + 4 (33/128) + 5 (3/128) = 415/128, or approximately 3.24 network clock cycles.
This time is 53% longer than the 2.125 network clock cycles expected with the crossbar node.

Consider the global level. To construct a network from m/2 stages of N/4 4x4 switching nodes, assume all connection lines in the network are labeled in base 4 and that the stages are numbered (m/2)-1, ..., 1, 0 (from input to output). At stage i, the four input lines to a node are those that differ only in the i-th position of their base 4 representation. The line with a 0 in the i-th position connects to the top input, 2 to the next input, 1 to the next input, and 3 to the bottom input. The output lines of the 4x4 switching nodes have the same labels as the input lines, but in increasing order, i.e., the top output line label has a 0 in the i-th position, next 1, next 2, and the bottom 3. When composite nodes are used, making connections in the above manner creates a cube network. When crossbar nodes are used, a network is created whose capabilities are a superset of those of the cube network.

A composite node network consists of Nm/2 interchange boxes, allowing $2^{Nm/2}$ permutations. Assuming m is even, a 4x4 crossbar node network consists of Nm/8 nodes, permitting $(4!)^{Nm/8}$ permutations. If m is odd and one stage is constructed by 4x4 crossbar nodes limited to act as a 2x2
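The local-level figures of Section III can be cross-checked numerically. The sketch below is not from the paper: it enumerates all 256 equally likely destination patterns to obtain the crossbar distribution P(r=i) by brute force, and then simply sums the composite-node case probabilities quoted in the text (it does not simulate the composite node or its queue arbitration). The names `crossbar_time_distribution`, `expected_time`, and `composite` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def crossbar_time_distribution(ports: int = 4) -> dict[int, Fraction]:
    """P(r = i), where r is the largest number of messages wanting one output."""
    counts: Counter = Counter()
    for dests in product(range(ports), repeat=ports):  # 4^4 = 256 patterns
        counts[max(Counter(dests).values())] += 1
    total = ports ** ports
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


def expected_time(dist: dict[int, Fraction]) -> Fraction:
    """Expected number of network clock cycles for a time distribution."""
    return sum(t * p for t, p in dist.items())


crossbar = crossbar_time_distribution()

# Composite-node probabilities taken from the case analysis in the text.
composite = {
    2: Fraction(1, 16),
    3: Fraction(3, 16) + Fraction(1, 4) + Fraction(1, 16)
       + Fraction(1, 8) + Fraction(1, 32),
    4: Fraction(3, 16) + Fraction(1, 32) + Fraction(1, 32) + Fraction(1, 128),
    5: Fraction(3, 128),
}

if __name__ == "__main__":
    ratio = expected_time(composite) / expected_time(crossbar)
    print("crossbar :", crossbar, float(expected_time(crossbar)))
    print("composite:", composite, float(expected_time(composite)))
    print(f"composite is {100 * (float(ratio) - 1):.0f}% slower on average")
```

Under these assumptions the composite-node probabilities sum to 1, and the ratio of the two expected times agrees with the 53% figure stated in the text.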

IV. IMPLEMENTATION

To control the network, the destination tags defined in [7] are used. Let the destination address D be represented in binary as $d_{m-1} \ldots d_1 d_0$. A switching node in stage i examines bits $d_{2i+1}$ and $d_{2i}$. For the composite node, the first level interchange boxes examine only bit $d_{2i+1}$ and the second level interchange boxes examine only bit $d_{2i}$. If the bit examined is 0, the upper output link of the interchange box is selected and if the bit is 1, the lower link is selected. For the crossbar node, both bits are examined simultaneously. Together they are considered a single base four digit which corresponds to one of the outputs labeled 0 through 3.

To add a broadcast capability, an m-bit broadcast mask is appended [15]. Let the mask B be represented in binary as $b_{m-1} \ldots b_1 b_0$. A switching node in stage i now examines $b_{2i+1}$, $b_{2i}$, $d_{2i+1}$, and $d_{2i}$. For the composite node, first level interchange boxes examine bits with index 2i+1 and second level boxes examine bits with index 2i. If the broadcast mask bit is 0, the destination tag bit is interpreted as before. If the mask bit is 1, the destination bit is ignored and both output links of the interchange box are selected. For the crossbar node the four bits are all examined simultaneously. They are interpreted so as to establish the same connections as those that would be obtained in the composite node. Five kinds of broadcasts are defined for either type of 4x4 switching node.

Hardware Without Broadcast Capability

For simplicity, designs for the composite node and the crossbar node initially will be developed assuming no broadcast capability. Then, those portions of the designs affected by inclusion of a broadcast capability will be modified and compared. In the following analysis, hardware complexity is measured in terms of logic gate count and chip count. The gate counts are used as a first approximation to compare VLSI implementations. Designs using this technology must also consider wiring complexity [5].
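Before the chip-count comparison continues, the tag interpretation defined at the start of this section can be made concrete with a small Python sketch. It is an illustration, not the paper's hardware: the names `box_outputs` and `node_outputs` are hypothetical, and numbering the node outputs as the base-four digit formed by the two level selections (0 = upper, 1 = lower) is an assumption consistent with the output labels 0 through 3 described above.

```python
def box_outputs(dest_bit: int, mask_bit: int = 0) -> set[int]:
    """Output links selected by one 2x2 interchange box (0 = upper, 1 = lower).

    A mask bit of 1 ignores the destination bit and selects both links.
    """
    return {0, 1} if mask_bit else {dest_bit}


def node_outputs(dest: int, stage: int, mask: int = 0) -> set[int]:
    """Outputs (0..3) of the 4x4 node in `stage` selected by tag and mask.

    Level 1 uses bit 2*stage + 1 and level 2 uses bit 2*stage, so the
    crossbar can read the same result as a single base-four digit.
    """
    hi, lo = 2 * stage + 1, 2 * stage
    level1 = box_outputs((dest >> hi) & 1, (mask >> hi) & 1)
    level2 = box_outputs((dest >> lo) & 1, (mask >> lo) & 1)
    return {2 * first + second for first in level1 for second in level2}


if __name__ == "__main__":
    dest = 0b0110  # illustrative destination address for a 16-line network
    print(node_outputs(dest, stage=1))               # non-broadcast routing
    print(node_outputs(dest, stage=0, mask=0b0001))  # broadcast to two outputs
    print(node_outputs(dest, stage=0, mask=0b0011))  # broadcast to all four
```

Enumerating every tag and nonzero mask for one node with this sketch gives five distinct multi-output connection sets, which is consistent with the five kinds of broadcasts mentioned above.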
The chip counts are used to compare discrete logic (e.g., TTL) implementations, assuming standard gate-per-chip packaging.

Examining Figs. 2 and 3, the first difference noted is that the crossbar node requires half as many queues as the composite node. Depending on the actual queue size, a considerable savings in logic may be realized in the implementation of the crossbar node. To compare multiplexer requirements, typical implementations of 2-to-1 and 4-to-1 multiplexers were examined [14]. Eight 2-to-1 multiplexers require 20% more gates (regardless of path width) than four 4-to-1 multiplexers. The chip counts are equal. Since the number of external connections for data and control lines is the same for both designs, any buffering/signal conditioning logic will be comparable. In a VLSI design, this implies identical pin counts.

Thus far the crossbar node appears to be the better choice. It is, however, decidedly more complicated to arbitrate the requests of four packets simultaneously (as opposed to two) while assuring each packet equal access to each output link on the average. To determine whether one 4x4 control unit is actually more complex than four 2x2 control units, the functional components of the control units are considered. The control unit of a 2x2 interchange box contains two sets of queue control logic, input request arbitration (IRA) logic, output request arbitration (ORA) logic, and timing. The control unit for a 4x4 crossbar node contains four sets of queue control logic. The remaining components are the functional equivalents of those for the 2x2 interchange box.

The most obvious difference between the two designs is that four 2x2 control units contain twice as many sets of queue control logic as one 4x4 control unit. One set of queue control logic contains two registers which store pointers, one to the front and one to the back of its associated queue. If the queue is Q words long, $\log_2 Q$ bits are required for each register. The IRA logic is quite simple.
If a request is made for the i-th input (i = 0, 1 for the 2x2; i = 0, 1, 2, 3 for the 4x4), it will be granted if the i-th queue is not full. Once again, four 2x2 control units require twice as much IRA logic as one 4x4 control unit. The timing logic is identical in both cases. Three clock phases are generated. A request/grant/transfer protocol is implemented (see [9]). None of the logic discussed thus far is affected by the inclusion of a broadcast capability. Thus, its analysis is equally applicable to the next subsection, which includes broadcast capabilities.

The most important and by far the most complex component of the control unit is the ORA logic. It is responsible for examining the routing tag bits and generating signals to set the multiplexers and make requests. It must also examine the grant signals and generate control signals for the "increment front pointer" input of each set of queue control logic. The complexity of this logic arises from arbitrating conflicting requests for access to the output ports.

To compare the ORA logic, equations are derived for all its output signals as a function of the tag bits and grant signals [14]. The total (NAND) gate count for 4 sets of 2x2 control unit logic is 104 gates. This corresponds to 24 chips. The control unit for the 4x4 crossbar node requires 124 gates. There is a 19% increase in the number of gates required by the crossbar node. In a discrete logic design, the chip count is 32. This is a 33% increase over the 24 chips required in the composite node.

The excess in ORA logic can be compensated for, since a 4x4 crossbar node requires half the queue control and IRA logic of a 4x4 composite node. From the equations derived, 20 extra gates or eight extra chips are required for the 4x4 crossbar ORA logic. Assuming one of the eight sets of queue control and IRA logic in a composite node will require more than 5 gates or 2 chips, the 4x4 crossbar node is actually less expensive to build. Despite the higher wiring complexity of the 4x4 crossbar node, the total design effort is comparable to that required by the 4x4 composite node.

Hardware With Broadcast Capability

Adding a broadcast capability requires the ORA logic to examine the broadcast mask bits in addition to the routing tag bits. The revised equations for the 2x2 control unit require 39 gates, which multiplied by 4 is 156. This is equivalent to 48 chips. A broadcasting capability costs 52 gates or 24 chips beyond the requirements for a 4x4 composite node without it. More details can be found in [14].

The circuitry needed to add the same broadcast capability to 4x4 crossbar nodes as was added to the composite nodes requires 233 gates, a 49% increase over the 156 required for the composite node. The chip count is 74, a 54% increase over 48. In this case it is likely that one of the eight sets of queue control and IRA logic will require more than 20 gates or 7 chips. If not, the savings in queue gates will compensate for the difference. Again the crossbar node is less expensive than a composite node where both have the same broadcast capability.

V. CONCLUSIONS

At a local level, the crossbar node is always faster at passing four messages that arrive simultaneously than the composite node. If the connection requests do not conflict in the composite node, the crossbar is twice as fast. When the connection requests of the messages form a permutation which the composite node cannot pass without conflict, it takes 3 times longer for all messages to exit the composite node. Assuming each message chooses each output with equal probability, on the average it takes approximately 53% more time for all messages to pass through the composite node than through the crossbar node.

The ORA logic is the only logic requiring more hardware in a crossbar node than in a composite node. Otherwise, a crossbar node requires half as much queue control and IRA logic, and half as many queues.
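The control-unit gate and chip counts quoted in Section IV can be cross-checked with a few lines of arithmetic. The sketch below only restates the published totals; the names `percent_increase` and `break_even` are hypothetical, and the assumption that the crossbar saves four of the eight sets of queue control and IRA logic follows from the statement that it needs half as much of that logic.

```python
# (gates, chips) for the ORA-related control logic, as quoted in the text.
NO_BROADCAST = {"composite": (104, 24), "crossbar": (124, 32)}
BROADCAST = {"composite": (156, 48), "crossbar": (233, 74)}


def percent_increase(new: int, old: int) -> int:
    """Relative increase of `new` over `old`, rounded to a whole percent."""
    return round(100 * (new - old) / old)


def break_even(costs: dict, sets_saved: int = 4) -> tuple[float, float]:
    """Gates and chips each saved set of queue control and IRA logic must
    cost for the crossbar's extra ORA logic to be paid back."""
    (comp_gates, comp_chips) = costs["composite"]
    (xbar_gates, xbar_chips) = costs["crossbar"]
    return ((xbar_gates - comp_gates) / sets_saved,
            (xbar_chips - comp_chips) / sets_saved)


if __name__ == "__main__":
    for label, costs in (("without broadcast", NO_BROADCAST),
                         ("with broadcast", BROADCAST)):
        gates = percent_increase(costs["crossbar"][0], costs["composite"][0])
        chips = percent_increase(costs["crossbar"][1], costs["composite"][1])
        print(label, f"gates +{gates}%", f"chips +{chips}%", break_even(costs))
```

With broadcast the break-even values come out fractional, so the thresholds of 20 gates and 7 chips quoted in the text correspond to rounding them up.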
The multiplexer logic is less than or comparable to that needed by the composite node. The net result is that when packet switching is implemented, the 4x4 crossbar node requires less hardware and significantly outperforms a composite node.

If circuit switching is implemented, no queues or their associated control logic are required. In this case, the crossbar node does contain more hardware. However, it offers a significant improvement in connectivity/permuting ability. If the switching nodes are implemented as VLSI chips, since both nodes require the same number of pins, the gate/pin ratio is improved with a crossbar implementation. Only in the case where circuit switching is implemented in discrete logic is further consideration required. Without a broadcast capability (which is less important in a circuit switching environment), there is only a small difference in the chip count.

In summary, the implementations of cube-type networks using 2x2 and 4x4 crossbars were compared. It was shown that for packet switching the 4x4 crossbar is a more cost-effective approach.

REFERENCES

[1] K. Batcher, "The flip network in STARAN," 1976 Int. Conf. Parallel Processing, Aug.
[2] L. Ciminiera, A. Serra, "Modular interconnection networks with asynchronous control," 14th Hawaii Int. Conf. System Sciences, Jan.
[3] D. Dias, J. Jump, "Packet communication in multistage shuffle-exchange networks," 1980 Int. Conf. Parallel Processing, Aug.
[4] M. Flynn, "Very high-speed computing systems," Proc. IEEE, Vol. 54, Dec.
[5] M. Franklin, "VLSI performance comparison of banyan and crossbar communications networks," Workshop on Interconnection Networks for Parallel and Distributed Processing, Apr.
[6] G. Goke, G. J. Lipovski, "Banyan networks for partitioning multiprocessor systems," 1st Symp. Comp. Arch., Dec.
[7] D. Lawrie, "Access and alignment of data in an array processor," IEEE Trans. Comp., Vol. C-24, Dec.
[8] W. McDonald, J. Williams, "The advanced data processing test bed," Compsac, Mar.
[9] R. J. McMillen, H. J. Siegel, "The hybrid cube network," Distributed Data Acquisition, Computing and Control Symp., Dec.
[10] J. Patel, "Processor-memory interconnections for multiprocessors," 6th Symp. Comp. Arch., Apr.
[11] M. Pease, "The indirect binary n-cube microprocessor array," IEEE Trans. Comp., Vol. C-26, May.
[12] U. Premkumar, et al., "Design and implementation of the banyan interconnection network in TRAC," NCC, June.
[13] H. J. Siegel, "A model of SIMD machines and a comparison of various interconnection networks," IEEE Trans. Comp., Vol. C-28, Dec.
[14] H. J. Siegel, et al., Parallel/Distributed Multimicroprocessor Systems for Ballistic Missile Defense, Purdue, EE School, TR-EE 81-12, June.
[15] H. J. Siegel, R. J. McMillen, "The cube network as a distributed processing test bed switch," 2nd Int. Conf. Distributed Computing Systems, Apr.
[16] H. J. Siegel, et al., "PASM: A partitionable SIMD/MIMD system for image processing and pattern recognition," IEEE Trans. Comp., to appear.
[17] H. J. Siegel, S. D. Smith, "Study of multistage SIMD interconnection networks," 5th Symp. Comp. Arch., Apr.
[18] S. D. Smith, "LSI design considerations for multistage interconnection networks for parallel processing systems," 14th Hawaii Int. Conf. System Sciences, Jan.
[19] A. Tripathi, G. J. Lipovski, "Packet switching banyan networks," 6th Symp. Comp. Arch., Apr.
[20] C. Wu, T. Feng, "On a class of multistage interconnection networks," IEEE Trans. Comp., Vol. C-29, Aug.


More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

On the Permutation Capability of Multistage Interconnection Networks

On the Permutation Capability of Multistage Interconnection Networks 81 IEEE TRANSACTIONS ON COMPUTERS, VOL. C-36, NO. 7, JULY 1987 On the Permutation Capability of Multistage Interconnection Networks TED H. SZYMANSKI AND V. CARL HAMACHER, SENIOR MEMBER, IEEE Abstract-We

More information

INTERCONNECT TESTING WITH BOUNDARY SCAN

INTERCONNECT TESTING WITH BOUNDARY SCAN INTERCONNECT TESTING WITH BOUNDARY SCAN Paul Wagner Honeywell, Inc. Solid State Electronics Division 12001 State Highway 55 Plymouth, Minnesota 55441 Abstract Boundary scan is a structured design technique

More information

Homework Assignment #1: Topology Kelly Shaw

Homework Assignment #1: Topology Kelly Shaw EE482 Advanced Computer Organization Spring 2001 Professor W. J. Dally Homework Assignment #1: Topology Kelly Shaw As we have not discussed routing or flow control yet, throughout this problem set assume

More information

Scalable crossbar network: a non-blocking interconnection network for large-scale systems

Scalable crossbar network: a non-blocking interconnection network for large-scale systems J Supercomput DOI 10.1007/s11227-014-1319-2 Scalable crossbar network: a non-blocking interconnection network for large-scale systems Fathollah Bistouni Mohsen Jahanshahi Springer Science+Business Media

More information
