A Formal View of Multicomputers. Jose A. Galludy, Jose M. Garcazand Francisco J. Quilesy

Similar documents
A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia

PEPE: A Trace-Driven Simulator to Evaluate. Recongurable Multicomputer Architectures? Campus Universitario, Albacete, Spain

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Deadlock and Livelock. Maurizio Palesi

Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Deadlock- and Livelock-Free Routing Protocols for Wave Switching

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

3-ary 2-cube. processor. consumption channels. injection channels. router

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Processor. Flit Buffer. Router

Generic Methodologies for Deadlock-Free Routing

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent

4. Networks. in parallel computers. Advances in Computer Architecture

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults

Under Bursty Trac. Ludmila Cherkasova, Al Davis, Vadim Kotov, Ian Robinson, Tomas Rokicki. Hewlett-Packard Laboratories Page Mill Road

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

NOC Deadlock and Livelock

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

A Hybrid Interconnection Network for Integrated Communication Services

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes

Interconnect Technology and Computational Speed

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

the possibility of deadlock if the routing scheme is not appropriately constrained [3]. A good introduction to various aspects of wormhole routing is

Ecube Planar adaptive Turn model (west-first non-minimal)

Routing and Deadlock

Lecture: Interconnection Networks

University of Castilla-La Mancha

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Routing Algorithms. Review

Wormhole Routing Techniques for Directly Connected Multicomputer Systems

Deadlock and Router Micro-Architecture

Analytical Modeling of Routing Algorithms in. Virtual Cut-Through Networks. Real-Time Computing Laboratory. Electrical Engineering & Computer Science

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Avalanche Myrinet Simulation Package. University of Utah, Salt Lake City, UT Abstract

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

(i,j,k) North. Back (0,0,0) West (0,0,0) 01. East. Z Front. South. (a) (b)

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Message-Ordering for Wormhole-Routed Multiport Systems with. Link Contention and Routing Adaptivity. Dhabaleswar K. Panda and Vibha A.

A Real-Time Communication Method for Wormhole Switching Networks

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network

Adaptive Multimodule Routers

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Network-on-chip (NOC) Topologies

High-Performance Adaptive Routing in Multicomputers Using Dynamic Virtual Circuits

Rajendra V. Boppana. Computer Science Division. for example, [23, 25] and the references therein) exploit the

Escape Path based Irregular Network-on-chip Simulation Framework

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

Dynamic Network Reconfiguration for Switch-based Networks

Proc. XVIII Conf. Latinoamericana de Informatica, PANEL'92, pages , August Timed automata have been proposed in [1, 8] to model nite-s

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

EE 6900: Interconnection Networks for HPC Systems Fall 2016

COMPARISON OF OCTAGON-CELL NETWORK WITH OTHER INTERCONNECTED NETWORK TOPOLOGIES AND ITS APPLICATIONS

Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract

A Survey of Routing Techniques in Store-and-Forward and Wormhole Interconnects

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

Deadlock-free Fault-tolerant Routing in the Multi-dimensional Crossbar Network and Its Implementation for the Hitachi SR2201

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department

Centre for Parallel Computing, University of Westminster, London, W1M 8JS

Performance Analysis of Routing Algorithms

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Lecture 18: Communication Models and Architectures: Interconnection Networks

MESH-CONNECTED networks have been widely used in

Autolink. A Tool for the Automatic and Semi-Automatic Test Generation

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes

Multi-path Routing for Mesh/Torus-Based NoCs

Evaluation of NOC Using Tightly Coupled Router Architecture

EE/CSCI 451: Parallel and Distributed Computation

A MULTI-PATH ROUTING SCHEME FOR TORUS-BASED NOCS 1. Abstract: In Networks-on-Chip (NoC) designs, crosstalk noise has become a serious issue

Akhilesh Kumar and Laxmi N. Bhuyan. Department of Computer Science. Texas A&M University.

Interconnection Network

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Physical Input Links. Small Buffers

The Adaptive Bubble Router 1

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Performance Analysis of a Minimal Adaptive Router

\Classical" RSVP and IP over ATM. Steven Berson. April 10, Abstract

DUE to the increasing computing power of microprocessors

Lecture 2 Parallel Programming Platforms

Combating Packet Loss in OPS networks: A Case for Network Coding. Gergely Biczók and Harald Øverby NTNU Dept. of Telematics.

University of Castilla-La Mancha

A New Proposal to Fill in the InfiniBand Arbitration Tables Λ

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Overview. Processor organizations Types of parallel machines. Real machines

Transcription:

A Formal View of Multicomputers Jose A. Galludy, Jose M. Garcazand Francisco J. Quilesy ydepartamento de Informatica, Universidad de Castilla-La Mancha, Escuela Universitaria Politecnica de Albacete, Campus Universitario s/n, 02071 Albacete, SPAIN. email:fjgallud,pacog@info-ab.uclm.es zdepartamento de Informatica y Sistemas, Facultad de Informatica, Campus de Espinardo, Universidad de Murcia, 03080 Murcia, SPAIN. email:jmgarcia@dif.um.es Abstract Formal Methods and Techniques are widely used in communication systems and distributed computing. Today, such techniques are not suciently employed in other systems as industrial processes or hardware design. In this paper we propose a methodology to apply the Lotos technique in computer design. Thus, we describe a particular case based on specifying a classical router algorithm in massively parallel machines. The e-cube routing is a deadlock-free algorithm which serves as a valid reference for our purpose. Lotos tools are used to validate the system, in particular we have used LOLA for analysis and design testing. 1: Specifying Computer Behaviour Using Lotos A number of advantages have been attributed to the use of formal specications. These advantages cover from enhanced insight into and understanding of specications, up to help in verication of the specications and their implementations. Formal methods are intended to be used by software and hardware designers to describe systems, and in particular, to verify that the system has the desired properties [19]. As we can see in [5], the design of special systems as life-critical systems with faultavoidance is only intellectually defendable by using formal methods. Formal methods are of particular importance in the development of real-time systems [16]. Many other systems [13] can be designed using formal techniques in order to reduce both high cost and time related to the development of complex systems. Finally, formal methods are specially recommended for security and safety systems. Lotos [3] is playing an important role in exploring the behaviour of diferent kinds of systems. Communication protocols denition and distributed computing are two of the main areas within which formal specication is widely used. Lotos, Estelle and SDL have been chosen to describe important architectural and functional design features related to software engineering [20]. Lotos Formal Description Technique allows software designers to give a precise relation between initial and nal states of any application, without describing implementation details. 1

Formal specication techniques are generally benecial [12] [4] because a formal language makes specications more concise and explicit than informal ones [21]. These techniques help us to acquire greater insights into the system design, dispel ambiguities, maintain abstraction levels, and determine both our approach to the problem as well as its implementation. In this paper we intend to study a special application of the Lotos formal technique to multicomputer design. In a former work [9] we showed the basis of a methodology for specifying diferent problems related to multicomputer design. Multicomputers are a known class of massively parallel machines that have thousand of processors interconnected through a communication network [1]. The nodes do not physically share memory, each one using a local memory. This hardware feature forces nodes to communicate by message-passing technique through the network [17]. The basic function of an interconnection network is to transfer information among communicating nodes of a multicomputer in a ecient manner. In a broad sense, routing refers to the communication methods and algorithms used to implement this function: how to set up a communication path in the network, how to choose a path from many alternatives, and how to handle contention for resources in the network [18]. An important aspect in the design of a routing algorithm is the possibility of deadlocks. Deadlocks occur as a result of the simultaneous occurrence of four conditions:cyclic wait,mutual exclusion, No preemption and Hold and wait. The resources in routing are the channels and buers, so deadlock can result if a communication request is allowed to hold the resources allocated to it while waiting for additional resources to become available. A logical channel cannot be shared among nodes. Deadlocks can appear both in circuit-switched and packet-switched networks. Similarly, both store-and-forward and wormhole approaches are susceptible to produce deadlocks. Various approaches can be used to deal with deadlocks: deadlock avoidance, deadlock detection and deadlock prevention. A detection approach can be used to break a deadlock once it has occurred, however this option is not popular because the channel selection must be repeated, resulting in large and unpredictable latencies. The unique approach in interconnection networks is to prevent deadlocks by careful design of the routing algorithm. Deadlock avoidance studies the new resource allocation graph for each request in order to detect deadlocks. If a deadlock appear,then the request is rejected. As can be seen, this method introduces large and prohibitive latencies. Deadlock prevention is focused on the four conditions we have showed above. The solution is to introduce constraints in order to eliminate a condition. An easy approach is impose an order in which resources are allocated to a communication request, avoiding the cyclic-wait condition. Both circuit-switched and packet-switched networks can prevent deadlocks by using one of these methods [18]: The virtual-channel technique (wormhole routing), that is, the capacity of a physical channel is multiplexed among multiple virtual channels. By partitioning the available buers in a node into multiple classes and controlling the allocation of buer classes to packets (store-and-forward). By reducing the path selection process in order to avoid cyclic-wait situations, an example for adaptative routing can be found in [8]. The existence of deadlocks in routing algorithms can be studied by constructing the resource allocation graph (channel or buer dependence graph). A deadlock can appear in the system if the resource allocation graph has a cycle [11] and conversely, the allocation occurs if the graph has no cycles. Most of deadlock-free algorithms have been obtained by analyzing the buer dependence graph. In this paper we present an alternative to graphs for deadlock prevention and detection

based on the use of the Lotos formal technique. The tools and methods applied to communication networks have successfully used in interconnection networks, and the resource allocation graph is no exception indeed formal techniques should be no exception either. Such a discussion is important in the design of optimal routing algorithms. Formal methods can help designers to nd deadlock-free algorithms and, by extension, to develop multicomputers with a rigorous method, with all the advantages of formal methods [7]. We show this by describing the specication of a deadlock-free algorithm, namely, the row-column deterministic algorithm. The rest of the paper is as follow. In section two we show the case study of a routing algorithm. This routing algorithm is specied using Lotos in section three and, nally, in section four we describe our conclusions and future work. 2: A Case Study:Routing Algorithms In a multicomputer, the way the nodes are connected to one another varies from one machine to another. In a direct network, each node has a point-to-point, or direct, connection to some number of other nodes, called neighboring nodes. Such an aspect denes a main feature of multicomputers, the topology. Interconnection networks usually have regular and xed topologies: hypercubes, torus and meshes are the most known topologies. We have choosen a popular topology from the k-ary n-cube family: the 3-ary 2-cube or 2D torus. In a multicomputer, all the interprocessor communication functions are usually handled by a router. The router interconnects a number of external channels to other nodes in the network, and one or more internal channels to the local processor. Our 2D torus has nine nodes and each node has four external channels and two internal channels. When a packet arrives at the router from an external channel it may either be destined to the local node or may be forwarded on by another external channel. A valid taxonomy of communication methods used in static networks can be found in [15]. All methods assume a form of distributed control since centralized control needs a power node to make the routing decision and thus is not a practical scheme. Such a taxonomy distinguishes between three classes: 1. Path setup: Denes how -statically or dynamically- the path between two nodes is set up. The dynamic approach is usually divided into circuit-switching and packet switching approaches. 2. Routing path selection: Denes the problem of choosing a path from among the many alternatives.basically there are two approaches: deterministic and adaptative. 3. Network control ow: Denes the techniques used to regulate the movement of packets from node to node and the ecient use of the network resources. The best known techniques are the following: Store-and-forward, Virtual cut-through and Wormhole. This classication can help us to characterize a particular router algorithm, and then we can carry out the study of its properties. Thus, the specication of the routing algorithm dened in this paper has the next features: the path is set up dynamically, the routing path selection is deterministic and, nally, the technique used to network control ow is wormhole. In wormhole routing, a message is divided into a sequence of xed-size units, called its. We have choosen a popular deterministic algorithm: the e-cube or XY algorithm. Such an algorithm oers a main feature, namely, it is deadlock-free.

We want to study the deadlock problem, and how Lotos can solve it. So, in this paper we present both a torus topology and a router algorithm specication in order to analyze deadlock states. We assume all nodes always select a valid destination, and especially a node in the border. As we have noted above the channel dependence graph models deadlocks in wormhole but it is sometimes tedious to construct the graph and check for cycles [18]. A classic example of proving deadlock freedom of some algorithms can be found in [6]. 3: The e-cube Algorithm Specication Lotos specications are networks of processes that are activated concurrently and communicate through shared gates or channels. A Lotos specication has two dierent parts: the abstract data type (ADT) denition and the system behaviour denition by specifying the temporal relationship between every synchronization on the gates. We can indicate the types used within the specication: boolean, natural numbers from 1 to 9, and two special types called Message and Chan. ADT's are not important in this paper. The specication can be studied by distinguishing between two parts: the specication of the torus topology and the specication of the row-column router. The specication of the topology is dened after the Lotos word behaviour. We have dened a Lotos expression with nine Lotos processes which represent each nine nodes of the multicomputer. This Lotos expression models the torus topology and the expression denes channel joinning two processes. On the other hand we have the expressions related to the second part, the e-cube algorithm specication. Such a denition is provided by three Lotos processes called enc, comp and cross. Each node of a 3-ary 2-cube has a number of modules: the router, a local memory and a processor are the most important. The router has to perform two main tasks: routing an incoming packet and crossing its from ingoing to outgoing channels. Once a header it of an incoming packet is processed, the outgoing channel must be selected. This task is dened with the Lotos processes enc and comp. Once an outgoing channel has been selected (if the channel is free), the router has to cross the incoming its up to the outgoing channel. It can be seen that while the router is crossing a message, a new message can arrive via a free channel. In this case such a process should be repeated to manage the new incoming message. With available Lotos tools, we can analyze and test the desired behaviour. A basic method has been used to test the specication. Such a method includes a number of possible communication paths for a particular node. Among other we can study the following communication directions: 1. Forward a node in the same row 2. Backward a node in the same row 3. Forward a node in the same column 4. Backward a node in the same column 5. Simultaneous combinations of these four cases Lotos analysis can be done using step-by-step execution with tools such Lite or LOLA [14]. This operation helps us to detect particular problems, but the global space is too large and it is impossible to cover all of them. Testing allow us to reduce this space state by composing

the specication with a test case. The next specication is a simple test case to check the simultaneous communication among four nodes: from node1 to node3 and from node4 to node5. The LOLA test operation answers with MUST PASS, MAY PASS or REJECT when it composes a specication with an acceptance test. process two1[cia,cid,ci2,c1,c2,c6,success]:noexit:= cia!flit3;c1!libre;c1!flit3;cid!flit5;c2!libre;c2!flit3;c6!libre;ci2!libre; ci2!flit3; c6!flit5;cia!flit;c1!flit;ci2!libre;c2!flit;ci2!flit5;ci2!flit; cid!flit;c6!flit;ci2!flit; cia!end;c1!end;c2!end;ci2!end;cid!end;c6!end;ci2!end; success; stop endproc The result shows that there is no rejection: Analysed states = 27 Generated transitions = 27 Duplicated states = 0 Deadlocks = 0 Process Test = two1 Test result = MUST PASS. successes = 1 stops = 0 exits = 0 cuts by depth = 0 All nine nodes in the torus work in the same way, the behaviour is dened by the same Lotos process comp. This aspect allows us to execute a limited number of test cases to check the desired behaviour and to detect routing limitations. In this case we have deadlocks because the specication does not dene virtual channels. As we can see above the e-cube algorithm is deadlock-free if the network has two virtual channels. This particular feature of our specication can be detected as a result of a test rejection on LOLA. 4: Conclusions This paper has analyzed how to apply formal techniques to computer design, specially to multicomputer design. In such systems the communication network plays an important role in determing the overall performance of the system. Formal techniques can help us to study a number of problems related to communication network design. The application of formal methods to hardware design has been increasing in recent years, some examples of which are given in [2] and [10]. Our study of the network has focused on the much discussed problem of designing a router algorithm. An important issue in the design of a routing algorithm is the possibility of deadlocks. Deadlock appears when four conditions are true in the system simultaneously: mutual exclusion, no preemption, cyclic wait and hold and wait. The way to treat deadlocks is prevent

them by careful design of the routing algorithm. Many earlier works have studied such a problem by using the channel dependence graph in order to obtain a deadlock-free algorithm in dierent ways. Our proposal is introduce the use of the Lotos technique as a valid and easier method to design deadlock-free algorithm. Thus, we have showed the specication of a deterministic e-cube algorithm in a 3-ary 2-cube topology. Lotos tools have been used to validate the system by using the test operation. In future works we will be studying both other router algorithms and other topology denitions. In this way, we are looking for new Lotos expressions to dene greater topologies, since our Lotos expression is dicult to scale. All these new expressions can be analyzed and veried with dierent tools. References [1] W.C. Athas and C.L. Seitz. Multicomputers: Message-passing concurrent computers. IEEE Computer, 21(8):9{24, August 1988. [2] G. Barret. Model checking in practice: The t9000 virtual channel processor. IEEE Transactions on Software Engineering, 21(2):69{78, February 1995. [3] T. Bolognesi and E. Brinskma. Introduction to the iso specication language lotos. In The Formal Description Technique Lotos, pages 23{73. Elsevier Science Publishers B.V., 1989. [4] J.P. Bowen and M.G. Hinchey. Seven more myths of formal methods. IEEE Software, March 1995. [5] R.W. Butter. Formal methods for life-critical software. In AIAA Computing in Aerospace 9 Conference, pages 319{329, 1993. [6] W.J. Dally and C.L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, c-36(5):547{554, 1986. [7] J. Davies and M. Wallis. On the formal specication and verication of network algorithms. In Proc. of VII Formal description techniques, pages 81{97, 1994. [8] J. Duato. Deadlock-free adaptative routing algorithm for multicomputers: Evaluation of a new algorithm. In Proc of Third IEEE Symposium in Parallel and Distributed Processing, pages 840{848, 1991. [9] J.A. Gallud. Formal specication of multicomputers. Lecture Notes in Computer Sciences, pages 380{ 390, February 1996. [10] G. Gopalakrishnan and R. Fujimoto. Design and verication of the rollback chip using hop: A case study of formal methods applied to hardware design. ACM Transactions on Computer Systems, 11(2):109{ 145, 1993. [11] K.D. Gunther. Prevention of deadlocks in packet-switched data transport systems. IEEE Transactions on Communications, 29(4):512{524, 1981. [12] A. Hall. Seven myths of formal methods. IEEE Computer, pages 11{20, September 1990. [13] M.G. Hinchey and J.P. Bowen. Applications of Formal Methods. Prentice Hall International Series in Computer Sciences, 1995. [14] S. Pavon J. Quemada and A. Fernandez. Transforming lotos specications with lola: The parametrized expansion. In Formal Description Techniques I, pages 45{55. IFIP/North-Holland, 1988. [15] H.T. Kung. Network-based multicomputers: Redening high performance computing in the 1990's. In Proc. of the Decennial Caltech Conference on VLSI, pages 30{41, 1989. [16] R. Kurki-Suonio. Stepwise design of real-time systems. IEEE Transactions on Software Engineering, 19(1), January 1993. [17] L.M. Ni and P.K. Mckinley. A survey of wormhole routing techniques in direct networks. IEEE Computer, pages 62{76, February 1993. [18] H.J. Siegel. Interconnection Networks for Large-Scale Parallel Processing. McGraw Hill, 1990. [19] O. Tanir and V.K. Agarwal. A specication-driven architectural design environment. IEEE Computer, pages 26{35, June 1995. [20] J. van de Lagemaat. Lotosphere, An Overview on Research and Achievements '91. Ellis Horwood, 1991. [21] J.M. Wing. A specier's introduction to formal methods. IEEE Computer, pages 8{24, September 1990.