RATATOSKR: WIDE-AREA ACTUATOR RPC OVER GRIDSTAT WITH TIMELINESS, REDUNDANCY, AND SAFETY

By ERLEND SMØRGRAV VIDDAL

A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE

WASHINGTON STATE UNIVERSITY
School of Electrical Engineering and Computer Science

DECEMBER 2007

To the Faculty of Washington State University:

The members of the Committee appointed to examine the thesis of ERLEND SMØRGRAV VIDDAL find it satisfactory and recommend that it be accepted.

Chair

ACKNOWLEDGEMENT

I would like to thank my advisor Dave Bakken for his advice and guidance throughout my studies at WSU, and for taking an active interest in the well-being of his students beyond professional obligations. I would also like to thank Carl Hauser and Min Sik Kim for taking the time to be on my committee, and especially Carl Hauser for his help with the work on my thesis. Further, I would like to thank all past and current members of the GridStat team for their great work, and for valuable discussion and contributions to my research. A special thanks goes to my friends in Norway and in Pullman, and to my family, for their continuing support during my research, and for making my stay here much more enjoyable.

Finally, I would like to thank the organizations that have provided financial support for my education and research. In particular, I have received a stipend from The Norwegian State Educational Loan Fund and tuition reduction from Washington State University. In addition, my research has been supported in part by grants CNS (CT-CS: Trustworthy Cyber Infrastructure for the Power Grid (TCIP)) and CCR from the US National Science Foundation.

PUBLICATIONS

Erlend S. Viddal, Stian Abelsen, David Bakken and Carl Hauser, "Ratatoskr: Wide-Area Actuator RPC over GridStat with Timeliness, Redundancy, and Safety," in DSN '08: Proceedings of the International Conference on Dependable Systems and Networks (DSN '08). To be submitted in Fall 2007.

ABSTRACT

by Erlend Smørgrav Viddal, M.S.
Washington State University
December 2007

Chair: David E. Bakken

The development of the communication infrastructure for the North American electrical power grid has failed to fully incorporate important developments in the field of computer science. This has affected the stability and efficiency of the power grid as a whole. The current power-grid communication standard, SCADA, uses protocols specialized for centralized communication. This hampers communication between field sites, which is key to envisioned improvements in power-grid safety and efficiency. Further, a number of different proprietary communication protocols are in use, making communication between power utility companies very difficult.

GridStat is a communication infrastructure designed for a power-grid environment that solves many of the problems with the current situation. GridStat uses status dissemination, a specialization of the publish-subscribe middleware paradigm. Status dissemination takes advantage of the semantics of status data to provide flexible acquisition of power-grid data with multiple dimensions of QoS semantics. The middleware approach enables communication between utilities independent of proprietary network protocols. It also allows enhanced network features, such as forwarding data through multiple redundant paths.

While GridStat provides excellent support for data acquisition, the publish-subscribe architecture supports only one-way communication. Its syntax and semantics are also unsuitable for control communications. This thesis presents Ratatoskr, a novel scheme for control of actuators using GridStat communication. It constructs a two-way communication channel on top of GridStat publish-subscribe paths, and uses the QoS semantics and middleware properties GridStat provides. For control communication, Ratatoskr uses remote procedure call (RPC), providing programmer friendliness and familiarity. The QoS semantics of GridStat are drawn upon to provide the timeliness required for power-grid operation. Reliability concerns are addressed by three redundancy schemes:

- ACK/resend;
- transmitting multiple copies of a single packet;
- spatial redundancy through GridStat's redundant routing paths feature.

Additionally, pre- and post-condition expressions over GridStat status variables are built into call semantics. The architecture and design of Ratatoskr are presented, along with results from an evaluation of a prototype implementation.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
PUBLICATIONS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER
1. INTRODUCTION
   Current Power Grid Communication Infrastructure
   GridStat
   Ratatoskr
   Contributions of Thesis
   Organization of Thesis
2. BACKGROUND AND RELATED WORK
   Middleware
   Remote Procedure Call
   Failure Semantics
   CORBA
   Publish/Subscribe
   Status Dissemination
   2.4 GridStat
   Architecture
3. THE RATATOSKR RPC MECHANISM
   Definition of terms
   Two-way Communication over a Publish-Subscribe Framework
   Properties of the 2WoPS Protocol
   Reliability Measures
   The Ratatoskr RPC
   RPC semantics
   Pre- and Post-Conditions
   Limitations
   Assumptions
4. DESIGN OF RATATOSKR
   Design of the 2WoPS Transport Protocol
   Modules
   Sending Process
   Design of the RPC Mechanism
   Modules
   Use of Reflection and Serialization
   RPC Flow
   Pre- and Post-conditions
5. EVALUATION
   Evaluation Procedure
   Topology
   5.1.2 Network Fault Model
   Evaluation Testbed
   Processes
   Hardware
   Garbage Collection Handling
   Java Virtual Machine Arguments
   Experiment Procedure
   Result Data
   Expected Results
   Experimental Results
   Resiliency of Temporal Redundancy
   Resiliency of Spatial Redundancy
   Comparison to Traditional RPC
6. CONCLUSION AND FUTURE WORK
   Concluding Remarks
   Future Work
   Long Term Connections
   Fault Tolerance Level Calculation
   Extensions to the 2WoPS Protocol
   Extensions to the RPC Mechanism
   Security
   Future Evaluations
BIBLIOGRAPHY

LIST OF TABLES

3.1 Comparison of Redundancy Techniques
Expected Failure Rates for Redundancy Techniques
Calculated End to End Loss Compared to per Link Loss

LIST OF FIGURES

3.1 Ratatoskr Module Stack
Sending Process for the 2WoPS Protocol
RPC Send Process
Call Process When Failing Pre-Condition
Call Process With Post-Condition
Evaluation Topology
Comparison of Performance With and Without Garbage Collection Compensation
Early Success for Temporal Redundancy over Varying Omission Fault Rates
Early Success for Spatial Redundancy over Varying Duration Faults
Early Success for Varying Redundancy and Loss
Average Calltimes for Various Redundancy with Full Loss
Cumulative distributions of number of timeouts per call

CHAPTER ONE
INTRODUCTION

The North American electric power grid is among the largest and most complex systems created by man. Its critical mission is balancing changing demand against the generation of power. This involves coordinating diverse sets of components over very large areas and across a large number of utility domains. The balancing process requires extensive communication between components in the grid, both for monitoring system state and for controlling actuator devices. The development of the grid communication infrastructure has failed to incorporate important developments in the field of computer science, affecting the stability and efficiency of the power grid as a whole [2]. GridStat is a communication infrastructure designed for a power-grid environment that would solve many of the problems with the current situation. However, it does not conveniently support control communication [8]. This thesis proposes a novel scheme for control of actuators using GridStat communication.

1.1 Current Power Grid Communication Infrastructure

In the 1960s, utilities started shifting from field personnel and telephone communication to electronic schemes for controlling the power grid. Today the predominant grid communication architecture is SCADA (Supervisory Control and Data Acquisition). The SCADA architecture has not changed notably since its origins. It is a centralized approach: a manned regional control center gathers data from, and issues control signals to, devices in geographically dispersed field sites. Early systems were developed without any official standards, resulting in numerous proprietary protocols. SCADA systems have since developed incrementally, and often incorporate a blend of new and old communication technology. Topologies are predominantly variations of star shapes. Protocols are mostly designed solely for communication between the control center and field sites [12].

Several pressures are growing at once: increasing stress on the transmission network, more complex distribution models, and looming threats of terrorism and cyber-security risks. Together they create a pressing need for better monitoring of grid dynamics and improved control schemes [2]. The inherent inflexibility of the SCADA architecture prevents it from accommodating this need. Communication between utilities is mostly done by telephone between operators, which makes observation and containment of grid-wide phenomena such as rolling blackouts very difficult. Fast automated control schemes involving substation-to-substation communication have yet to be standardized. They are currently implemented using expensive, specialized point-to-point links [2].

The Intelligrid project is a vision of a future power grid created by an international consortium of power researchers, industry representatives, equipment manufacturers and government representatives. It argues for several applications of substation-to-substation, substation-to-field-equipment, and field-equipment-to-field-equipment communication, yet it does not propose a wide-area communication mechanism [6]. IEC is a widely accepted standard for substation automation. It includes standardized self-description of devices independent of brand, and an event-driven communication model [17]. While IEC holds great potential for improved substation control, it does not itself specify a wide-area network mechanism. Continued incremental development of the existing centralized and inflexible communication structure will severely inhibit potential growth in power-grid efficiency and stability.

1.2 GridStat

GridStat is a framework for power-grid communication centered around a middleware network for power-grid data acquisition [8]. It provides a flexible communication scheme with the reliability and timeliness required in a power-grid network.

GridStat routes traffic on top of existing communications infrastructure through a series of application-layer routers, overcoming the inherent heterogeneity of legacy networking technology. The unifying middleware framework does three things:

- it creates a flexible overlay topology on top of the centralized designs of existing power-grid network infrastructures;
- it allows easy interoperability between power utility companies despite their use of proprietary transport protocols;
- it offers abstractions of network services.

GridStat also has several other features well suited for a power-grid infrastructure that are less relevant in the context of this thesis.

GridStat follows the publish-subscribe (pub-sub) paradigm. A device can publish status information either directly to the GridStat framework, or through an intermediary middleware publisher module, possibly located on another computer. The GridStat framework makes the information available as one or more status variables, which are published values that are regularly updated. Applications may retrieve status updates by subscribing to status variables through a GridStat subscriber interface. The GridStat framework forwards status updates from the publisher through the application-layer routers and finally to the subscriber. This overlay-network scheme allows GridStat to offer a wide range of network features independent of the underlying technology. The most important of these are multicast and redundant forwarding paths (for fault tolerance). Beyond adding functionality the underlying network lacks, GridStat improves the network Quality of Service (QoS), that is, the nonfunctional properties of the network. QoS enhancements provided by GridStat include bounded delay, reliability and security.

Currently GridStat forwards status updates in a one-way, pub-sub fashion, addressing the data acquisition needs of a grid operations infrastructure. It would be possible to forward control commands using the existing status update mechanism. However, such communication would be cumbersome with the pub-sub interface. In many cases it would also require feedback on operation success, which is impossible over one-way paths.

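The one-way data flow described above can be made concrete with a minimal in-process sketch of status dissemination. The class and method names (StatusBus, announce, subscribe, publish) and the variable name are hypothetical and are not the GridStat API. Routers, QoS and rate limits are omitted. The integer returned by announce only loosely mirrors the variable identifier a GridStat publisher receives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-process stand-in for a status-dissemination network (not the GridStat API). */
class StatusBus {
    private final Map<Integer, List<Consumer<Double>>> subscribers = new HashMap<>();
    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId = 1;

    /** Publisher side: announce a status variable by name and receive an integer id. */
    int announce(String name) {
        return ids.computeIfAbsent(name, n -> {
            int id = nextId++;
            subscribers.put(id, new ArrayList<>());
            return id;
        });
    }

    /** Subscriber side: register interest in a variable; updates are pushed, one way only. */
    void subscribe(int variableId, Consumer<Double> handler) {
        List<Consumer<Double>> list = subscribers.get(variableId);
        if (list == null) {
            throw new IllegalArgumentException("unknown variable " + variableId);
        }
        list.add(handler);
    }

    /** Publisher side: each update is copied to every subscriber (multicast); nothing comes back. */
    void publish(int variableId, double value) {
        for (Consumer<Double> handler : subscribers.getOrDefault(variableId, List.of())) {
            handler.accept(value);
        }
    }
}

class StatusBusDemo {
    public static void main(String[] args) {
        StatusBus bus = new StatusBus();
        int voltage = bus.announce("substationA.busVoltage");
        List<Double> seenByControlCenter = new ArrayList<>();
        bus.subscribe(voltage, seenByControlCenter::add);
        bus.subscribe(voltage, v -> System.out.println("logger got " + v));
        bus.publish(voltage, 1.02);
        System.out.println(seenByControlCenter); // [1.02]
    }
}
```

Note that publish returns nothing. The publisher learns neither whether an update arrived nor what a subscriber did with it, and that is the gap a control mechanism has to close.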
One option is to use SCADA protocols for control while restricting GridStat to data acquisition. This would require modifying inflexible proprietary legacy code for each new control operation introduced, and could not take advantage of the flexible topology and interoperability introduced with GridStat. Another option is to use other existing QoS-enabled control schemes. These would require implementing an overlay transport protocol to allow interoperability and flexible topologies, which is redundant when GridStat already provides middleware routing. Further, existing solutions were not designed with GridStat's capabilities in mind. Mechanisms exploiting these capabilities would therefore have to reside in the application layer, forfeiting any advantages that could be gained by designing their use into the control semantics.

1.3 Ratatoskr

This thesis proposes a power-grid control scheme, Ratatoskr¹, that uses GridStat publications and subscriptions for communication. Ratatoskr is designed primarily for control of field sites from a control center, but use between field sites is conceivable. Remote Procedure Call (RPC) semantics are used because of their programmer friendliness and familiarity. Some traditional RPC features are downplayed, especially transparency with respect to local procedure calls. This better supports the reliability and timeliness required of a power-grid control scheme.

Reliability concerns are addressed by providing three redundancy schemes: ACK/resend, transmitting multiple copies of a single packet, and spatial redundancy through GridStat's redundant routing paths feature. ACK/resend trades timeliness of the call against its reliability. Sending multiple copies and using redundant paths spend additional network resources to gain reliability. The desired tradeoff parameters may vary between applications, so Ratatoskr exposes these parameters to the programmer, along with other QoS properties of the call.

Further, Ratatoskr allows pre- and post-conditions, which are predicate expressions, to be placed on procedure calls. Pre-conditions are evaluated before the execution of a call, and abort the call if the expression is not satisfied. Post-conditions are evaluated after the execution of a call, and the result is returned to the client application to indicate system state.

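The pre- and post-condition semantics just described can be sketched as a guarded call on the server side. This is an illustrative sketch, not Ratatoskr's implementation:

- GuardedCallExecutor, GuardedResult and the status-variable names are hypothetical.
- Predicates are plain Java lambdas over a map of the latest status values, assumed to be fed by subscriptions.
- The optional post-condition delay is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical outcome of a guarded call (not Ratatoskr's actual result type). */
record GuardedResult(boolean executed, Object returnValue, Boolean postConditionHeld) {}

class GuardedCallExecutor {
    // Latest known values of status variables; assumed to be updated from subscriptions.
    private final Map<String, Double> status = new HashMap<>();

    void updateStatus(String variable, double value) {
        status.put(variable, value);
    }

    GuardedResult call(Predicate<Map<String, Double>> pre,
                       Supplier<Object> procedure,
                       Predicate<Map<String, Double>> post) {
        // Pre-condition: checked before execution; if it fails, the call is aborted untouched.
        if (pre != null && !pre.test(status)) {
            return new GuardedResult(false, null, null);
        }
        Object value = procedure.get();
        // Post-condition: checked after execution; its outcome is reported, not enforced.
        Boolean postHeld = (post == null) ? null : post.test(status);
        return new GuardedResult(true, value, postHeld);
    }
}

class GuardedCallDemo {
    public static void main(String[] args) {
        GuardedCallExecutor exec = new GuardedCallExecutor();
        exec.updateStatus("substationB.maintenanceCrewOnSite", 1.0);
        GuardedResult r = exec.call(
                // Fail closed: a missing value is treated as "crew on site".
                s -> s.getOrDefault("substationB.maintenanceCrewOnSite", 1.0) == 0.0,
                () -> { System.out.println("closing breaker"); return "closed"; },
                s -> s.getOrDefault("lineAB.current", 0.0) > 0.0);
        System.out.println(r.executed()); // false: the pre-condition failed, breaker untouched
    }
}
```

Failing closed, that is, treating an unknown status value as unsafe, is a design choice of this sketch. It keeps an unknown system state from permitting a potentially dangerous operation.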
Pre- and post-conditions in Ratatoskr may use status variables published to GridStat in their expressions, so they can draw on data from remote locations. These predicates are built into the call semantics. This provides standardized usage patterns, simplifies reuse, and offers the option of delayed execution of post-conditions.

Pre-conditions are tested on the server side before a call is carried out, aborting execution if the expression fails. Calls can thus verify a safe system state before potentially dangerous operations. For example, a call can avoid re-energizing a line if manned maintenance is scheduled at an endpoint substation at that time.

Post-conditions are evaluated on the server after a call has completed, possibly after a specified delay. This allows grid programmers to verify the effects of operations. Power-grid field sites often contain various mechanical devices that affect each other in complex ways. The outcome of an operation could therefore be unexpected even if the operation itself was successful.

¹ In Norse mythology, Ratatoskr is a squirrel running around the great life-tree Yggdrasil, carrying insults between mythological creatures living on its branches.

1.4 Contributions of Thesis

The research contributions of this thesis are:

- Design and implementation of a novel control scheme for an electrical power-grid environment, in which remote procedure calls are transported over a QoS-enabled, one-way publish-subscribe middleware network (GridStat).
- Design and implementation of three distinct techniques for redundancy, offering a tradeoff between worst-case deadline, use of network resources, and resiliency towards a variety of network failure categories. Applications are allowed fine-grained control of redundancy semantics.
- Design and implementation of pre- and post-condition mechanisms built into the RPC semantics. These provide functionality beyond what an application-level implementation offers, and allow for a standardized mechanism for control signals between utilities.
- An experimental evaluation quantifying the tradeoffs between the redundancy techniques and their performance.

1.5 Organization of Thesis

The rest of this thesis is organized as follows. Chapter 2 summarizes related work and gives the introduction to GridStat required for understanding the contributions of this work. Chapter 3 gives an overview of the Ratatoskr RPC mechanism and its underlying transport protocol. Chapter 4 details the design of a prototype implementation. Chapter 5 presents the findings of an experimental evaluation of the prototype. Finally, Chapter 6 gives the conclusion and a summary of future work.

CHAPTER TWO
BACKGROUND AND RELATED WORK

This chapter gives an overview of relevant technologies and of the GridStat framework architecture. It also details the parts of the GridStat design that relate to the Ratatoskr mechanism. A more detailed introduction to GridStat can be found in [8].

2.1 Middleware

Distributed computing involves processes on separate machines cooperating, commonly over a network. The interacting processes may run in different environments, for example with different data representations. In that case, some translation must be performed between processes to ensure correct interaction. Middleware is software layered between the operating system and the application. It offers abstractions for inter-process interactions and provides any needed translation between process environments. Many different middleware interaction styles exist, accommodating a wide range of distributed system architectures.

2.2 Remote Procedure Call

Remote Procedure Call (RPC), first presented in [4], is a style of middleware that provides abstractions for remote execution of code in a client-server fashion. Client applications call remote procedures through an interface similar in syntax to that of local procedures. The RPC mechanism then does the rest:

- it packs the call and its parameters and sends them over the network;
- it executes the corresponding code at the server;
- it transmits the result back to the client application.

Remote procedure calls allow return values, even though a procedure is traditionally understood as a call without a return value. RPC calls are by nature synchronous and blocking. A frequent design goal in RPC systems has been to make remote calls indistinguishable from local calls in both syntax and semantics, although the latter has been shown to be impossible [30].
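The client-stub and server-dispatch structure of RPC can be illustrated with standard Java reflection and serialization. The sketch below is a generic illustration under stated assumptions, not Ratatoskr's design:

- The Breaker interface and all class names are hypothetical.
- Java object serialization stands in for the wire format.
- The "network" is a direct method call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

/** Example remote interface (hypothetical). */
interface Breaker {
    String open(int breakerId);
}

/** Server-side implementation; the client only ever sees the Breaker interface. */
class BreakerImpl implements Breaker {
    public String open(int breakerId) {
        return "breaker " + breakerId + " opened";
    }
}

/** Call message: procedure name plus marshalled arguments. */
record CallMessage(String method, Object[] args) implements Serializable {}

/** Java serialization stands in for the wire format. */
class Wire {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}

/** Server side: unpack the call, dispatch it by reflection, pack the result. */
class RpcServer {
    private final Object target;

    RpcServer(Object target) {
        this.target = target;
    }

    byte[] handle(byte[] request) throws Exception {
        CallMessage call = (CallMessage) Wire.fromBytes(request);
        Object[] args = call.args() == null ? new Object[0] : call.args();
        // Look the procedure up by name and arity (overloads are not distinguished here).
        Method method = Arrays.stream(target.getClass().getMethods())
                .filter(m -> m.getName().equals(call.method()) && m.getParameterCount() == args.length)
                .findFirst()
                .orElseThrow(() -> new NoSuchMethodException(call.method()));
        return Wire.toBytes(method.invoke(target, args));
    }
}

/** Client side: a dynamic proxy packs each call, "sends" it, and unpacks the reply. */
class RpcClient {
    static <T> T stub(Class<T> iface, RpcServer server) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (p, method, args) -> {
                    byte[] request = Wire.toBytes(new CallMessage(method.getName(), args));
                    byte[] reply = server.handle(request); // stands in for the network round trip
                    return Wire.fromBytes(reply);
                });
        return iface.cast(proxy);
    }
}

class RpcDemo {
    public static void main(String[] args) {
        Breaker remote = RpcClient.stub(Breaker.class, new RpcServer(new BreakerImpl()));
        System.out.println(remote.open(7)); // breaker 7 opened
    }
}
```

The stub is generated with java.lang.reflect.Proxy, so the call remote.open(7) looks exactly like a local call. The failure semantics discussed next are where that illusion breaks down.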

2.2.1 Failure Semantics

Unlike a local procedure call, a remote procedure call may fail during remote operation while the local client process continues to operate correctly. Such failures can stem from errors during network transfer or from failure during server execution. The failure semantics of an RPC mechanism are defined by how remote failures are handled and by the guarantees of successful execution given to the client application. In practice, any network can be made reliable by resending messages until an acknowledgment (ACK) is received. Given that, there are mainly three schools of thought for failure semantics [28]:

- At-least once: Guarantees that an RPC procedure is successfully executed given eventually reliable communication, but allows repeated executions of the same call. This may be achieved by having the client repeatedly send a call until a result is received. The server executes every call it receives, whether or not it has been executed before, and sends results upon successful execution. This provides a strong guarantee, but at-least once is only practical for idempotent procedures.
- At-most once: Guarantees that execution of an RPC procedure is attempted exactly once at the server given eventually reliable communication, but does not guarantee that the attempted execution is successful. A client retries sending a call until it receives a response from the server. To ensure that the call is attempted at most once, redundant calls are filtered at the server, possibly using logs in stable storage so that filtering survives a server crash. The server must respond negatively to filtered calls so the client knows when to stop sending. When the client receives a negative response, the execution status of the call is uncertain.
- Exactly once: Guarantees that the RPC is executed exactly once at the server, and is thus the ideal case. This is impossible in the general RPC paradigm. The RPC mechanism is active only before and after application-level execution of a call on the server, so it cannot determine whether execution succeeded if the server fails between these points [29]. This can in some cases be resolved through cooperation with the overlying application. That comes at the expense of programmability, mechanism complexity and frequent writes to stable storage, and is seldom used in practice.

The paradigms above ideally rely on an eventually reliable network, but it is often impractical to resend messages an unbounded number of times until success. The most common solution is to use no-loss transport protocols, that is, transport protocols that perform sends using ACK/retry schemes and report back the delivery status of each send. This type of transport protocol gives a high probability of delivery even over a faulty network. However, the overhead and long worst-case duration of such sends have given rise to a subdivision of at-most once semantics. Maybe once semantics provide zero-or-once execution semantics. They differ from regular at-most once in that the underlying network sends are not acknowledged and therefore not resent. This best-effort scheme gives a tighter bound on end-to-end call time and has little overhead, but at the cost of low reliability compared to regular at-least once.

CORBA

Common Object Request Broker Architecture (CORBA) is a comprehensive standard for interoperability between distributed object frameworks [9]. Distributed objects are processes offering remote execution that are treated as abstract objects. This separates the remote execution interface from the underlying implementation and platform. CORBA is not strictly an RPC mechanism. However, the most common mechanism for making calls to distributed objects is so close to RPC in both syntax and semantics that it is relevant for this thesis. Many extensions to CORBA have been proposed, among them extensions targeting real-time operation [11] and fault tolerance² [10].

CORBA allows the use of any underlying transport protocol, but dynamic configuration of communication protocols is not standardized and is left to vendors to specify [24].

² It should be noted that Fault Tolerant CORBA focuses on fault tolerance through replication of services, while Ratatoskr focuses on replication of communication.
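The at-most once filtering described in Section 2.2.1 can be sketched as follows. This is a simplified, in-memory illustration with hypothetical names (AtMostOnceServer, RetryingClient). Message loss is scripted rather than real. A real mechanism would also need to keep the set of seen call ids in stable storage to survive a server crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Server-side duplicate filter giving at-most once execution per call id. */
class AtMostOnceServer {
    // Kept in memory here; a real mechanism might log this to stable storage.
    private final Set<Long> seenCallIds = new HashSet<>();
    int executions = 0;

    /** Returns "OK" after executing a new call, or the negative reply "DUPLICATE" for a repeat. */
    String handle(long callId) {
        if (!seenCallIds.add(callId)) {
            return "DUPLICATE";
        }
        executions++; // the remote procedure itself would run here
        return "OK";
    }
}

/** Client that resends until it hears any reply; losses are scripted so the demo is deterministic. */
class RetryingClient {
    static String call(AtMostOnceServer server, long callId,
                       Deque<Boolean> requestLost, Deque<Boolean> replyLost, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (Boolean.TRUE.equals(requestLost.poll())) continue; // request dropped in transit
            String reply = server.handle(callId);
            if (Boolean.TRUE.equals(replyLost.poll())) continue;   // reply dropped in transit
            // A negative reply means an earlier copy reached the server, but its outcome is unknown.
            return reply.equals("OK") ? "SUCCEEDED" : "UNCERTAIN";
        }
        return "GAVE_UP";
    }
}

class AtMostOnceDemo {
    public static void main(String[] args) {
        AtMostOnceServer server = new AtMostOnceServer();
        // The first reply is lost, so the client resends and receives the negative reply.
        String outcome = RetryingClient.call(server, 42L,
                new ArrayDeque<>(List.of(false, false)),
                new ArrayDeque<>(List.of(true, false)), 5);
        System.out.println(outcome + ", executions=" + server.executions); // UNCERTAIN, executions=1
    }
}
```

The procedure ran exactly once, yet the client cannot tell. That uncertainty is inherent to at-most once semantics and cannot be removed by the filter.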

Real-time CORBA

Real-time CORBA is an extension to CORBA for interoperability between frameworks accommodating real-time distributed systems. The extensions emphasize resource management. They also introduce extensive call prioritization semantics, including mapping to OS thread priorities. Real-time CORBA supports setting transport protocol QoS properties upon object binding [26]. Policies can therefore be set per invocation by rebinding for each invocation. Real-time CORBA is a mature standard with several field-tested implementations. For example, the TAO ORB is used for operational flight programs by the Boeing Corporation [27].

There are two strategies for using existing Real-time CORBA implementations for actuator control in the power grid. The first is to route Real-time CORBA traffic directly on top of utility networks. The second is to route it over a middleware layer that overcomes incompatibilities. An alternative to using Ratatoskr over GridStat for actuator control is to employ Real-time CORBA on top of QoS-aware networking technologies, such as ATM or DiffServ IP. Such a Real-time CORBA approach would provide timely control messages. Further, network-level fault tolerance may be achieved by sending multiple temporally redundant copies of each network packet. In addition to temporal redundancy, Ratatoskr uses the GridStat redundant paths feature to provide fault tolerance against network faults. Chapter 5 evaluates the fault-tolerance capabilities of Ratatoskr. It shows that redundant path routing protects against certain fault categories that affect all temporally redundant sends along a single path. We are not aware of any wide-area network technology providing routing with redundant paths.

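A deliberately simple probabilistic model shows why redundant paths help against faults that defeat temporal redundancy. The model is an assumption made for this illustration and is not the fault model used in the Chapter 5 evaluation. It has three parts:

- each copy of a message is lost independently with probability p;
- for the duration of a call, a whole path is down with probability q;
- outages on disjoint paths are independent of each other.

```java
/**
 * Illustrative loss model (an assumption for this sketch, not the thesis's evaluation model).
 * p: independent per-copy loss probability ("omission" faults).
 * q: probability that an entire path is down for the duration of the call.
 */
class RedundancyModel {
    /** Probability that all k temporally redundant copies sent along one path are lost. */
    static double temporalFailure(double p, double q, int k) {
        // A path outage kills every copy; otherwise all k copies must be lost independently.
        return q + (1 - q) * Math.pow(p, k);
    }

    /** Probability that one copy on each of n disjoint, independently failing paths is lost. */
    static double spatialFailure(double p, double q, int n) {
        double perPath = q + (1 - q) * p; // the single copy on one path is lost
        return Math.pow(perPath, n);
    }

    public static void main(String[] args) {
        double p = 0.05, q = 0.01;
        System.out.printf("3 copies on 1 path:  %.6f%n", temporalFailure(p, q, 3));
        System.out.printf("1 copy on 3 paths:   %.6f%n", spatialFailure(p, q, 3));
    }
}
```

However many copies are sent on one path, temporalFailure never drops below q, whereas spatialFailure shrinks with each added path. In a real network, paths that share routers or links are not independent, so the spatial figures are optimistic for such paths.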
This thesis presents an RPC mechanism designed specifically for actuator control over a GridStat connection. An alternative approach would be to implement a transport protocol enabling Real-time CORBA to communicate over GridStat. Where Ratatoskr is a pure RPC system, CORBA provides the advantages of a distributed object architecture and compatibility with a large set of existing third-party software components. Since Real-time CORBA extends the complex CORBA standard, it requires adherence to a set of standardized semantics. Some requirements are provided in [6] and [17]. Even so, the desired functionality of a power-grid control system is still largely unmapped, and could potentially benefit from mechanisms not compatible with the CORBA standard. The more minimalistic Ratatoskr design allows rapid experimentation with features such as pre- and post-conditions and fine-grained QoS semantics. Further, the communication subsystem of Ratatoskr can easily be adapted to carry Real-time CORBA traffic instead of Ratatoskr RPC calls, if Real-time CORBA is deemed desirable for a grid deployment.

Fault-tolerance in CORBA

The distributed object architecture of CORBA lends itself well to service replication. The distributed object interface is decoupled from the underlying implementation and environment. An object interface can therefore be replicated into several implementations running in separate environments with minimal impact on observed behavior. Several CORBA implementations provide replicated objects [23, 25, 20]. A replicated distributed object scheme, coupled with a Real-time CORBA implementation, would provide timely delivery and fault tolerance. Such a scheme would still have to rely on the underlying network for network-level fault tolerance, and could not reap the benefits of redundant path routing. Further, object replication has to rely on strong multicast guarantees for synchronization between replicas. These require many message rounds in the worst case when communication fails, and therefore scale badly with geographical distance.

2.3 Publish/Subscribe

The publish/subscribe middleware architecture centers around producers of information (publishers) and consumers of information (subscribers). Publishers make information events available to a middleware network, and subscribers can request that the network forward events to them.

The network forwards only subscribed data and can often optimize delivery paths through multicast, conserving bandwidth [3]. The information flow is one-way. Subscribers make subscription requests to the middleware network itself rather than to the publishers, decoupling data producers from consumers. Further, published events can be stored in the network until the subscribers are ready to consume them, decoupling publishing time from delivery to the subscriber [7].

Status Dissemination

Status dissemination is a specialization of the publish/subscribe paradigm in which publishers maintain status variables [8]. Status variables are published values of a given type that are updated by publishing status events. Status events are limited to a maximum rate. These restrictions on publication rate and type allow additional QoS semantics compared to publish-subscribe systems without such restrictions.

2.4 GridStat

This section presents an overview of GridStat's architecture, and details the design of the modules relevant to Ratatoskr. The purpose of this overview is to provide background for the rest of the thesis. A more complete introduction to GridStat can be found in [8] and [2].

Architecture

The GridStat architecture is separated into two main subsystems:

- the data plane, a middleware databus where status updates supplied by publishers are forwarded to subscribers;
- the management plane, a set of servers that manages system resources and organizes subscriptions. It receives subscription requests from subscribers and configures the data plane to forward accordingly.

GridStat uses two kinds of communication traffic. Data traffic is always forwarded through the data plane message bus. Control traffic between GridStat entities can be sent over any middleware control mechanism. The current implementation of GridStat uses CORBA and Ratatoskr as control message mechanisms.

Forwarding in the data plane is performed by status routers, which are middleware routers placed throughout a wide-area network. Status routers form an overlay network by forwarding status events from router to router. The status routers contain implementations of all protocols used in the wide-area network. They may therefore act as bridges between parts of the network that use different networking technologies or separate address spaces. Network connections in the data plane run from publishers and subscribers to status routers, and between status routers. They are represented as event channels, which abstract the data forwarding properties required for resource management. Each publisher and subscriber has event channels to one or more status routers.

Whereas the data plane has a flat organization, the management plane consists of a hierarchy of servers called QoS brokers. QoS brokers at the lowest level of the hierarchy are leaf QoS brokers. They are the only QoS brokers that communicate directly with entities in the data plane. QoS brokers above the leaf level are called internal QoS brokers, and each is the sole parent QoS broker of one or more child QoS brokers. Every QoS broker has a parent except the root QoS broker, and leaf QoS brokers have no child QoS brokers.

Each QoS broker is associated with a set of entities in the data plane, called the QoS broker's cloud. The data plane is divided so that each status router belongs to the cloud of exactly one leaf QoS broker. Status routers that have event channels to the same publisher or subscriber must be in the same cloud. Publishers and subscribers belong to the same cloud as their status routers. The cloud of an internal QoS broker is the union of the clouds of its children, so the cloud of the root broker comprises all entities in the data plane. Entities are named according to their relationship to the management plane hierarchy.

A GridStat element must have a name that is unique within the scope of its parent; its full name is its name within that scope prefixed with the parent's name. This hierarchy of clouds is meant to correspond to a natural organization of management domains in the power grid, such as levels of geographical areas. Because the data plane provides bounded delay and other QoS guarantees for subscription data, additional subscriptions must not overload network resources. The management plane administers the use of resources in the data plane, and therefore handles subscription requests.

Subscription requests are made by the subscriber to its leaf QoS broker. If both the publisher and the subscriber of a new subscription are within the leaf QoS broker's cloud, the leaf QoS broker is responsible for verifying that the connection will not overload network resources and for updating the status routers with the new subscription. If the publisher and subscriber are in different leaf-level clouds, the subscription request is propagated up the hierarchy to the first QoS broker that has both within its cloud.

Ratatoskr is built on top of GridStat subscription paths, and the GridStat modules most relevant in the context of this thesis are the publisher and the subscriber.

Publisher

A publisher is a GridStat entity in the form of a module residing in an application program, used for publishing data to a GridStat network. It maintains two connections to each of its status routers: an event channel for forwarding published status updates, and a middleware control channel for control messages that the status router forwards to the management plane. The application can announce a new published variable through the module interface by providing a string name as identifier, a type, and the rate at which the variable is published. There is currently no policing of the maximum and minimum rates of published updates. The management hierarchy returns a 32-bit integer, the variableid, that identifies the variable within the GridStat network. The application may update the value of a status variable through the module interface by specifying the variableid and the new value. The types of variables provide semantics for subscribed events, in addition to further functionality outside the scope of this thesis. The current types are various primitive types (integer, floating point, boolean, ...) and a user-defined type, which GridStat treats as a simple byte array.
The user-defined type contains semantics for division into further subtypes, defined by the application.
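The naming rule and the announce/variableid exchange described above can be sketched as a registry kept by a leaf QoS broker: full names are built by prefixing a name with its enclosing scope names, names that are not unique within the scope are rejected, and each announced variable receives a 32-bit variableid. The dot separator, the sequential allocation of variableids, and the names `full_name` and `LeafBrokerRegistry` are assumptions for illustration; the text does not specify how GridStat formats full names or allocates identifiers.

```python
from __future__ import annotations

from itertools import count


def full_name(scopes: list[str], name: str, sep: str = ".") -> str:
    """Prefix a locally unique name with the names of its enclosing scopes (root first)."""
    return sep.join([*scopes, name])


class LeafBrokerRegistry:
    """Hypothetical leaf-broker registry for announced status variables."""

    MAX_ID = 2**32 - 1  # variableids are 32-bit integers

    def __init__(self, scopes: list[str]) -> None:
        self.scopes = scopes
        self._ids = count(1)  # sequential allocation is an assumption
        self._by_name: dict[str, int] = {}
        self._meta: dict[int, tuple[str, type, float]] = {}

    def announce(self, name: str, value_type: type, rate_hz: float) -> int:
        """Register a published variable (name, type, rate) and return its variableid."""
        fqn = full_name(self.scopes, name)
        if fqn in self._by_name:
            raise ValueError(f"name not unique within scope: {fqn}")
        var_id = next(self._ids)
        if var_id > self.MAX_ID:
            raise OverflowError("variableid space exhausted")
        self._by_name[fqn] = var_id
        self._meta[var_id] = (fqn, value_type, rate_hz)
        return var_id

    def describe(self, var_id: int) -> tuple[str, type, float]:
        """Look up the full name, type, and announced rate of a variableid."""
        return self._meta[var_id]
```

After announcing, the application would refer to the variable only through the returned integer, which keeps update messages small compared with carrying the full name.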

Subscriber

The subscriber is a GridStat entity module used by applications to subscribe to data published over the GridStat network by a publisher. Like the publisher, the subscriber maintains two channels to each of its status routers: an event channel for receiving subscribed updates and a control channel for subscribing to or unsubscribing from status variables. To subscribe to a published status variable, the application passes the variable name, the name of the publisher, QoS parameters, and a SubscriptionHolder, an object that stores the status value and is updated by the subscriber when it receives updated values from its status router. Applications can access the values directly through the SubscriptionHolder interface, or can specify a callback method that will be invoked when the SubscriptionHolder is updated. There are several SubscriptionHolder implementations corresponding to the types of status variables, and applications can provide additional implementations for added functionality or for semantics supporting subtypes of user-defined variables.

GridStat allows subscribers to specify that subscription data should be sent over redundant paths. A subscription over redundant paths is forwarded along more than one path in the GridStat network such that, with the exception of the entry-point SRs, no status router or event channel present in one path is present in any other path.
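The disjointness condition on redundant paths can be expressed as a check over candidate paths. In this sketch, whose path representation is an assumption rather than GridStat's data structures, each path is the sequence of status routers from the publisher's entry-point SR to the subscriber's entry-point SR, and an event channel is a pair of consecutive routers. The entry-point SRs are exempt from the check, as the text allows, but channels between routers are not. The function name `are_disjoint` is hypothetical.

```python
from __future__ import annotations


def are_disjoint(paths: list[list[str]]) -> bool:
    """Return True if the paths share no status router other than the entry-point SRs
    and no event channel between routers."""
    if len(paths) < 2:
        return True
    endpoints = (paths[0][0], paths[0][-1])  # (publisher entry SR, subscriber entry SR)
    seen_routers: set[str] = set()
    seen_channels: set[tuple[str, str]] = set()
    for path in paths:
        if (path[0], path[-1]) != endpoints:
            return False  # every path must connect the same pair of entry-point SRs
        interior = set(path[1:-1])
        channels = set(zip(path, path[1:]))  # directed channels along this path
        if interior & seen_routers or channels & seen_channels:
            return False
        seen_routers |= interior
        seen_channels |= channels
    return True
```

Two paths that both run directly between the entry-point SRs fail the check, because they share the same event channel even though they have no interior routers.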

CHAPTER THREE

THE RATATOSKR RPC MECHANISM

GridStat's mission is to provide a complete communication framework for the power grid. In addition to the existing publish-subscribe functionality, a standardized control mechanism is needed to allow power utilities to control field equipment through the GridStat infrastructure. Because of the critical nature of grid operation, such a mechanism must accommodate timely execution and provide high fault tolerance. Ratatoskr is an RPC mechanism designed to run on top of GridStat's publish-subscribe system, utilizing the QoS mechanisms provided by GridStat. Built into the RPC semantics are pre- and post-conditions on calls, intended as predicates over GridStat published variables. Ratatoskr's intended primary use is for control-center operators and mechanisms to send control messages to actuators in substations, either by accessing actuators directly or through an intermediary RPC server that communicates with actuators through legacy APIs. This chapter gives an overview of the features of Ratatoskr.

3.1 Definition of terms

The parts of the text regarding the transport protocol use terms as defined in [18]. Additional terms are defined below.

2WoPS transport protocol - 2-Way over Publish Subscribe. A communication protocol defining two-way communication over two one-way GridStat subscription paths.

2WoPS peer - An application connected to a GridStat framework that utilizes the 2WoPS protocol for two-way communication, using a GridStat publisher for sending data and a GridStat subscriber for receiving data.

Ratatoskr peer - A device connected to a GridStat framework that utilizes Ratatoskr RPC for communication.

Entry-point SR - The GridStat status router that a publisher or subscriber connects to. When used in relation to a 2WoPS peer, the entry-point SR signifies the edge status router used to connect both the publisher and the subscriber of the 2WoPS peer. The current implementation of GridStat allows publishers and subscribers to connect to only a single status router, although the architecture allows for multiple connections. The rest of this thesis considers only the case of a single entry-point SR per publisher or subscriber, as the exact semantics of multiple entry-point SRs are still undefined.

TSDU - Transport Service Data Unit, a chunk of data from an overlying application that is sent through a transport-layer connection.

Transport protocol control message - Similar to a TSDU, but the data is used to control the 2WoPS protocol, not for application use.

TPDU - Transport Protocol Data Unit, a chunk of data from the transport layer that is sent over a network-layer connection. In this context, GridStat pub/sub communication is seen as a network layer. A TPDU can be a TSDU with added transport-layer headers, or data used exclusively for control information by the transport layer. Several TPDUs can duplicate the same TSDU, and a single TSDU can be spread over multiple TPDUs, although the latter is not implemented in the prototype (see section ).

NSDU, NPDU - Network Service Data Unit and Network Protocol Data Unit, similar to TSDU and TPDU but for the network layer (GridStat pub-sub). An NSDU is exactly the same data as the corresponding TPDU, but viewed in the context of the network protocol layer. An NSDU with an added network-layer header is an NPDU.
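To make the layering in these definitions concrete, the sketch below wraps a TSDU in a TPDU and then treats that TPDU as an NSDU wrapped in an NPDU. The header layouts (a one-byte kind, a 32-bit sequence number, a CRC-32 trailer, and a 32-bit variableid as the network header) and the kind codes are hypothetical; the source specifies only that a TPDU adds transport headers to a TSDU or carries control data, that the 2WoPS protocol uses a CRC to discard TPDUs with bit errors, and that an NPDU is an NSDU with a network-layer header.

```python
from __future__ import annotations

import struct
import zlib

KIND_DATA = 0  # TPDU carrying an application TSDU (hypothetical code)
KIND_ACK = 1   # transport protocol control message (hypothetical code)

_TP_HDR = struct.Struct("!BI")  # transport header: kind, sequence number
_NP_HDR = struct.Struct("!I")   # hypothetical network header: variableid


def make_tpdu(kind: int, seq: int, tsdu: bytes = b"") -> bytes:
    """TPDU = transport header + TSDU, followed by a CRC-32 computed over both."""
    body = _TP_HDR.pack(kind, seq) + tsdu
    return body + struct.pack("!I", zlib.crc32(body))


def parse_tpdu(tpdu: bytes) -> tuple[int, int, bytes] | None:
    """Return (kind, seq, tsdu), or None if the TPDU is malformed or fails the CRC check."""
    if len(tpdu) < _TP_HDR.size + 4:
        return None
    body = tpdu[:-4]
    (crc,) = struct.unpack("!I", tpdu[-4:])
    if zlib.crc32(body) != crc:
        return None  # bit error detected: discard the TPDU
    kind, seq = _TP_HDR.unpack_from(body)
    return kind, seq, body[_TP_HDR.size:]


def make_npdu(variable_id: int, nsdu: bytes) -> bytes:
    """The NSDU is the TPDU seen from the network layer; prepend a network header."""
    return _NP_HDR.pack(variable_id) + nsdu
```

Because the NSDU is byte-for-byte the TPDU, stripping the network header on receipt yields exactly the data that `parse_tpdu` expects.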

3.2 Two-way Communication over a Publish-Subscribe Framework

GridStat is a publish-subscribe system. Publishers in the system make data in the form of status updates available to the GridStat framework. Subscribers may request subscriptions to these variables, and GridStat will forward subscribed information from publishers to subscribers according to QoS properties specified at subscription time. Communication is strictly one-way; subscribers have no way of sending information to publishers.

RPC requires two-way communication, as procedure calls often return values to the client, and acknowledgments of successful calls are almost universally required even when the call has no return value. To allow a two-way communication link to be established, Ratatoskr uses a transport protocol on top of GridStat called the 2WoPS protocol. The 2WoPS protocol achieves two-way communication by instantiating both a publisher and a subscriber behind a single interface. To set up a two-way data path, two 2WoPS peers each publish a data variable specific to the session and subscribe to the other peer's corresponding variable. Data is sent over the connection by publishing a status update containing the data, and is received by the other peer through the subscriber interface. The 2WoPS interface masks the publisher and subscriber behavior.

Using a layered approach to communication allows the 2WoPS protocol to carry traffic other than Ratatoskr RPC. For example, the 2WoPS protocol was used for control communication between QoS brokers in [1]. Figure 3.1 shows the relationship between the modules of Ratatoskr (light shade), the GridStat modules used by Ratatoskr (dark shade), and examples of potential other applications using GridStat or Ratatoskr modules (white). The example shows the architecture stack for a control center and a substation.
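The pairing of a publisher and a subscriber behind one 2WoPS interface can be sketched with an in-memory stand-in for the GridStat data plane. `Bus`, `TwoWoPSPeer`, and the session-variable naming `<peer>.<session>` are hypothetical, and the sketch deliberately omits QoS, ACK/resend, duplicate filtering, and everything else GridStat and 2WoPS add; it shows only how two one-way subscriptions compose into a two-way link.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class Bus:
    """In-memory stand-in for the GridStat data plane: one-way publish/subscribe only."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, variable: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[variable].append(callback)

    def publish(self, variable: str, value: bytes) -> None:
        for callback in self._subscribers[variable]:
            callback(value)


class TwoWoPSPeer:
    """Hypothetical 2WoPS peer hiding a publisher and a subscriber behind one interface."""

    def __init__(self, bus: Bus, name: str) -> None:
        self.bus, self.name = bus, name
        self.inbox: list[bytes] = []
        self._out: str | None = None

    def connect(self, remote: str, session: str) -> None:
        # Publish our own session variable; subscribe to the remote peer's counterpart.
        self._out = f"{self.name}.{session}"
        self.bus.subscribe(f"{remote}.{session}", self.inbox.append)

    def send(self, data: bytes) -> None:
        if self._out is None:
            raise RuntimeError("connect() must be called before send()")
        self.bus.publish(self._out, data)  # sending is just publishing a status update

    def receive(self) -> bytes | None:
        return self.inbox.pop(0) if self.inbox else None
```

A control center and a substation would each create a peer and call `connect` with the other's name and a shared session identifier; after that, each `send` on one side surfaces as a `receive` on the other.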
The main intended use of Ratatoskr is illustrated by the control-center control system using Ratatoskr RPC to execute control operations on an actuator in the substation. Another use of the 2WoPS protocol may be to transport legacy control messages to actuators whose APIs have not yet been fully implemented for Ratatoskr. The publisher and subscriber used by the 2WoPS protocol may have other uses, such as sending sensor data from the substation to the control center, or publishing reports of power-grid state aggregated in the control center to be used by protection schemes in the substation. Finally, while GridStat requires control of the underlying network resources, network technologies that manage resource use may reserve bandwidth for uses outside GridStat, such as transferring video feeds from surveillance cameras in the substation.

Properties of the 2WoPS Protocol

The 2WoPS protocol is designed specifically for Ratatoskr RPC. While this does not rule out other uses of two-way communication over GridStat, care should be taken to note the properties of the protocol, as these differ from those of the most common transport protocols, TCP and UDP. Some suggested extensions to the protocol to enhance its use for other applications can be found in section . This section summarizes the main properties of the 2WoPS protocol.

Connection oriented - This was a necessary design decision, as the underlying GridStat communication is connection-oriented. The 2WoPS protocol interface provides methods to open and close a connection.

Controlled-loss - An adjustable ACK/resend scheme similar to the k-XMIT scheme found in [21]. A TSDU will be retransmitted up to k times, where k is a user-specified number. The server sends no ACK for the k-th resend. This shortens the deadline for the sending process by the time needed to send the ACK, at the expense of knowledge of the delivery status. Note that while a missing ACK suggests that the message was not delivered, it cannot guarantee a failed delivery, as the message may have arrived while the ACK was lost. Because the delivery status is unclear, the overlying RPC mechanism must still wait for a return from the server. When k is set to 0, the scheme has uncontrolled-loss properties. The controlled-loss scheme gives little indication of the success of a call, which might be impractical for non-RPC use, so a no-loss scheme is also provided.

Figure 3.1: Ratatoskr Module Stack

No-loss - An adjustable ACK/resend scheme. As with controlled-loss, TSDUs are retransmitted up to k times, but the no-loss scheme returns an ACK even for the final send. For no-loss, delivery of a TSDU is uncertain only if every send attempt experiences faults, while for controlled-loss, delivery of a TSDU is uncertain even if the k-th send attempt experiences no faults. This gives controlled-loss weaker failure semantics, increasingly so at low k. Controlled-loss blocks for $2k - 1$ trip times per send and uses $2k - 1$ TSDU transfers of bandwidth, whereas no-loss blocks for $2k$ trip times and uses $2k$ TSDU transfers per send.

Timeliness - GridStat provides delivery guarantees for subscriptions. The delivery guarantees of the underlying subscriptions are used to calculate tight timeout values for ACK/resends, as well as delivery guarantees for TSDU sends.

Blocking - Execution of a sending thread is blocked until the send is completed. A send is completed either when delivery is confirmed by receiving an ACK from the receiver, when the k-th ACK times out (for no-loss), or after the k-th send (for controlled-loss). Multiple threads are still allowed to send in parallel.

Unordered delivery - No message ordering is provided. Received NSDUs containing TSDUs are delivered to the application in the order in which GridStat delivered them to the 2WoPS protocol.

No duplicates - TPDUs duplicating the same TSDU are filtered so that the TSDU is delivered only once to the server application.

Error control - A simple cyclic redundancy check (CRC) is used to discard TPDUs containing bit errors.

Hierarchical naming - A naming scheme similar to the one for publishers and subscribers in GridStat is used. A 2WoPS peer is identified within its GridStat cloud by a string s with no spaces. The peer registers the publisher and subscriber used for communication with names based on this string: the publisher is named spub and the subscriber ssub. The names of the publishers and subscribers must be locally unique; that is, no other 2WoPS peer may have the same name within the leaf-level cloud of the entry point. Since clouds have unique names, the fully qualified name is then globally unique. A leaf QoS broker stores the names of all elements in its cloud and prevents registration of locally non-unique names.

Message oriented - TSDUs are bounded by the maximum size of GridStat status updates, which is in turn bounded by an underlying transport protocol (UDP for the research prototype of GridStat).

Reliability Measures

A serious concern in any wide-area network is that the number of components, the geographical extent, and the usage patterns of such networks inevitably lead to lower reliability than in local area networks. This is especially apparent in the Internet, where most traffic uses the TCP transport protocol, which interprets TPDU drops as a sign of congestion in order to regulate bandwidth usage. While GridStat controls network traffic at the network edges to avoid network overload, at least during normal operation, a GridStat deployment must be expected to share many of the loss properties of the Internet that stem from sources other than traffic overload. These include hardware failure, maintenance, line damage, and short-term miscommunication between routers. A 2002 study of an Internet backbone found that, ranked by mean failure rate, the median link failed once every ten days [16]. The mean failure lasted over one minute, and 10% of failures lasted over 20 minutes. Such failure patterns are acceptable in the Internet because routing protocols discover link errors and reconfigure routing to direct traffic around the affected links within seconds, and because few Internet applications depend on high network reliability. Also, while drop rates during transfer are negligible on the fiber and copper lines common in wide-area networks today, GridStat is an overlay network, and the underlying physical network technologies may display other properties. Connecting remote substations to a utility network by fiber is expensive, and the alternatives include microwave signaling, WiFi, power-line communications, and satellite, all of which suffer from various forms of signal interference. The 2WoPS protocol provides several kinds of redundancy to overcome network failures.

Reliability Techniques in the 2WoPS Protocol

The 2WoPS protocol employs three techniques for overcoming network losses:

ACK/resend: allows specially marked TPDUs to be ACKed back to the sender, enabling the sender to resend a TPDU until it is confirmed as successfully sent. ACK/resend is allowed for TPDUs containing TSDUs, enabling ACK/resend semantics on application messages. If an ACK is lost, the sender will not be aware of the successful delivery and will resend the TSDU, so redundant TSDUs must be filtered at the receiver. This technique guarantees successful delivery given an unlimited number of resends and an eventually-consistent network connection.
Further, the technique uses very little bandwidth to achieve fault tolerance. The main disadvantage of the technique is that the sender must wait a full RTT before a packet is considered lost and a resend is commenced, and so the time for successful


More information

Network Working Group Request for Comments: 1679 Category: Informational K. O Donoghue NSWC-DD August 1994

Network Working Group Request for Comments: 1679 Category: Informational K. O Donoghue NSWC-DD August 1994 Network Working Group Request for Comments: 1679 Category: Informational D. Green P. Irey D. Marlow K. O Donoghue NSWC-DD August 1994 HPN Working Group Input to the IPng Requirements Solicitation Status

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - B COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - A COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

UNIT IV -- TRANSPORT LAYER

UNIT IV -- TRANSPORT LAYER UNIT IV -- TRANSPORT LAYER TABLE OF CONTENTS 4.1. Transport layer. 02 4.2. Reliable delivery service. 03 4.3. Congestion control. 05 4.4. Connection establishment.. 07 4.5. Flow control 09 4.6. Transmission

More information

COMMUNICATION PROTOCOLS

COMMUNICATION PROTOCOLS COMMUNICATION PROTOCOLS Index Chapter 1. Introduction Chapter 2. Software components message exchange JMS and Tibco Rendezvous Chapter 3. Communication over the Internet Simple Object Access Protocol (SOAP)

More information

WSN Routing Protocols

WSN Routing Protocols WSN Routing Protocols 1 Routing Challenges and Design Issues in WSNs 2 Overview The design of routing protocols in WSNs is influenced by many challenging factors. These factors must be overcome before

More information

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON. DISTRIBUTED SYSTEMS 121r itac itple TAYAdiets Second Edition Andrew S. Tanenbaum Maarten Van Steen Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON Prentice Hall Upper Saddle River, NJ 07458 CONTENTS

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

3. Evaluation of Selected Tree and Mesh based Routing Protocols

3. Evaluation of Selected Tree and Mesh based Routing Protocols 33 3. Evaluation of Selected Tree and Mesh based Routing Protocols 3.1 Introduction Construction of best possible multicast trees and maintaining the group connections in sequence is challenging even in

More information

DeviceNet - CIP on CAN Technology

DeviceNet - CIP on CAN Technology The CIP Advantage Technology Overview Series DeviceNet - CIP on CAN Technology DeviceNet has been solving manufacturing automation applications since the mid-1990's, and today boasts an installed base

More information

A Data-Centric Approach for Modular Assurance Abstract. Keywords: 1 Introduction

A Data-Centric Approach for Modular Assurance Abstract. Keywords: 1 Introduction A Data-Centric Approach for Modular Assurance Gabriela F. Ciocarlie, Heidi Schubert and Rose Wahlin Real-Time Innovations, Inc. {gabriela, heidi, rose}@rti.com Abstract. A mixed-criticality system is one

More information

Last Class: RPCs and RMI. Today: Communication Issues

Last Class: RPCs and RMI. Today: Communication Issues Last Class: RPCs and RMI Case Study: Sun RPC Lightweight RPCs Remote Method Invocation (RMI) Design issues Lecture 9, page 1 Today: Communication Issues Message-oriented communication Persistence and synchronicity

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START MIDTERM EXAMINATION #2 NETWORKING CONCEPTS 03-60-367-01 U N I V E R S I T Y O F W I N D S O R - S c h o o l o f C o m p u t e r S c i e n c e Fall 2011 Question Paper NOTE: Students may take this question

More information

CMPE150 Midterm Solutions

CMPE150 Midterm Solutions CMPE150 Midterm Solutions Question 1 Packet switching and circuit switching: (a) Is the Internet a packet switching or circuit switching network? Justify your answer. The Internet is a packet switching

More information

Request for Comments: 1787 T.J. Watson Research Center, IBM Corp. Category: Informational April 1995

Request for Comments: 1787 T.J. Watson Research Center, IBM Corp. Category: Informational April 1995 Network Working Group Y. Rekhter Request for Comments: 1787 T.J. Watson Research Center, IBM Corp. Category: Informational April 1995 Status of this Memo Routing in a Multi-provider Internet This memo

More information

Introduction to Protocols

Introduction to Protocols Chapter 6 Introduction to Protocols 1 Chapter 6 Introduction to Protocols What is a Network Protocol? A protocol is a set of rules that governs the communications between computers on a network. These

More information

Review problems (for no credit): Transport and Network Layer

Review problems (for no credit): Transport and Network Layer Review problems (for no credit): Transport and Network Layer V. Arun CS 653, Fall 2018 09/06/18 Transport layer 1. Protocol multiplexing: (a) If a web server has 100 open connections, how many sockets

More information

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ Networking for Data Acquisition Systems Fabrice Le Goff - 14/02/2018 - ISOTDAQ Outline Generalities The OSI Model Ethernet and Local Area Networks IP and Routing TCP, UDP and Transport Efficiency Networking

More information

Communication. Distributed Systems Santa Clara University 2016

Communication. Distributed Systems Santa Clara University 2016 Communication Distributed Systems Santa Clara University 2016 Protocol Stack Each layer has its own protocol Can make changes at one layer without changing layers above or below Use well defined interfaces

More information

Distributed Systems. Pre-Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2015

Distributed Systems. Pre-Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2015 Distributed Systems Pre-Exam 1 Review Paul Krzyzanowski Rutgers University Fall 2015 October 2, 2015 CS 417 - Paul Krzyzanowski 1 Selected Questions From Past Exams October 2, 2015 CS 417 - Paul Krzyzanowski

More information

Networking and Internetworking 1

Networking and Internetworking 1 Networking and Internetworking 1 Today l Networks and distributed systems l Internet architecture xkcd Networking issues for distributed systems Early networks were designed to meet relatively simple requirements

More information

Chapter 2 Distributed Information Systems Architecture

Chapter 2 Distributed Information Systems Architecture Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline

More information

Homework #2 Nathan Balon CIS 578 October 31, 2004

Homework #2 Nathan Balon CIS 578 October 31, 2004 Homework #2 Nathan Balon CIS 578 October 31, 2004 1 Answer the following questions about the snapshot algorithm: A) What is it used for? It used for capturing the global state of a distributed system.

More information

Appendix A - Glossary(of OO software term s)

Appendix A - Glossary(of OO software term s) Appendix A - Glossary(of OO software term s) Abstract Class A class that does not supply an implementation for its entire interface, and so consequently, cannot be instantiated. ActiveX Microsoft s component

More information

OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS

OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS Sandro Grech Nokia Networks (Networks Systems Research) Supervisor: Prof. Raimo Kantola 1 SANDRO GRECH - OPTIMIZING MOBILITY MANAGEMENT IN

More information

Middleware for Embedded Adaptive Dependability (MEAD)

Middleware for Embedded Adaptive Dependability (MEAD) Middleware for Embedded Adaptive Dependability (MEAD) Real-Time Fault-Tolerant Middleware Support Priya Narasimhan Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA 15213-3890

More information

Network Connectivity and Mobility

Network Connectivity and Mobility Network Connectivity and Mobility BSAD 141 Dave Novak Topics Covered Lecture is structured based on the five elements of creating a connected world from the text book (with additional content) 1. Network

More information

Introduction to Internetworking

Introduction to Internetworking CHAPTER Introduction to Internetworking Introduction This chapter explains basic internetworking concepts. The information presented here helps readers who are new to internetworking comprehend the technical

More information

CS 268: Internet Architecture & E2E Arguments. Today s Agenda. Scott Shenker and Ion Stoica (Fall, 2010) Design goals.

CS 268: Internet Architecture & E2E Arguments. Today s Agenda. Scott Shenker and Ion Stoica (Fall, 2010) Design goals. CS 268: Internet Architecture & E2E Arguments Scott Shenker and Ion Stoica (Fall, 2010) 1 Today s Agenda Design goals Layering (review) End-to-end arguments (review) 2 1 Internet Design Goals Goals 0 Connect

More information

SOFTWARE ENGINEERING DECEMBER. Q2a. What are the key challenges being faced by software engineering?

SOFTWARE ENGINEERING DECEMBER. Q2a. What are the key challenges being faced by software engineering? Q2a. What are the key challenges being faced by software engineering? Ans 2a. The key challenges facing software engineering are: 1. Coping with legacy systems, coping with increasing diversity and coping

More information

Switched Network Latency Problems Solved

Switched Network Latency Problems Solved 1 Switched Network Latency Problems Solved A Lightfleet Whitepaper by the Lightfleet Technical Staff Overview The biggest limiter to network performance is the control plane the array of processors and

More information

Introduction to Cisco ASR 9000 Series Network Virtualization Technology

Introduction to Cisco ASR 9000 Series Network Virtualization Technology White Paper Introduction to Cisco ASR 9000 Series Network Virtualization Technology What You Will Learn Service providers worldwide face high customer expectations along with growing demand for network

More information

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Java Message Service (JMS) is a standardized messaging interface that has become a pervasive part of the IT landscape

More information

LCCI (Large-scale Complex Critical Infrastructures)

LCCI (Large-scale Complex Critical Infrastructures) LCCI (Large-scale Complex Critical Infrastructures) 1 LCCIs are Internet-scale constellations of heterogeneous systems glued together into a federated and open system by a data distribution middleware.

More information

Distributed Systems Inter-Process Communication (IPC) in distributed systems

Distributed Systems Inter-Process Communication (IPC) in distributed systems Distributed Systems Inter-Process Communication (IPC) in distributed systems Mathieu Delalandre University of Tours, Tours city, France mathieu.delalandre@univ-tours.fr 1 Inter-Process Communication in

More information

Introduction to Mobile Ad hoc Networks (MANETs)

Introduction to Mobile Ad hoc Networks (MANETs) Introduction to Mobile Ad hoc Networks (MANETs) 1 Overview of Ad hoc Network Communication between various devices makes it possible to provide unique and innovative services. Although this inter-device

More information

Computer Networks. Sándor Laki ELTE-Ericsson Communication Networks Laboratory

Computer Networks. Sándor Laki ELTE-Ericsson Communication Networks Laboratory Computer Networks Sándor Laki ELTE-Ericsson Communication Networks Laboratory ELTE FI Department Of Information Systems lakis@elte.hu http://lakis.web.elte.hu Based on the slides of Laurent Vanbever. Further

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Architectural Styles Software Architecture Lecture 5 Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Object-Oriented Style Components are objects Data and associated

More information

USING ARTIFACTORY TO MANAGE BINARIES ACROSS MULTI-SITE TOPOLOGIES

USING ARTIFACTORY TO MANAGE BINARIES ACROSS MULTI-SITE TOPOLOGIES USING ARTIFACTORY TO MANAGE BINARIES ACROSS MULTI-SITE TOPOLOGIES White Paper June 2016 www.jfrog.com INTRODUCTION Distributed software development has become commonplace, especially in large enterprises

More information

The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization

The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization May 2014 Prepared by: Zeus Kerravala The Top Five Reasons to Deploy Software-Defined Networks and Network Functions

More information

Internet Management Overview

Internet Management Overview Internet Management Overview Based on the Manager-Agent Model Initially SNMPv1 (1990), SNMPv2 1996 Managed Objects similar to OSI attributes, specified through ASN.1 Macros the SNMP Structure of Management

More information

Network protocols and. network systems INTRODUCTION CHAPTER

Network protocols and. network systems INTRODUCTION CHAPTER CHAPTER Network protocols and 2 network systems INTRODUCTION The technical area of telecommunications and networking is a mature area of engineering that has experienced significant contributions for more

More information

Layered Architecture

Layered Architecture 1 Layered Architecture Required reading: Kurose 1.7 CSE 4213, Fall 2006 Instructor: N. Vlajic Protocols and Standards 2 Entity any device capable of sending and receiving information over the Internet

More information

Middleware. Adapted from Alonso, Casati, Kuno, Machiraju Web Services Springer 2004

Middleware. Adapted from Alonso, Casati, Kuno, Machiraju Web Services Springer 2004 Middleware Adapted from Alonso, Casati, Kuno, Machiraju Web Services Springer 2004 Outline Web Services Goals Where do they come from? Understanding middleware Middleware as infrastructure Communication

More information

A Framework for Optimizing IP over Ethernet Naming System

A Framework for Optimizing IP over Ethernet Naming System www.ijcsi.org 72 A Framework for Optimizing IP over Ethernet Naming System Waleed Kh. Alzubaidi 1, Dr. Longzheng Cai 2 and Shaymaa A. Alyawer 3 1 Information Technology Department University of Tun Abdul

More information

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III and VI Section : CSE- 1 & 2 Subject Code : CS6601 Subject Name : DISTRIBUTED

More information

WAN-DDS A wide area data distribution capability

WAN-DDS A wide area data distribution capability 1 A wide area data distribution capability Piet Griffioen, Thales Division Naval - Above Water Systems, Netherlands Abstract- The publish-subscribe paradigm has shown many qualities to efficiently implement

More information

Chapter 10: Peer-to-Peer Systems

Chapter 10: Peer-to-Peer Systems Chapter 10: Peer-to-Peer Systems From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, Addison-Wesley 2005 Introduction To enable the sharing of data and resources

More information

Chapter 4 Communication

Chapter 4 Communication DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 4 Communication Layered Protocols (1) Figure 4-1. Layers, interfaces, and protocols in the OSI

More information

Connecting ESRI to Anything: EAI Solutions

Connecting ESRI to Anything: EAI Solutions Connecting ESRI to Anything: EAI Solutions Frank Weiss P.E., ESRI User s Conference 2002 Agenda Introduction What is EAI? Industry trends Key integration issues Point-to-point interfaces vs. Middleware

More information

SRIJAN MANANDHAR MQTT BASED COMMUNICATION IN IOT. Master of Science thesis

SRIJAN MANANDHAR MQTT BASED COMMUNICATION IN IOT. Master of Science thesis SRIJAN MANANDHAR MQTT BASED COMMUNICATION IN IOT Master of Science thesis Examiner: Prof. Kari Systä Examiner and topic approved by the Faculty Council of the Faculty of Department of Pervasive Systems

More information