An Orthogonal and Fault-Tolerant Subsystem for High-Precision Clock Synchronization in CAN Networks *

GUILLERMO RODRÍGUEZ-NAVAS and JULIÁN PROENZA
Departament de Matemàtiques i Informàtica, Universitat de les Illes Balears
Ed. Anselm Turmeda, Campus UIB, Palma de Mallorca, SPAIN

Abstract: - Although the Controller Area Network (CAN) protocol is increasingly used for real-time critical applications, its original specification does not provide a clock synchronization service. In this paper we introduce the architecture of a clock subsystem that provides any CAN system with a clock synchronized with high precision. The main advantage of our subsystem is that, unlike previous solutions, it provides this service without replacing the circuitry or significantly changing the software of the nodes. For this reason, we consider our subsystem orthogonal to the rest of the network. Another advantage of our clock subsystem is the presence of specific fault tolerance mechanisms, which improve on those of previous solutions. As a result, our subsystem is able to tolerate its own faults without affecting the nodes of the system.

Key-Words: - Distributed embedded systems, Real time, Clock synchronization, Controller Area Network, Fault tolerance

1 Introduction

The Controller Area Network (CAN) protocol [5] is a serial bus protocol that is used in a wide range of applications, including automotive and industrial automation, because of its reliability, real-time performance [9], and low cost. A CAN network consists of a set of nodes that exchange messages following the CAN protocol in order to work cooperatively. As Fig. 1 shows, the nodes of a CAN network consist of three basic elements: a processor, typically a microcontroller, which executes the application software; a CAN controller, which implements most of the CAN protocol; and a transceiver, which simply adapts the transmission and reception signals to the communication medium.
An increasing number of CAN networks require their processors to have a synchronized clock. However, as the CAN protocol does not include such a service, it has to be provided by implementing a synchronization algorithm at the application layer. These synchronization algorithms are typically implemented in the software executed by the processor [3, 7, 10], but this kind of implementation has some disadvantages. First, it requires significant modifications to the software of the processor, which have an inherent cost and complexity. Second, since the interface between the processor and the CAN controller has not been standardized yet, a software-implemented clock synchronization service is not completely independent of the CAN controller: if the CAN controller were replaced, the software would have to be significantly modified again. Finally, as the synchronization information is processed by software in the processor and not at the hardware level, there are significant latencies that prevent the clock synchronization from achieving high precision.

Figure 1. Architecture of a CAN network

In addition to software implementations, the hardware implementation of the Time-Triggered CAN protocol (TTCAN) [2, 4] has been proposed as a solution to high-precision clock synchronization in CAN networks. TTCAN is a higher-layer extension of the CAN protocol that operates the network in a time-triggered mode and, because of this, incorporates a service for clock synchronization. Nevertheless, the hardware implementation of the TTCAN protocol also has some problems that discourage its use in many CAN networks. First, it requires the replacement of the standard CAN controllers by TTCAN controllers, which implies that the software of the processor has to be significantly modified. Second, the clock synchronization in TTCAN is only provided when the network operates in time-triggered mode. Therefore, this service is not compatible with most of the software already developed for standard CAN networks, which assumes an event-triggered operation mode.

* This work has been supported by the Spanish MCYT grant DPI C03-02, which is partially funded by the European Union FEDER program.

From the above discussion, it can be concluded that all the current solutions for clock synchronization in CAN networks, both hardware and software, have significant disadvantages. The work presented in this paper aims to overcome them by designing a subsystem that provides a high-precision clock service that is independent of the CAN controller used, and that requires only minor changes in the software of the processor.

The architecture of this subsystem consists of a set of additional hardware modules, which we have named clock units. As depicted in Fig. 2, each of these clock units is attached to a different node of the network. The function of the clock unit is to provide the processor of its node with a clock that is transparently synchronized with the clocks of the other clock units of the network. This clock is kept in a register that is mapped into the memory of the processor. Thus, the software of the processor only has to be slightly modified in order to read the value of this synchronized clock. In addition, the clock units communicate independently of their nodes' CAN controllers because they incorporate their own circuitry for sending and receiving CAN messages.
Moreover, they are connected to the network through their own transceivers (Fig. 2). Due to this orthogonality to the rest of the network, the entire set of clock units with their corresponding transceivers can be seen as an independent subsystem. This subsystem will be called the clock subsystem hereafter.

Besides its orthogonality, another important property of the clock subsystem we have defined is its fault-tolerant behaviour. This property is achieved by implementing some mechanisms that will be described later on.

In the rest of the paper, we present the complete architecture of our clock subsystem. Section 2 explains the basic features of the clock unit. Section 3 describes the mechanisms that have been included in the clock units in order to achieve fault tolerance. The last section summarizes the work presented in this paper.

Figure 2. A node that incorporates the circuitry of our clock subsystem

2 Basic features of the clock unit

As indicated above, the function of each clock unit is to provide its node's processor with a transparently synchronized clock. In order to simplify the hardware design of the clock unit, all the functions that it performs have been grouped into three modules. These modules are named global clock register, synchronization module and CAN module (Fig. 3). The function of each module is explained next.

The global clock register contains the value of the synchronized clock. This register is mapped into the memory of the processor, so that the application software can easily read it.

The synchronization module is in charge of keeping the global clock register synchronized with the global clock registers of the other clock units. To perform that function, this module incorporates an algorithm for clock synchronization.
Although any algorithm could have been incorporated into this module, we decided to use an algorithm based on the one defined in the specification of the TTCAN protocol, because of its high precision and efficiency. However, we have made some modifications to this algorithm that improve the fault tolerance of the synchronization. At this point, only the basics of the synchronization algorithm used by TTCAN [2, 4] are described. The modifications that we have made to this algorithm are explained in section 3.

The algorithm for clock synchronization defined by TTCAN is based on a centralized scheme of synchronization: one of the nodes (the time master) is assumed to have a correct view of the time, and the rest of the nodes (the slaves) simply accept its view. The mechanism for spreading the time view of the time master works as follows. Each node (including the time master) takes a sample of its clock at the sample point [5] of the SOF bit of any message. This sampling is done almost simultaneously by all the nodes, due to the in-bit response of the CAN protocol. Thanks to this simultaneity, each slave can compare its sample to that of the time master in order to know whether they are synchronized or not. To allow this comparison, the time master periodically sends, within the data field of a reference message, the sample that it has just taken at the SOF bit of the reference message itself. Thus, all the slaves can update their clocks in order to synchronize to the time master's clock and, moreover, they can calculate their drift with respect to the time master and correct it. As remarked in [2], this algorithm achieves a precision in the order of one bit time (e.g. in the order of 1 µs when CAN works at 1 Mbps).

As TTCAN operates in a time-triggered mode, the automatic retransmission of erroneous messages is disabled in order to prevent a retransmitted message from interfering with the time slot of other nodes. However, the retransmission of erroneous messages is a standard feature of the CAN protocol that allows the nodes to recover from errors in the channel. Therefore, in order to make our solution compatible with any CAN network, the synchronization algorithm we have included in our clock unit does assume that erroneous messages are automatically retransmitted. Since this difference fundamentally concerns the fault tolerance of the synchronization, it is discussed further in section 3.

Figure 3. Structure of a clock unit (global clock register, synchronization module, CAN module)

The CAN module includes all the circuitry necessary to send and receive CAN messages. This makes each clock unit independent of its node's CAN controller. A standard CAN core (that is, the standard circuitry that implements the basic features of a CAN controller) cannot be used instead of our CAN module.
This is because the algorithm for clock synchronization incorporated in the synchronization module requires some services that a standard CAN core does not provide. Specifically, two additional services are required: one to indicate the sample point of the first bit of any message, and another to allow the time master to write into the data field of the reference message while it is being transmitted. It is important to remark that, even though these new services are required, the circuitry of our CAN module has lower complexity than the circuitry of a standard CAN core. This is because the clock units only exchange a single message (the reference message), and therefore the circuitry for managing reception and transmission buffers can be significantly simplified.

3 Fault-tolerant synchronization

As explained in section 2, our clock subsystem uses a centralized scheme of synchronization. In principle, only one clock unit performs the function of time master and the rest of the clock units are slaves. However, a disadvantage of this scheme is that the time master becomes a single point of failure of the clock subsystem, since whenever it has a fault the entire synchronization process stops working. To avoid this situation, we need to provide the clock subsystem with some mechanisms that allow the faults of the time master to be tolerated.

Since the time master is also a single point of failure in the synchronization algorithm of TTCAN, we initially considered including the mechanisms that this protocol provides to tolerate faults of the time master. However, an in-depth analysis of these mechanisms showed that they do not completely solve the issue of fault tolerance and, hence, they could not be directly used in our clock units. As a consequence, we decided to design some new mechanisms, inspired by the ones used by TTCAN, that actually provide our clock units with a suitable fault tolerance.
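
As an illustration of the synchronization step summarized in section 2, the following Python sketch models how a slave might update its clock from a reference message. The class and its members (`SlaveClock`, `on_reference_message`, `rate`) are hypothetical names introduced here; the clock unit performs this step in hardware, and the drift correction shown (a rate ratio computed from two consecutive reference messages) is only one plausible reading of "calculate their drift and correct it", not a method stated in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlaveClock:
    """Hypothetical software model of a slave's synchronized clock."""
    rate: float = 1.0                  # estimated master ticks per local tick
    last_local: Optional[int] = None   # local SOF sample of the last reference message
    last_master: Optional[int] = None  # master SOF sample carried in its data field

    def on_reference_message(self, local_sample: int, master_sample: int) -> None:
        """Resynchronize using the two samples taken at the same SOF bit."""
        if self.last_local is not None and local_sample != self.last_local:
            # Drift estimate: elapsed master time over elapsed local time.
            self.rate = (master_sample - self.last_master) / (
                local_sample - self.last_local
            )
        self.last_local = local_sample
        self.last_master = master_sample

    def read(self, local_counter: int) -> int:
        """Return the synchronized time for a raw local counter value."""
        if self.last_local is None:
            return local_counter  # not yet synchronized
        elapsed = local_counter - self.last_local
        return self.last_master + round(self.rate * elapsed)
```

In this model the offset is corrected after the first reference message and the drift only after the second, which is why two consecutive reference messages are needed before the rate estimate is meaningful.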
In this section, we first present and analyze the mechanisms for fault tolerance defined by TTCAN. After that, we introduce the new mechanisms we have designed.

3.1 Tolerance to faults of the time master

In order to tolerate faults of the time master, TTCAN declares a number of nodes as replicated time masters (called spare time masters). The function of these replicated time masters is to substitute for the main time master whenever it fails.

The spare time masters detect errors of the time master as follows. They know when the main time master has to send its reference message. As this message cannot be delayed by others because of the time-triggered operation of TTCAN, they can also determine when it should be received. Thus, they consider that the main time master has suffered an error whenever the reference message is not received on time. Note that, since TTCAN does not support the automatic retransmission of erroneous messages, this approach allows a channel error that happens during the transmission of the reference message to be incorrectly taken as a failure of the main time master.

Once a spare time master detects an error of the main time master (i.e. an omission of the reference message), it performs an error-recovery mechanism, which simply consists of transmitting its own reference message. In that way, the network can continue its normal operation. Since all the spare time masters are able to detect the omissions of the reference message, more than one time master may simultaneously try to send the reference message. However, in TTCAN each time master sends a message with a different identifier. The CAN arbitration mechanism [5] then resolves this conflict by causing the time master with the highest-priority identifier to succeed in sending the reference message. After that, the spare time master that has won the arbitration becomes the main time master.

In order to tolerate faults of the time master, our clock subsystem includes some of the mechanisms used by TTCAN. First, we have included time master redundancy: a number of clock units are declared as spare time masters. Second, we have included the error-recovery mechanism: the spare time masters are also in charge of sending the reference message when the main time master fails. Nevertheless, we have found that the error-detection mechanism of TTCAN has two important problems and that, because of them, it cannot be incorporated into our clock subsystem without modification.

The first problem of the error detection in TTCAN is that the time masters are assumed to fail only by not sending the reference message. In other words, an omission failure semantics [1] is assumed for the time masters.
However, this assumption is not substantiated by the architecture of the TTCAN nodes, since they are not provided with any mechanism that prevents the time masters from suffering other kinds of failures, such as performance failures or Byzantine failures (e.g. a faulty time master that takes the identity of the main time master and sends an absurd value of time).

The second problem of the error detection is that only the errors of the main time master can be detected. Because of this, the actual availability of the spare time masters is unknown to the rest of the network.

To solve the problems presented above, we have incorporated new mechanisms into our clock subsystem. First, we have designed a hardware structure and a distributed protocol that actually substantiate the assumption of omission failure semantics, not only in the time masters but also in every clock unit. These mechanisms are described in section 3.3. In addition, we have defined a new error-detection mechanism, which allows all the clock units to have a consistent view of the actual state of the entire set of time masters. This mechanism is presented next.

3.2 Detecting errors of the time masters

In our clock subsystem, strictly speaking, only the errors of the main time master are detected. In practice, however, we make all the time masters play the role of main time master one after the other, following a round-robin scheme. Therefore, the error of any time master can be detected by the entire set of clock units with a relatively short latency.

The detection of errors is performed as follows. The main time master has to send the reference message at a given instant, which all the clock units know. This reference message has the highest priority in the network. The spare time masters, in turn, have to send their reference messages a short time interval after that instant.
The length of this interval depends on the precision of the clock, as it should be long enough to guarantee that a spare time master cannot schedule the transmission of its reference message before the instant at which the main time master has to. If this condition is guaranteed, whenever the reference message of a spare time master is received instead of that of the main time master, it can be stated that the latter has not sent its reference message. Note that, due to the omission failure semantics of the clock units (and hence of the time masters), this mechanism allows the clock units to become aware of all the faults of the main time master, since any fault will manifest as an omission of the reference message. Of course, when the reference message of the main time master is received, all the spare time masters abort the transmission of their reference messages in order to avoid the transmission of useless messages.

The consistency of this error-detection mechanism relies on the Atomic Broadcast property provided by the CAN protocol [5], which guarantees that a message is either received by all the nodes of the network or not received by any. This condition is enforced in our case by the fact that the clock units cannot fail in a way that implies a violation of the CAN protocol, since they have an omission failure semantics. Although some contributions have shown that the CAN protocol does not actually provide Atomic Broadcast in certain scenarios, we do consider that it is provided, as solutions to this problem have already been suggested [6, 8].
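
The detection rule above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the clock unit's circuitry: all function names are hypothetical, `precision` is taken to be an upper bound on the skew between any two clock units, and the main role is assumed to advance to the next time master after every round regardless of the outcome.

```python
from typing import List


def spare_delay_is_safe(spare_delay: float, precision: float) -> bool:
    """Check the condition on the spare time masters' delay.

    Assumption: if two clocks differ by at most `precision`, a spare that
    waits longer than that cannot schedule its message before the main one.
    """
    return spare_delay > precision


def main_master_omitted(expected_main: int, received_sender: int) -> bool:
    """A reference message from a spare implies the main one was omitted."""
    return received_sender != expected_main


def next_main(time_masters: List[int], current_main: int) -> int:
    """Round-robin rotation of the main time master role."""
    position = time_masters.index(current_main)
    return time_masters[(position + 1) % len(time_masters)]


def run_rounds(time_masters: List[int], senders_seen: List[int]) -> List[int]:
    """Replay received reference messages; return the masters that omitted."""
    main = time_masters[0]
    omissions = []
    for sender in senders_seen:
        if main_master_omitted(main, sender):
            omissions.append(main)
        main = next_main(time_masters, main)
    return omissions
```

For example, with time masters 0, 1 and 2 and reference messages received from 0, 2, 2 and 0 in four consecutive rounds, only the second round shows a spare's message in place of the main one, so time master 1 is the only one detected as having omitted its reference message.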

Once our system has been provided with this mechanism for consistently detecting errors in all the time masters, a consistent membership service can be easily designed. The membership information kept by every clock unit is a penalty count of the errors that each time master has shown. Whenever an error of a time master is detected, its penalty count is increased by a given amount. Whenever a time master does not fail and sends its reference message, its penalty count is decreased by a smaller amount. If the penalty count of a time master reaches a certain threshold, it is consistently considered as having a permanent failure.

3.3 Omission failure semantics

In order to provide our clock units with an omission failure semantics we have designed two mechanisms. The first one is a hardware structure that detects any error caused by an internal fault in the clock unit and that, moreover, prevents those errors from spreading by disconnecting the faulty clock unit from the rest of the system. The second one is a distributed protocol that allows the clock units to recover from an internal error. The hardware structure is presented next; the distributed protocol is described after it. Note that without this second mechanism, the clock unit would have crash failure semantics, since a single fault would leave the clock unit disconnected indefinitely.

The new hardware structure consists of a duplicated clock unit with a comparison module that supervises its behaviour. This module performs two different comparisons: one at the high level, between the global clock registers kept by each replica of the clock unit, and another at the low level, between the frames sent by each replica of the CAN module to the transceiver. Thus, whenever one of the replicas of the clock unit has a fault, it manifests as an error in one of these comparisons and is detected.
Moreover, when such an error is detected, the comparison module prevents the duplicated clock unit from transmitting any data to the network by disabling its transceiver.

To achieve an omission failure semantics, the duplicated clock unit that has detected an error, and therefore is disabled, must have the opportunity to recover. To do this, the duplicated clock unit is provided with an internal recovery mechanism. This recovery mechanism re-establishes a correct internal state in the duplicated clock unit, when this is possible, and after that it enables the transceiver again. However, this recovery mechanism cannot guarantee that the membership information kept by the duplicated clock unit is correct (since it could have been corrupted by the fault) or consistent with the actual state of the system (since some reference messages could have been lost during the recovery). Therefore, we have designed an additional mechanism to obtain correct and consistent membership information from the other time masters. We initially considered two options for implementing this mechanism. Both are introduced next.

The first option consisted of using a mechanism for requesting information from other nodes that is natively provided by the CAN protocol: the Remote frame [5]. However, although using this mechanism allows the time masters to recover in a short time, it has been rejected because it introduces a sporadic message, whose transmission and reception would significantly increase the complexity of the clock unit. In addition, this mechanism would also complicate the scheduling of messages in the system.

The second option is the one incorporated in our subsystem. It consists of sending the membership information, together with the time value, within the data field of every reference message.
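
A minimal sketch of this arrangement, combining the penalty counts of section 3.2 with a reference message that carries them, could look as follows. Everything named here is an assumption made for illustration: the paper only states that the increase is larger than the decrease and that a threshold exists, so `PENALTY_UP`, `PENALTY_DOWN` and `THRESHOLD` are placeholder values, and the saturation of the count at 0 and at the threshold is a choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict

PENALTY_UP = 2     # assumed increase applied when an error is detected
PENALTY_DOWN = 1   # assumed decrease on a correct reference message (smaller)
THRESHOLD = 6      # assumed count at which a failure is considered permanent


@dataclass
class Membership:
    """Hypothetical per-clock-unit record: one penalty count per time master."""
    penalties: Dict[int, int] = field(default_factory=dict)

    def record_error(self, master_id: int) -> None:
        count = self.penalties.get(master_id, 0) + PENALTY_UP
        self.penalties[master_id] = min(THRESHOLD, count)  # saturate (assumption)

    def record_success(self, master_id: int) -> None:
        count = self.penalties.get(master_id, 0) - PENALTY_DOWN
        self.penalties[master_id] = max(0, count)  # never below zero (assumption)

    def permanently_failed(self, master_id: int) -> bool:
        return self.penalties.get(master_id, 0) >= THRESHOLD


@dataclass
class ReferenceMessage:
    """Hypothetical data field contents: time value plus membership."""
    sender: int
    time_sample: int
    penalties: Dict[int, int]


def adopt_membership(message: ReferenceMessage) -> Membership:
    """Replace a possibly corrupted local record with the received one."""
    return Membership(dict(message.penalties))
```

Because every reference message carries the full record, a clock unit that has lost or corrupted its own copy does not need to request anything: it can simply overwrite its record with the one it receives.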
The main advantage of this mechanism is its simplicity, as the membership information is sent in a periodic message that the time masters already have to send, and therefore it does not require significant changes in the clock units. Thus, after re-establishing its internal state, a clock unit in the process of recovering only needs to wait until the reception of the next reference message. Once this message is received, the consistency of the membership information is guaranteed again.

Thanks to this consistency, a just-recovered clock unit that is a time master can know which state the other clock units consider it to be in. If it is not considered to be in a permanent failure, it joins the round robin, so that all the clock units will detect that it has recovered as soon as it becomes the main time master and sends the reference message. On the contrary, if the rest of the time masters consider it to be in a permanent failure, it does not join the round robin and no longer sends its reference message.

Since a clock unit in the process of recovering does not have consistent membership information until a reference message is received, it is not able to know the identity of the main time master. Therefore, if the main time master failed and the reference message of a spare time master were received, this clock unit would not be able to increase the corresponding penalty count. In order to avoid this situation, the spare time masters send the membership information with the penalty count of the main time master already increased. Note that the correctness of this mechanism relies on the fact that the reference message sent by a spare time master can only be received if the main time master has actually failed, as we explained in section 3.2.

Sending the membership information within the reference message limits the number of time masters allowed in our clock subsystem. We have decided to use eight time masters and to encode the penalty count of each of them with three bits. In this way, the membership information occupies three bytes (8 x 3 = 24 bits) in the data field of the reference message. The other bits can be used to send the time of the main time master and some control information.

It is important to remark that whenever a clock unit (either time master or slave) has a failure, the precision of the clock that it provides to the processor of its node cannot be guaranteed until the hardware recovery function has been executed and two reference messages have been received [2, 4]. Thus, it should be decided at the application level whether the processor is able to work during this time with a clock of degraded precision.

4 Summary

In this paper we have introduced the architecture of a subsystem that solves the problem of high-precision clock synchronization in event-triggered CAN networks. This architecture consists of a set of additional modules, called clock units, which are attached to the nodes of the original network. Each clock unit is attached to a different node and provides the processor of this node with a clock transparently synchronized to the rest of the clock units.

Our solution is considered orthogonal, in contrast to previously suggested solutions, because it is independent of the CAN controller used by the nodes of the network, and because the processors of the nodes can use it without significantly changing the application software. Such orthogonality is desirable because it reduces the implementation cost.

Although our architecture is compatible with any synchronization algorithm, we have decided to implement an algorithm similar to the one defined by the TTCAN protocol, because of its high precision and efficiency.
This algorithm uses a centralized scheme of synchronization, since a single node, called the time master, is in charge of maintaining the synchronization. Because of this, some mechanisms that provide tolerance to faults of this time master are required. A significant contribution of the present work is that, as the mechanisms defined by TTCAN to provide this fault tolerance do not completely solve the problem, we have defined new mechanisms that actually provide a suitable tolerance to faults of the time master. As a consequence, our clock subsystem is able to tolerate its own faults without affecting the rest of the nodes and, therefore, it can be added to any CAN network without decreasing the global reliability of the system. Moreover, we have designed a consistent membership service, which guarantees that the actual availability of all the time masters is consistently known by the entire set of clock units.

References:

[1] F. Cristian, Questions to ask when designing or attempting to understand a fault-tolerant distributed system, Keynote Address in Proc. 3rd Brazilian Conference on Fault-tolerant Computing, Rio de Janeiro, Brazil.
[2] T. Führer, B. Müller, W. Dieterle, F. Hartwich, R. Hugel, M. Walther, Robert Bosch GmbH, Time Triggered Communication on CAN, Proceedings of the 7th International CAN Conference, Amsterdam, The Netherlands.
[3] M. Gergeleit and H. Streich, Implementing a Distributed High-resolution Real-time Clock using the CAN-bus, Proceedings of the 1st International CAN Conference, Mainz, Germany, 1994.
[4] F. Hartwich, B. Müller, Th. Führer, R. Hugel, Robert Bosch GmbH, CAN network with Time Triggered Communication, Proceedings of the 7th International CAN Conference, Amsterdam, The Netherlands.
[5] ISO, Road vehicles - Controller area network (CAN) - Part 1: Controller area network data link layer and medium access control.
[6] J. Proenza and J. Miro-Julia, MajorCAN: a Modification to the Controller Area Network Protocol to achieve Atomic Broadcast, IEEE Int. Workshop on Group Communications and Computations, Taipei, Taiwan.
[7] L. Rodrígues, M. Guimarães and J. Rufino, Fault-tolerant Clock Synchronization in CAN, Proceedings of the 19th IEEE Real-time Systems Symposium, Madrid, Spain.
[8] J. Rufino, P. Verissimo, G. Arroz, C. Almeida, and L. Rodrígues, Fault-tolerant broadcasts in CAN, Digest of papers, The 28th IEEE International Symposium on Fault-Tolerant Computing, Munich, Germany.
[9] K. Tindell and A. Burns, Guaranteeing Message Latencies on Controller Area Network (CAN), Proceedings of the 1st International CAN Conference, Mainz, Germany, 1994.
[10] K. Turski, A global time system for CAN networks, Proceedings of the 1st International CAN Conference, Mainz, Germany, 1994.

A CAN-Based Architecture for Highly Reliable Communication Systems

A CAN-Based Architecture for Highly Reliable Communication Systems A CAN-Based Architecture for Highly Reliable Communication Systems H. Hilmer Prof. Dr.-Ing. H.-D. Kochs Gerhard-Mercator-Universität Duisburg, Germany E. Dittmar ABB Network Control and Protection, Ladenburg,

More information

CAN Network with Time Triggered Communication

CAN Network with Time Triggered Communication CAN Network with Time Triggered Communication Florian Hartwich Bernd Müller Thomas Führer Robert Hugel Robert Bosch GmbH The communication in the classic CAN network is event triggered; peak loads may

More information

A Fault Management Protocol for TTP/C

A Fault Management Protocol for TTP/C A Fault Management Protocol for TTP/C Juan R. Pimentel Teodoro Sacristan Kettering University Dept. Ingenieria y Arquitecturas Telematicas 1700 W. Third Ave. Polytechnic University of Madrid Flint, Michigan

More information

in Mainz (Germany) Sponsored by Allen Bradley National Semiconductor Philips Semiconductors Organized by

in Mainz (Germany) Sponsored by Allen Bradley National Semiconductor Philips Semiconductors Organized by 1 st international Conference icc 1994 in Mainz (Germany) Sponsored by Allen Bradley National Semiconductor Philips Semiconductors Organized by in Automation (CiA) international users and manufacturers

More information

Communication Networks for the Next-Generation Vehicles

Communication Networks for the Next-Generation Vehicles Communication Networks for the, Ph.D. Electrical and Computer Engg. Dept. Wayne State University Detroit MI 48202 (313) 577-3855, smahmud@eng.wayne.edu January 13, 2005 4 th Annual Winter Workshop U.S.

More information
