Replication Using Group Communication Over a Partitioned Network


Replication Using Group Communication Over a Partitioned Network. Thesis submitted for the degree Doctor of Philosophy. Yair Amir. Submitted to the Senate of the Hebrew University of Jerusalem (1995).

This work was carried out under the supervision of Professor Danny Dolev.

Acknowledgments

I am deeply grateful to Danny Dolev, my advisor and mentor. I thank Danny for believing in my research, for spending so many hours on it, and for giving it the theoretical touch. His warm support and patient guidance helped me through. I hope I managed to adopt some of his professional attitude and integrity.

I thank Dalia Malki for her help during the early stages of the Transis project. Thanks to Idit Keidar for helping me sharpen some of the issues of the replication server. I enjoyed my collaboration with Ofir Amir on developing the coloring model of the replication server. Many thanks to Roman Vitenberg for his valuable insights regarding the extended virtual synchrony model and the replication algorithm. I benefited a lot from many discussions with Ahmad Khalaila regarding distributed systems and other issues. My thanks go to David Breitgand, Gregory Chokler, Yair Gofen, Nabil Huleihel and Rimon Orni, for their contribution to the Transis project and to my research.

I am grateful to Michael Melliar-Smith and Louise Moser from the Department of Electrical and Computer Engineering, University of California, Santa Barbara. During two summers, several mutual visits and extensive electronic correspondence, Louise and Michael were involved in almost every aspect of my research, and unofficially served as my co-advisors. The work with Deb Agarwal and Paul Ciarfella on the Totem protocol contributed a lot to my understanding of high-speed group communication.

Ken Birman and Robbert van Renesse from the Computer Science Department at Cornell University were always willing to contribute their valuable advice to my research. Spending last summer with them was an educational experience for me. For that I thank them both. Special thanks to Ken for convincing me to pursue an academic position.

Thanks to Eldad Zamler for first introducing me to what became my research problem, ten years ago. I thank Yaacov Ben-Yaacov and Gidi Kuperstein for six years of collaboration in building a working system and delivering it to the customer. They are all special friends.

I would like to thank my parents Shulamit and Reuven, for their love, encouragement and constant support. I thank my brother Yaron, my brother Ofir, Amira and Lee, for always being there for me. Last, but not least, I am grateful to my wife and my partner Michal, for her unending support. My success is the product of her wisdom, confidence, and love.

Contents

1. INTRODUCTION
   1.1 PROBLEM DESCRIPTION
   1.2 SOLUTION HIGHLIGHTS
   1.3 THESIS ORGANIZATION
   1.4 RELATED WORK
       1.4.1 Group Communication Protocols
       1.4.2 Group Communication Semantics
       1.4.3 Replication Protocols
2. THE MODEL
   2.1 THE SERVICE MODEL
   2.2 THE FAILURE MODEL
   2.3 REPLICATION REQUIREMENTS
3. THE ARCHITECTURE
4. EXTENDED VIRTUAL SYNCHRONY
   4.1 EXTENDED VIRTUAL SYNCHRONY SEMANTICS
       Basic Delivery
       Delivery of Configuration Changes
       Self Delivery
       Failure Atomicity
       Causal Delivery
       Agreed Delivery
       Safe Delivery
   4.2 AN EXAMPLE OF CONFIGURATION CHANGES AND MESSAGE DELIVERY
   4.3 DISCUSSION
5. GROUP COMMUNICATION LAYER
   5.1 THE TRANSIS SYSTEM
   5.2 THE RING RELIABLE MULTICAST PROTOCOL
       Message Ordering
       Membership State Machine
       Achieving Extended Virtual Synchrony
   5.3 PERFORMANCE
6. REPLICATION LAYER
   6.1 THE CONCEPT
       Conceptual Algorithm
       Selecting a Primary Component
       Propagation by Eventual Path
   6.2 THE ALGORITHM
   6.3 PROOF OF CORRECTNESS
       Safety
       Liveness
7. CUSTOMIZING SERVICES FOR APPLICATIONS
   7.1 STRICT CONSISTENCY
   7.2 WEAK CONSISTENCY QUERY
   7.3 DIRTY QUERY
   7.4 TIMESTAMPS AND COMMUTATIVE UPDATES
   7.5 DISCUSSION
8. CONCLUSIONS

Abstract

In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replicating data or servers may improve performance. Replication may also improve the availability of information when processors crash or the network partitions.

Existing replication methods are often needlessly expensive. They sometimes use point-to-point communication when multicast communication is available; they typically pay the full price of end-to-end acknowledgments for all of the participants for every update; they may claim locks, and therefore, may be vulnerable to faults that can unnecessarily block the system for long periods of time.

This thesis presents a new architecture and algorithms for replication over a partitioned network. The architecture is structured into two layers: a replication server and a group communication layer. Each of the replication servers maintains a private copy of the database. Actions (queries and updates) requested by the application are globally ordered by the replication servers in a symmetric way. Ordered actions are applied to the database and result in a state change and in a reply to the application.

We provide a group communication package, named Transis, to serve as the group communication layer. Transis utilizes the available non-reliable hardware multicast for efficient dissemination of messages to a group of processes. The replication servers use Transis to multicast actions and to learn about changes in the membership of the currently connected servers, in a consistent manner. Transis locally orders messages sent within the currently connected servers. The replication servers use this order to construct a long-term global total order of actions.

Since the system is subject to partitioning, we must ensure that two detached components do not reach contradictory decisions regarding the global order. Therefore, the replication servers use dynamic linear voting to select, at most, one primary component that continues to order actions.

The architecture is non-blocking: actions can be generated by the application anytime. While in a primary component, queries are immediately replied in a consistent manner. While in a non-primary component, the user can choose to wait for a consistent reply (that will arrive as soon as the network is repaired) or to get an immediate, though not necessarily consistent, reply.

High performance of the architecture is achieved because:
- End-to-end acknowledgments are not needed on a regular basis. They are used only after membership change events such as processor crashes and recoveries, and network partitions and merges.
- Synchronous disk writes are almost eliminated, without compromising consistency.
- Hardware multicast is used where possible.

Chapter 1

1. Introduction

In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replicating data or servers may improve performance. Replication may also improve the availability of information when processors crash or the network partitions.

Existing replication methods are often needlessly expensive. They sometimes use point-to-point communication when multicast communication is available. They typically pay the full price of end-to-end acknowledgment for all of the participants for every update, or even of several rounds of end-to-end acknowledgments. They may claim locks, and therefore, may be vulnerable to faults that can unnecessarily block the system for long periods of time.

This thesis ends a ten-year professional journey. It started with my involvement in the design and implementation of a large and geographically distributed control system. The requirements of that system demanded a non-blocking solution with maximal availability. Each of the control stations had to be autonomous, to work despite network partitions, and to survive power failures. To meet the requirements, we constructed a data replication scheme to function over an unreliable communication medium in a dynamic environment. We managed to limit the update semantics to commutative updates. Hence, the replica control problem was reduced to implementing a guaranteed delivery of actions to all of the replicas. This was done by constructing point-to-point stable queues. The concept was proven adequate and is still operational today, maintaining consistent replication of several tens of databases. However, the use of point-to-point communication and the extensive use of synchronous disk writes, as well as the limitation imposed on the update semantics, left me with a feeling that a better replication concept can be found. My Ph.D. research was motivated by this belief.
Together with Danny Dolev, Dalia Malki and Shlomo Kramer, we initiated the Transis system, targeted at building tools for highly available distributed systems. We gave Transis its name to acknowledge the innovation of both the Trans protocol [MMA90] and the ISIS system [BvR94]. Transis was aimed at providing group communication services using non-reliable hardware multicast available in most local area networks, tolerating network partitions and merges as well as processor crashes and recoveries. On top of Transis, we designed a replication server that eliminates the need for synchronous disk writes per update without compromising consistency. Avoiding disk writes on the critical path and utilizing hardware multicast render our replication architecture highly efficient and more scalable than previous solutions.

1.1 Problem Description

The problem tackled in this thesis is how to construct an efficient and robust long-term replication architecture, within a fixed set of servers. Each server maintains a private copy of the database. The initial state of the database is identical at all of the servers. Typically, each server runs on a different processor.

The replication architecture is required to handle network partitioning. We explicitly assume that the network may partition into several components. Some or all of the partitioned components may subsequently re-merge. The architecture is also required to handle server crashes and recoveries.

It is assumed that the underlying communication supports some form of non-reliable multicast service (this service can be mimicked by unreliable point-to-point transmission). The architecture is required to overcome message omissions. We assume no message corruption. We rely on error detection and error correction protocols to eliminate corrupted messages. Corrupted messages have the effect of omitted messages. We do not handle malicious faults. We assume that all the servers are running their protocols faithfully.

1.2 Solution Highlights

We present a new architecture and algorithms for active replication over a partitioned network. Active replication is a symmetric approach where each of the replicas is guaranteed to invoke the same set of actions in the same order. This approach requires the next state of the database to be determined by the current state and the next action, and it guarantees that all of the replicas reach the same database state. Other factors, such as the passage of time, should not have any bearing on the next database state.

The architecture, presented in Figure 1.1, is structured into two layers: a replication server and a group communication layer. Each of the replication servers maintains a private copy of the database. Actions (queries and updates) requested by the application are globally ordered by the replication servers in a symmetric way.
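The active replication requirement stated above (the next state depends only on the current state and the next action) can be illustrated with a minimal sketch. The `Replica` class, the tuple format of an action, and the sample actions are assumptions made for this illustration only; they are not the thesis's replication server.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: a private database copy driven only by ordered actions."""
    db: dict = field(default_factory=dict)

    def apply(self, action):
        # An action is (kind, key, value). The outcome depends only on the
        # current database state and the action, never on time or locality.
        kind, key, value = action
        if kind == "update":
            self.db[key] = value
            return "ok"
        if kind == "query":
            return self.db.get(key)
        raise ValueError(f"unknown action kind: {kind}")


def run(replica, ordered_actions):
    """Apply a globally ordered list of actions and collect the replies."""
    return [replica.apply(a) for a in ordered_actions]


# The same global order is fed to two replicas that start from the same state.
global_order = [
    ("update", "x", 1),
    ("update", "y", 2),
    ("query", "x", None),
    ("update", "x", 3),
]
a, b = Replica(), Replica()
replies_a = run(a, global_order)
replies_b = run(b, global_order)
```

Because both replicas start from the same state and apply the same actions in the same order, they end in the same state and produce the same replies. The hard part, which this sketch leaves out entirely, is agreeing on that global order across partitions.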
Ordered actions are applied to the database and result in a state change and in a reply to the application. The replication servers use the group communication layer to efficiently disseminate actions, and to learn about changes in the membership of the currently connected servers in a consistent manner. The group communication layer locally orders messages disseminated within the currently connected group.

When a new component is formed by merging two or more components, the servers exchange information about actions and about the order of actions in the system. Actions missed by at least one of the servers are multicast, and the connected servers reach a common state. This way, actions are propagated as soon as possible. We call this method propagation by eventual path.

Figure 1.1: The Architecture. (Diagram labels: Application, DB, Replication Server, Group Communication, Network; Request, Reply, Apply; global order of actions; local order of messages.)

Since the system may partition, we must ensure that two different components do not reach contradictory decisions regarding the global order of actions. Hence, we need to identify at most one component, the primary component, that may continue ordering actions. We employ dynamic linear voting [JM90], which is generally accepted as the best technique when certain restrictions hold.

We define a new semantics, extended virtual synchrony, for the group communication service. The significance of extended virtual synchrony is that, during network partitioning and re-merging and during process crash and recovery, it maintains a consistent relationship between the delivery of messages and the delivery of configuration change notifications across all processes in the system.

Prior group communication protocols have focused on totally ordering messages at the group communication level. That service, although useful for some applications, is not enough to guarantee complete consistency at the application level without additional end-to-end acknowledgments, as has been noted by Cheriton and Skeen [CS93]. Extended virtual synchrony specifies the safe delivery service, which provides an additional level of knowledge within the group communication protocol.

The strict semantics of extended virtual synchrony and its safe delivery service are exploited by the replication servers to eliminate the need for end-to-end acknowledgment on a per-action basis without compromising consistency. End-to-end acknowledgment is only required when the membership of connected servers changes, e.g., in case of network partitions, merges, server crashes and recoveries. This leads to high performance of the architecture.
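The primary component selection mentioned above can be sketched as follows. This is a simplified rendering of dynamic linear voting as it is commonly described: a component may become primary if it holds more than half of the members of the last primary component, and an exact tie goes to the side holding the first member in a fixed linear order. The function name, the set-based representation, and the tie-break direction are assumptions for illustration; the sketch also ignores how servers come to agree on what the last primary component was.

```python
def is_primary(component, last_primary, linear_order):
    """Return True if `component` may become the next primary component.

    component:    set of servers that are currently connected to each other
    last_primary: non-empty set of servers in the most recent primary component
    linear_order: list ranking all servers, used only to break exact ties
    """
    present = component & last_primary
    if 2 * len(present) > len(last_primary):
        return True  # strict majority of the last primary component
    if 2 * len(present) == len(last_primary):
        # Exact half: only the side with the distinguished member wins.
        distinguished = min(last_primary, key=linear_order.index)
        return distinguished in present
    return False


order = ["A", "B", "C", "D", "E"]
everyone = set(order)

first_split = (is_primary({"A", "B", "C"}, everyone, order),
               is_primary({"D", "E"}, everyone, order))

# After {A, B, C} became primary, quorums are counted relative to it.
second_split = (is_primary({"A", "B"}, {"A", "B", "C"}, order),
                is_primary({"C", "D", "E"}, {"A", "B", "C"}, order))

tie_split = (is_primary({"A"}, {"A", "B"}, order),
             is_primary({"B"}, {"A", "B"}, order))
```

In the second split, {C, D, E} holds three of the five servers but only one member of the last primary component, so it cannot become primary. Two disjoint components can never both pass the test, which is the "at most one primary component" property the text requires.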
In the general case, when the membership of connected servers is stable, the throughput and latency of actions are determined by the performance of the group communication and not so much by other factors such as the number of replicas and the performance of synchronous disk writes.

The architecture is non-blocking: actions can be generated by the application anytime. While in a primary component, queries are immediately replied in a consistent manner. While in a non-primary component, the user can choose to wait for a consistent reply (that will arrive as soon as the network is repaired) or to get an immediate, though not necessarily consistent, reply. Two different, well-defined semantics are available for immediate replies in a non-primary component.

The key contributions of this Ph.D. research are:
- Defining an efficient architecture for replication.
- Constructing a highly efficient reliable multicast protocol that tolerates partitions, and implementing it in a general Unix environment. The symmetric protocol provides reliable message ordering and membership services. The protocol's exceptional performance is achieved by utilizing a non-reliable multicast service where possible.
- Defining the extended virtual synchrony semantics for group communication services. Extended virtual synchrony, among other things, strictly defines message delivery semantics in the presence of network partitions and re-merges, as well as process crashes and recoveries.
- Constructing the propagation by eventual path technique for efficient information dissemination in a dynamic network. This method utilizes group communication to propagate knowledge as soon as possible between servers. The strengths of the propagation by eventual path method are most evident when the membership of connected servers is dynamically changing.
- Eliminating the need for end-to-end acknowledgments and for synchronous disk writes on a per-action basis. Instead, end-to-end acknowledgments and synchronous disk writes are needed once, just after a change in the membership of the connected servers.
- Tailoring and optimizing replication services for different kinds of applications.

1.3 Thesis Organization

The rest of the thesis is organized as follows:
- The next subsection presents previous research in group communication protocols, group communication semantics, and replication protocols.
- Chapter 2 presents the theoretical model and defines the correctness criteria of the solution.

- Chapter 3 presents the overall replication architecture.
- Chapter 4 defines the extended virtual synchrony semantics.
- Chapter 5 presents Transis, our group communication layer, which provides extended virtual synchrony. We describe the logical ring protocol, one of the two reliable multicast protocols operational in Transis. Throughput and latency measurements of Transis, over a network of Pentium machines running Unix, are provided.
- Chapter 6 details our replication server. The replication protocol demonstrates how extended virtual synchrony is exploited to provide an efficient long-term replication service.
- Chapter 7 customizes services for different kinds of applications.
- Chapter 8 concludes this thesis.

A reader interested in an overview of this thesis beyond the introduction may read Chapter 3, Chapter 5 Section 1 and Section 3, and Chapter 6 Section 1. A reader interested in the practical aspects of this thesis and in implementation details may want to focus on Chapter 3, Chapter 5 Section 2 and Section 3, Chapter 6 Section 2, and Chapter 7.

Additional information including a copy of this thesis, a slide show, relevant published papers and more, can be obtained from: http:// http:// or by writing to yairamir@cs.jhu.edu or

1.4 Related Work

Much work has been done in the area of group communication and in the area of replication. We relate our work to three research areas: group communication protocols, group communication semantics, and replication protocols.

1.4.1 Group Communication Protocols

The ISIS toolkit [BJ87, BCJM+90, BvR94] is one of the first general purpose group communication systems. ISIS provides a group communication session service, where processes can join process groups, multicast messages to groups, and receive messages sent to groups. Two multicast primitives are provided: The CBCAST service guarantees causally ordered message delivery (see [Lam78]) across overlapping groups. CBCAST is implemented using vector timestamps that are piggybacked on each message.
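To make the role of the piggybacked vector timestamps concrete, here is a minimal sketch of the delivery test commonly used for vector-timestamp causal multicast within a single group. It is not ISIS code: the `Process` class, the message tuple layout, and the three-process scenario are assumptions for illustration, and handling of overlapping groups is left out.

```python
def can_deliver(msg_vt, sender, local_vt):
    """A message is deliverable when it is the next one from its sender and
    everything it causally depends on has already been delivered locally."""
    for k, (m, l) in enumerate(zip(msg_vt, local_vt)):
        if k == sender:
            if m != l + 1:
                return False
        elif m > l:
            return False
    return True


class Process:
    """Hypothetical group member that buffers messages until they are causally ready."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n      # vt[k] = number of messages from k delivered here
        self.pending = []      # received but not yet deliverable
        self.delivered = []    # payloads in delivery order

    def send(self, payload):
        self.vt[self.pid] += 1
        return (self.pid, list(self.vt), payload)  # timestamp is piggybacked

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:        # one delivery may unblock buffered messages
            progress = False
            for m in list(self.pending):
                sender, vt, payload = m
                if can_deliver(vt, sender, self.vt):
                    self.vt[sender] = vt[sender]
                    self.delivered.append(payload)
                    self.pending.remove(m)
                    progress = True


p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")             # m2 causally follows m1
p2.receive(m2)                 # arrives early, so it is buffered
buffered_first = list(p2.delivered)
p2.receive(m1)                 # now both can be delivered, m1 first
```

The third process receives the messages in the wrong order but still delivers `m1` before `m2`, which is the causal guarantee the text attributes to CBCAST.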
The ABCAST service extends the causal order to a total order using a central group coordinator that emits ordering decisions. ISIS also provides membership notifications when the group membership is changed. Group membership changes due to processes voluntarily joining or leaving the group, or due to process failures. Network partitions and re-merges, as well as process recoveries, are not supported. The novelty of ISIS is in guaranteeing a formal and rigorous service semantics named virtual synchrony. ISIS protocols are implemented using point-to-point communication. Although much better protocols exist today, and despite the lack of support for network partitions, ISIS is the most mature general purpose system available today. The ISIS system is commercially available from ISIS Distributed Systems LTD.

The V system [CZ85] provides group communication services at the operating system level. It was the first to utilize hardware multicast to implement process group communication. However, only non-reliable, best-effort, unordered delivery service is provided. Similar services for wide area networks are provided by the IP-multicast [Dee89] protocol.

The Chang and Maxemchuk reliable broadcast and ordering protocol [CM84] uses a token-passing strategy, where the processor holding the token acknowledges messages. All the participating processors can broadcast messages at any time. The protocol also provides membership and token recovery algorithms. Typically, between two and three messages are required to order a message in an optimally loaded system. The protocol does not provide a mechanism for flow control.

The TPM protocol [RM89] uses a token on a logical ring of processors for broadcasting and retransmission of messages. The token is circulated along a known token list in order to serialize message transmission. The token contains the next sequence number to be stamped on new messages. TPM starts by circulating the token to multicast a set of messages. Then, the token is used to retransmit messages belonging to the set that are missed by some of the processors.
When no message is missed by any of the processors, the whole set is delivered to the application and a new set of messages can be introduced. TPM also provides a dynamic membership and token regeneration algorithm. If the network partitions, the component with the majority of the members (if such exists) is allowed to continue.

The Delta-4 [Pow91] system provides tools for building distributed, fault-tolerant real-time systems. As part of Delta-4, a reliable multicast protocol, xAMp [RV92], and a membership protocol [RVR93] are implemented. The protocols utilize the non-reliable multicast or broadcast primitive of local area networks. The Delta-4 protocols assume fail-stop behavior and as such, do not support network partitions and re-merges. The membership protocol provides low-level processor membership so that a higher level process group membership can be built on top of it in a simple way. Our experience in Transis indicates that this two-level architecture is better than solving the membership problem at the process level. Delta-4 is more real-time oriented than Transis, and it uses special hardware for message ordering and failure detection. This seems to be a strong limitation on the project's usability.

The Amoeba distributed operating system uses the Flip high performance reliable multicast protocol [KvRvST93] to support high level services such as a fault-tolerant directory service [KTV93]. In Amoeba, members of the group send point-to-point messages to a distinct member called the sequencer. The sequencer stamps each message with a sequence number and broadcasts it to the group. A member that detects a gap in the message sequence sends a point-to-point retransmission request to the sequencer. The Amoeba system is resilient to any pre-defined number of failed processors, but its performance degrades as the number of allowed failures is increased.

The Trans and Total protocols [MMA90, MMA93, MM93] provide reliable ordered broadcast delivery in an asynchronous environment. The Trans protocol uses positive and negative acknowledgments piggybacked onto broadcast messages and exploits the transitivity of positive acknowledgments to reduce the number of acknowledgments required. The Total protocol, layered on top of the Trans protocol, converts the partial order into a total order. The Trans and Total protocols maintain causality and ensure that operational processors continue to order messages even though other processors have failed, provided that a resiliency constraint is met. A membership protocol [MMA94] is implemented on top of Total. If a processor suspects another processor, it sends a fault message for the suspected processor. When that message is ordered, the membership is changed to exclude this processor. The limitation of that architecture is that if Total cannot order the membership messages (e.g. because the resiliency constraint is not met), the system is blocked.

The Psync protocol [PBS89] builds a context graph that represents the causal partial order on messages. This order can be extended into a total order by determining complete waves of causally concurrent messages and by ordering the messages of a wave using some deterministic order.
Based on the causal order provided by Psync, a membership algorithm is constructed [MPS91]. Using this algorithm, processors reach eventual agreement on membership changes. The algorithm handles processor faults and allows a processor to join a pre-existing group asymmetrically. Network partitions and re-merges are not supported.

The Newtop protocol [MES93, Mac94] replaces the context graph of Psync by the notion of causal blocks. Each causal block defines a set of messages. All the messages within a block are causally independent. The blocks are totally ordered. The messages in a block are delivered together, in some deterministic order. In this way, Newtop provides totally ordered delivery similar to the wave technique of Psync and the all-ack mechanism of Lansis [ADKM92a], but with much less bookkeeping. Newtop causal delivery is less efficient than Psync or Trans because the causal information represented in causal blocks is not accurate and is more pessimistic than needed (though more compact). Moreover, using causal blocks eliminates the ability to use faster algorithms (e.g. TOTO [DKM93]) that use the full context graph to reach a fast decision on total order. Newtop implements a membership service that handles processor crashes and network partitions. However, process recoveries and network re-merges are not addressed. The most interesting point of Newtop is its service semantics, presented in the next section.

The Horus project [vRBFHK95] implements group communication services, providing unreliable or reliable FIFO, causal, or total multicast services. Horus is extensively layered and highly configurable, allowing applications to only pay for the overhead of services they use. The layers include the COM layer which provides basic non-reliable multicast, the NAK layer which provides reliable FIFO multicast, the MBRSHIP layer that provides membership maintenance, the STABLE layer which provides message stability, the FC layer which provides flow control, the CAUSAL and TOTAL layers, the LWG layer which maintains process groups, the EVS layer which maintains extended virtual synchrony (see below), and many more. Advanced memory management techniques are used in order to avoid the full cost of layering.

The Transis project, described in Section 5.1, provides group communication services in a partitionable network. Three multicast primitives are provided according to the extended virtual synchrony semantics: Causal multicast, Agreed multicast for total order delivery, and Safe multicast that provides even stronger guarantees. Two different reliable multicast protocols are implemented in Transis. Lansis [ADKM92a], the earlier protocol, uses a directed acyclic graph (DAG) representing the causal relation on messages to provide reliable multicast. The DAG is derived from negative and positive acknowledgments piggybacked on messages. The causal order mechanism in Lansis is derived from the Trans protocol with several important modifications that adapt it for practical use. Two total order algorithms extend the causal order to a total, agreed order. The first is the all-ack algorithm which is similar to the algorithm used in Psync, and the second is the TOTO early delivery algorithm [DKM93]. Both compute the total order based on the DAG structure without exchange of additional messages. While TOTO is more efficient than the all-ack protocol, it cannot maintain extended virtual synchrony. The membership algorithm of Transis [ADKM92b] is a symmetric protocol that was the first to handle network partitions and re-merges.
Although operational in an asynchronous environment, the algorithm ensures termination in a bounded time. The basic idea of this membership algorithm was adopted by Totem and Horus. Excellent reading about Transis and its membership algorithm is found in [Mal94]. The second reliable multicast protocol in Transis is the Ring protocol, detailed in Section 5.2. The Ring protocol was developed while the author was visiting the Totem project.

The Totem system [Aga94] provides reliable multicast and membership services across a collection of local-area networks. The Totem system is composed of a hierarchy of two protocols. The bottom layer is the Ring protocol [AMMAC93, AMMAC95] which provides reliable multicast and processor membership services within a broadcast domain. The upper layer is the Multiple-Rings protocol [Aga94] that provides reliable delivery and ordering across the entire network. Gateways are responsible for forwarding messages and configuration changes between broadcast domains. Each gateway interconnects two broadcast domains, and participates in the Ring protocol for each of them. Each domain may contain several gateways connecting it to several other domains. Extended virtual synchrony was first implemented in the Totem system [AMMAC93].
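Several of the protocols surveyed in this subsection stamp messages with consecutive sequence numbers (the Amoeba sequencer, the token in TPM) so that receivers can deliver in a total order and detect gaps. The following is a minimal single-process sketch of that idea in the style of the sequencer scheme described for Amoeba. The `Sequencer` and `Member` classes and their methods are hypothetical names for this illustration; the network, sequencer failure, and history truncation are not modeled.

```python
class Sequencer:
    """Hypothetical sequencer: stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0
        self.history = {}            # kept so that retransmissions can be served

    def order(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        return (seq, payload)        # this pair would be broadcast to the group

    def retransmit(self, seq):
        return (seq, self.history[seq])


class Member:
    """Hypothetical group member: delivers in sequence order and reports gaps."""

    def __init__(self):
        self.expected = 0
        self.buffer = {}
        self.delivered = []

    def receive(self, stamped):
        seq, payload = stamped
        if seq >= self.expected:     # ignore duplicates of delivered messages
            self.buffer[seq] = payload
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        # Sequence numbers still missing below the highest buffered one.
        highest = max(self.buffer, default=self.expected - 1)
        return [s for s in range(self.expected, highest + 1)
                if s not in self.buffer]


sequencer, member = Sequencer(), Member()
a = sequencer.order("a")
b = sequencer.order("b")             # assume this broadcast is lost
c = sequencer.order("c")

member.receive(a)
gaps = member.receive(c)             # the member notices that 1 is missing
for seq in gaps:
    member.receive(sequencer.retransmit(seq))
```

The member holds back `c` until the retransmitted `b` arrives, so every member that receives all three messages delivers them in the same order. The cost, as the text notes for Amoeba, is that every message passes through one distinguished member.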

1.4.2 Group Communication Semantics

It is highly important for a group communication service to maintain a well-defined service semantics. The application builder can rely on that semantics when designing correct applications using this group communication service. The semantics must specify both the assumptions taken and the guarantees provided.

The ISIS system defines and maintains the virtual synchrony semantics [BvR94, BJ87, SS93]. Virtual synchrony ensures that all the processes belonging to a process group perceive configuration changes as occurring at the same logical time. Moreover, all processes belonging to a configuration deliver the same set of messages for that configuration. A message is guaranteed to be delivered, at all the processes that deliver it, in the same configuration in which it was multicast. The delivery of a CBCAST message maintains causality. The delivery of an ABCAST message, in addition, occurs at the same logical time at all the processes.

Virtual synchrony assumes message omission faults and fail-stop process faults, i.e., a process that fails can never (or is not allowed to) recover. When network partitioning occurs, virtual synchrony ensures that processes in at most one connected component of the network, the primary component, are able to make progress; processes in other components become blocked. Unfortunately, before a process fails or before it detects that it has partitioned from the primary component, ISIS may deliver messages to it in an order inconsistent with the order determined at the primary component (if a database is maintained by the detached process, these messages may result in an inconsistent database state). Therefore, if a process recovers after a crash, or can merge again with the primary component, it must come back with a different process identifier and it is considered as a new process. If this process maintains stable storage (e.g. a database), this storage has to be erased.
Unable to cope with network partitions and re-merges, and with process recoveries, virtual synchrony has a limited practical value. Nevertheless, the virtual synchrony model emphasized the importance of a rigorous semantics for group communication services. To overcome these drawbacks, we extended the definition of virtual synchrony. This extension, extended virtual synchrony [MAMA94], is detailed in Chapter 4.

Valuable work done at the Newtop project [Mac94], separately from the work done in Transis and Totem, defines another group communication semantics which extends virtual synchrony to support partitions. Newtop semantics specifies several properties regarding the delivery of messages and configuration changes. It generalizes the primary component model of virtual synchrony to support several partitioned components without the need to block non-primary components (the application is, of course, free to block operation in non-primary components if it prefers). Newtop semantics is weaker than the extended virtual synchrony semantics. In particular, since Newtop does not support network re-merges, weaker requirements are specified for totally ordered delivery. This weakness allows the total order determined at a process to vary, and to contain holes, when compared to the total order determined at another process that just partitioned. Moreover, Newtop semantics does not specify the safe delivery property of extended virtual synchrony, whose importance is made clear in Chapter 6 of this thesis.

A recent work by Cristian and Schmuck on group membership in an asynchronous environment [CS95] defines the timed synchronous system model. In contrast to the theoretical asynchronous model that has no notion of time, the timed synchronous model assumes that processors have local clocks that allow them to measure the passage of time. Local clocks may drift with some (small) bounded rate. Each processor also contains a stable storage. Processor crashes introduce partial-amnesia behavior where the state of stable storage is the same as before the crash, while the state of the volatile storage is reinitialized. The model allows for message omission or performance (delay) faults, processor crashes and recoveries, and network partitions and re-merges. The unique aspect of [CS95] lies in bounding the local time up to which certain guarantees of the group membership service will hold at each of the processors.

While the membership algorithms developed in Transis and Totem do maintain the requirements presented in [CS95], they are not required to do so by the extended virtual synchrony model (which leaves local time out of the model). Combining ideas from the timed synchronous model with extended virtual synchrony might lead to a model which guarantees stronger liveness properties (that are provided anyway by the implementations of Transis and Totem). This, in turn, might lead to the ability to prove stronger liveness properties (with bounded local time) for protocols that currently use extended virtual synchrony to reason about their behavior. For example, it might be possible to prove a better liveness property for the replication protocol described in Chapter 6 than the required liveness property stated in Chapter 2.

1.4.3 Replication Protocols

Much work has been done in the area of replication.
Traditionally, a replicated database is considered correct if it behaves as if there is only one copy of it, as far as the user can tell. This property is called one-copy equivalence. In a one-copy database, the system should ensure serializability, i.e. the interleaved execution of user transactions is equivalent to some serial execution of these transactions. Thus, a replicated database is considered correct if it is one-copy serializable [BHG87], i.e. it ensures both serializability and one-copy equivalence.

Two-phase-commit protocols [EGLT76] are the main tool for providing serializability in a distributed database system when transactions may span several sites. The same protocols can be used to maintain one-copy serializability in a replicated database. In a typical protocol of this kind [Gra78], one of the servers, the transaction coordinator, sends a request to prepare to commit to all of the participating servers. Each server replies with either a ready to commit or an abort. If any of the servers votes to abort, all of them abort. The transaction coordinator collects all the responses and informs the servers of the decision. Between the two phases, each server keeps the local database locked, waiting for the final word from the transaction coordinator. If a server fails before its vote reaches the transaction coordinator, it is usually assumed to vote abort. If the transaction coordinator fails, all the servers remain blocked indefinitely, unable to resolve the transaction. Even though blocking preserves consistency, it is highly undesirable because the locks cannot be relinquished, rendering the data inaccessible to other requests at operational servers. Clearly, a protocol of this kind imposes a substantial additional communication cost on each transaction.

Three-phase-commit protocols [Ske82] try to overcome some of the availability problems of two-phase-commit protocols, paying the price of an additional communication round and, therefore, of additional latency. In case of server crashes or network partitions, a three-phase-commit protocol allows a majority or a quorum to resolve the transaction. If failures cascade, however, a majority can be connected and still remain blocked, as is shown in [KD95]. A recent work [KD95] presents an improved version of three-phase-commit that always allows a connected majority to proceed, regardless of past failures.

In the available copy protocols [BHG87], update operations are applied at all of the available servers, while a query accesses any server. Correct execution of these protocols requires that the network never partition. Otherwise, they block.

Voting protocols are based on quorums. The basic quorum scheme uses majority voting [Tho79] or weighted majority voting [Gif79]. Using voting protocols, each site is assigned a number of votes. The database can be updated in a partition only if that partition contains more than half of the votes.

The Accessible Copies algorithms [ESC85, ET86] maintain an approximate view of the connected servers, called a virtual partition. A data item can be read/written within a virtual partition only if this virtual partition (which is an approximation of the current connected component) contains a majority of its read/write votes.
If this is the case, the data item is considered accessible, and read/write operations can be done by collecting sub-quorums in the current component. The maintenance of virtual partitions greatly complicates the algorithm. When the view changes, the servers need to execute a protocol to agree on the new view, as well as to recover the most up-to-date item state. Moreover, although view decisions are made only when the membership of connected servers changes, each update requires a full end-to-end acknowledgment from the sub-quorum.

Dynamic linear voting [JM87, JM90] is a more advanced approach that defines the quorum in an adaptive way. When a network partition (or re-merge) occurs, if a majority of the last installed quorum is connected, a new quorum is established and updates can be performed within this partition. Dynamic linear voting generally outperforms the static schemes, as shown by [PL88].

Epsilon serializability [PL91] applies an extension to the serializability correctness criterion. Epsilon serializability introduces a tradeoff between consistency and availability. It allows inconsistent data to be seen, but requires that data eventually converge to a consistent (one-copy serializable) state. The user can control the degree of inconsistency. In the limit, strict one-copy serializability can be enforced. Several replica control protocols are suggested in [PL91]. One of these protocols limits the transactional model to commutative operations (COMMU), and another limits it to read-independent timestamped updates (RITU). In contrast, the ordered updates (ORDUP) protocol does not limit the transactional model. ORDUP executes transactions asynchronously, but in the same order at all of the replicas. Update transactions are disseminated and are applied to the database when they are totally ordered. The replication protocol presented in Chapter 6 of this thesis complies with the ORDUP model. Optimizations for the COMMU and RITU update models are presented in Chapter 7 of this thesis.

Lazy replication [LLSG90, LLSG92] is a replication method that overcomes network partitions and re-merges. It relaxes the constraints on operation ordering by exploiting the semantics of the service's operations. The client application can specify exactly what causal relations should be enforced between operations. Using this approach, unrelated operations do not incur any latency due to communication. By using a gossip method to propagate operations, lazy replication ensures reliable eventual delivery of all the operations to all of the replicas. However, the loose control over operation transmissions between replicas is a serious drawback of lazy replication. An operation might be transmitted from one replica to another many times, even when it is already known at the other replica.

The timestamped anti-entropy replication technique [Gol90] provides eventual weak consistency. This method also ensures the eventual delivery of each action to each of the replication servers using an epidemic technique: pairs of servers periodically contact each other to exchange actions that one of them has and the other misses. This exchange is called an anti-entropy session. When the network partitions and subsequently re-merges, servers from different components exchange actions generated at the disconnected components using anti-entropy sessions. A total order on the actions can be placed using a method similar to [AAD93]. The anti-entropy technique used to propagate actions is far more efficient than the gossip technique of [LLSG90].
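The pairwise exchange behind an anti-entropy session can be illustrated with a minimal sketch. It is not the protocol of [Gol90], which relies on timestamps; here each replica simply advertises the identifiers of the actions it holds, and each side sends only what the other misses. The names `Action`, `Replica` and `anti_entropy_session` are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """An update identified by its origin server and a per-origin sequence number."""
    origin: str
    seq: int
    payload: str


@dataclass
class Replica:
    name: str
    # Actions known to this replica, keyed by (origin, seq).
    known: dict = field(default_factory=dict)

    def add(self, action: Action) -> None:
        self.known[(action.origin, action.seq)] = action

    def summary(self) -> set:
        """Identifiers of every action this replica already holds."""
        return set(self.known)


def anti_entropy_session(a: Replica, b: Replica) -> int:
    """Two-way exchange: each side sends only the actions the other misses.

    Returns the number of actions transferred during the session.
    """
    a_ids, b_ids = a.summary(), b.summary()
    missing_at_b = [a.known[k] for k in sorted(a_ids - b_ids)]
    missing_at_a = [b.known[k] for k in sorted(b_ids - a_ids)]
    for action in missing_at_b:
        b.add(action)
    for action in missing_at_a:
        a.add(action)
    return len(missing_at_a) + len(missing_at_b)
```

After one session both replicas hold the same set of actions, and an immediate second session transfers nothing. The sketch says nothing about ordering the actions; it only covers their propagation.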
In prior research [AAD93], we described an architecture that uses the Transis group communication layer to achieve consistent replication. The architecture handles network partitions and re-merges, as well as server crashes and recoveries. It constructs a highly efficient epidemic technique, using the configuration change notification provided by Transis to keep track of the membership of the currently connected servers. Upon a configuration change, the currently connected servers efficiently exchange state information. Each action known to one of the servers and missed by at least one server is sent exactly once. The replication servers do not need to worry about message omissions because the group communication layer (Transis) guarantees reliable multicast. This technique is more efficient than the anti-entropy technique because, instead of a two-way exchange of knowledge and actions, a multi-way exchange is used. Moreover, the exchange takes place exactly when it is needed (i.e. after a membership change) rather than periodically. The serious inefficiency of [AAD93] is its method of global total ordering, which uses Lamport clocks and requires an eventual path from every server in order to order an action.

A valuable work by Keidar [Kei94] uses the architecture of [AAD93] but replaces its global total ordering method. The novel ordering algorithm in [Kei94] always allows a connected majority of the servers to make progress, regardless of past failures. As in [AAD93], it always allows servers to initiate actions (even when they are not part of a connected majority). Thus, actions can eventually become totally ordered even if their initiator is never a member of a majority component.

Both [Kei94] and [AAD93] use the flow control and multicast properties of group communication, but both still need end-to-end acknowledgments between servers on a per-action basis to allow global ordering of a message. This diminishes the performance advantages gained by using group communication.

The replication server described in [ADMM94] and detailed in Chapter 6 of this thesis eliminates the need for end-to-end acknowledgment at the server level without compromising consistency. End-to-end acknowledgment is still needed, but only just after the membership of the connected servers changes. Thus, the performance gain is substantial, and is determined by the performance provided by the group communication. The price to pay (compared to [Kei94]) is that there exist rare scenarios in which multiple servers in the primary component crash or become disconnected within a window of time so short that the membership algorithm could not be completed anywhere. In these scenarios, if none of the servers is certain about which actions were ordered within that primary component (e.g. due to a global crash), then the recovery of, and communication with, every server of the last primary component is required before the next primary component can be formed.

Chapter 2

2. The Model

2.1 The Service Model

A Database is a collection of organized, related data that can be accessed and manipulated. An Action defines a transition from the current state of the database to the next state; the next state is completely determined by the current state and the action. Each Action contains an optional query part and an optional update part. The update part of an action defines a modification to be made to the database, and the query part returns a value.

A replication service maintains a replicated database in a distributed system. The replication service is provided by a known finite set of processes, called the servers group. The individual processes within the servers group are called replication servers or simply servers, each of which has a unique identifier. Each server within the servers group maintains a private copy of the database on stable storage. The initial state of the database is identical at all of the servers. Typically, each server runs on a different processor. Processes to which the service is provided are called clients. The number of clients in the system is unlimited.

We introduce the following notation:

$S$ is the servers group.

$a_{s,i}$ is the $i$th action performed by server $s$.

$D_{s,i}$ is the state of the database at server $s$ after actions $1..i$ have been performed by server $s$.

$\mathit{stable\_system}(s, r)$ is a predicate that denotes the existence of a set of servers containing $s$ and $r$, and a time from which on that set does not face any communication or server failure. Note that this predicate is only defined to reason about the liveness of certain protocols. It does not imply any limitation on our practical protocol.
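As a rough illustration of the service model, the following sketch treats an action as a pair of optional parts and computes the next state purely from the current state and the action. The names `State`, `Action` and `perform` are hypothetical. Evaluating the query part on the state that results from the update part is an assumption of this sketch; the model above does not fix that detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict  # hypothetical database state: a plain key/value mapping


@dataclass(frozen=True)
class Action:
    # Optional update part: maps the current state to the next state.
    update: Optional[Callable[[State], State]] = None
    # Optional query part: computes a return value from a state.
    query: Optional[Callable[[State], object]] = None


def perform(state: State, action: Action):
    """Return (next_state, reply); the result depends only on the two inputs."""
    next_state = action.update(state) if action.update else state
    # Assumption of this sketch: the query sees the state after the update.
    reply = action.query(next_state) if action.query else None
    return next_state, reply


# Usage: an action with both parts, applied to an empty initial state.
deposit = Action(
    update=lambda s: {**s, "balance": s.get("balance", 0) + 10},
    query=lambda s: s["balance"],
)
initial_state = {}
state_1, reply_1 = perform(initial_state, deposit)
```

Because `perform` has no hidden inputs (no clock, no randomness), replaying the same actions from the same initial state always yields the same state, which is the property the model requires.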

2.2 The Failure Model

The system is subject to message omission, server crashes and network partitions. We assume no message corruption and no malicious faults. A server or a processor may crash and may subsequently recover after an arbitrary amount of time. A server recovers with its stable storage intact, is aware of its recovery, and retains its old identifier. The network may partition into a finite number of components. The servers in a component can receive messages generated by other servers in the same component, but servers in two different components are unable to communicate with each other. Two or more components may subsequently merge to form a larger component. A message that is multicast within a component may be lost by some or even all of the processors.

2.3 Replication Requirements

According to the service model, the initial state of the database is identical at all of the servers:

$$\forall s, r \in S: \quad D_{s,0} = D_{r,0}.$$

Also, the next state of the database is completely determined by the current state and the performed action:

$$\forall s \in S: \quad D_{s,i} = \mathit{function}(D_{s,i-1}, a_{s,i}).$$

The correctness criteria for the solution are defined as follows:

Safety. If server $s$ performs the $i$th action and server $r$ performs the $i$th action, then these actions are identical:

$$\exists a_{s,i}, a_{r,i} \;\Rightarrow\; a_{s,i} = a_{r,i}.$$

Note that if the servers perform the same set of actions in the same order then they reach an identical state. For databases that comply with our service model (where the next database state is completely determined by the current state and the performed action), our safety criterion translates to one-copy serializability (see [BHG87]). One-copy serializability requires that concurrent execution of actions on a replicated database be equivalent to some serial execution of these actions on a non-replicated database.
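The safety criterion and its consequence can be checked mechanically on small examples. The sketch below is only an illustration: `step` is a made-up transition function standing in for the model's function, and the action logs are invented data. It shows that two servers whose logs agree position by position reach the same state over their common prefix, and that reordering the same actions may produce a different state.

```python
from functools import reduce


def step(state, action):
    """Hypothetical transition function: the next state from (state, action)."""
    op, value = action
    return state + value if op == "add" else state * value


def replay(initial_state, actions):
    """Fold `step` over the actions performed so far, starting from state 0."""
    return reduce(step, actions, initial_state)


def safety_holds(log_s, log_r):
    """True if the i-th actions agree wherever both servers performed one."""
    return all(a == b for a, b in zip(log_s, log_r))


log_s = [("add", 3), ("mul", 2)]
log_r = [("add", 3), ("mul", 2), ("add", 1)]  # r is one action ahead of s
reordered = [("mul", 2), ("add", 3)]          # same actions, different order
```

Here `safety_holds(log_s, log_r)` is true even though `r` has performed more actions than `s`: safety constrains only the positions that both servers have filled.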

Liveness. If server $s$ performs an action and there exists a set of servers containing $s$ and $r$, and a time from which on that set does not face any communication or process failures, then server $r$ eventually performs the action:

$$(\exists a_{s,i} \wedge \mathit{stable\_system}(s, r)) \;\Rightarrow\; \Diamond\, \exists a_{r,i},$$

where $\Diamond$ is read as "eventually".

Our liveness criterion only admits protocols that propagate actions between any two servers, while it excludes protocols that rely on a central server, or on some specific servers, to propagate actions.

Chapter 3

3. The Architecture

Two main approaches to replication are known in the literature: the first is the primary-backup approach, and the second is active replication.

In the primary-backup approach, one of the replication servers, the primary, is the only server allowed to respond to application requests (actions). The other servers, the backups, update their copy of the database after the primary informs them of the action. If the primary crashes, one of the backups takes over and becomes the new primary. Some primary-backup architectures allow backups to respond to queries in order to increase system performance.

Active replication, in contrast, is a symmetric approach in which each of the replication servers is guaranteed to invoke the same set of actions in the same order. This approach requires the next database state to be determined by the current state and the next action. Other factors, such as the passage of time, have no bearing on the next state. Some active replication architectures replicate only the updates, while queries are answered locally.

This work takes the approach of active replication. As can be seen in Figure 3.1, our replication architecture is a symmetric architecture structured into two layers: a replication server layer and a group communication layer.

Typically, each replication server is a process that runs on a different processor that hosts a copy of the database. The group communication layer is another process running on the same processor and communicating with the replication server via inter-process communication mechanisms. Alternatively, it can be implemented as a library that is linked into the replication server process.

Each of the replication servers maintains a private copy of the database. The client application requests an action from one of the replication servers. The client-server interaction is done via some communication mechanism such as RPC, IPC, or even via the group communication layer.
The replication servers agree on the order of actions to be performed on the replicated database. As soon as a replication server knows the final order of an action, it applies this action to the database. If the action contains a query part, a reply is returned to the client application from the database copy maintained by the original server that got the request. The replication servers use the group communication layer to disseminate the actions among the servers group and to help reach an agreement about the final global order of the set of actions.

In a typical operation, when an application requests an action from a replication server, this server generates a message containing the action. The message is then passed to the local group communication layer, which sends the message over the communication medium. Each of the currently connected group communication layers eventually receives the message and then delivers it, in the same order, to its replication server. We say that these servers are currently connected.

If the system partitions into several components, the replication servers identify at most one component as the primary component. The replication servers in a primary component determine the final global total order of actions according to the order provided by the group communication layer. As soon as the final order of an action is determined, this action is applied to the database. In the primary component, new actions can be ordered, and applied to the database, immediately upon delivery by the group communication layer. In non-primary components, actions must be delayed until communication is restored and the servers learn of the order determined by the primary component.

[Figure 3.1: Detailed Architecture. Labels in the figure: Application, Request, Reply, DB, Replication Server, Apply, Generate, Actions, Group Communication, Deliver, Send, Receive, Messages, Medium.]

The group communication layer provides reliable multicast and membership services according to the extended virtual synchrony model specified in Chapter 4. This layer overcomes message omission faults and notifies the replication server of changes in the membership of the currently connected servers. This notification corresponds to server crashes and recoveries and to network partitions and re-merges. The Transis system, which is an implementation of such a group communication layer, is described in Chapter 5.
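The difference between primary and non-primary components can be sketched with a toy server. This is a deliberately simplified illustration and not the replication algorithm of Chapter 6: the class `ReplicationServer`, its method names, and the assumption that the order learned from the primary component always extends the server's own applied prefix are all hypothetical.

```python
class ReplicationServer:
    """Toy server: applies delivered actions only while in a primary component."""

    def __init__(self, name):
        self.name = name
        self.in_primary = True
        self.database = []  # applied actions, in their final global order
        self.pending = []   # delivered actions whose final order is unknown

    def on_membership_change(self, in_primary):
        # Notification from the group communication layer (simplified).
        self.in_primary = in_primary

    def on_deliver(self, action):
        if self.in_primary:
            # Primary component: the delivery order is the final order.
            self.database.append(action)
        else:
            # Non-primary component: delay until the final order is learned.
            self.pending.append(action)

    def on_learn_primary_order(self, ordered_actions):
        # Assumption: ordered_actions starts with what this server applied.
        for action in ordered_actions[len(self.database):]:
            self.database.append(action)
            if action in self.pending:
                self.pending.remove(action)


# Usage: s2 is cut off from the primary component, then re-merges.
s1, s2 = ReplicationServer("s1"), ReplicationServer("s2")
for server in (s1, s2):
    server.on_deliver("a1")
s2.on_membership_change(in_primary=False)
s1.on_deliver("a2")   # ordered and applied in the primary component
s2.on_deliver("b1")   # generated in the non-primary component, delayed
s2.on_membership_change(in_primary=True)
s1.on_deliver("b1")   # after the re-merge, b1 is ordered after a2
s2.on_learn_primary_order(s1.database)
```

At the end of the scenario both servers have applied the same actions in the same order, and `s2` has no delayed actions left.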


More information

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island A GPU Heterogeneous Cluster Scheduling Model for Preventing Temerature Heat Island Yun-Peng CAO 1,2,a and Hai-Feng WANG 1,2 1 School of Information Science and Engineering, Linyi University, Linyi Shandong,

More information

Skip List Based Authenticated Data Structure in DAS Paradigm

Skip List Based Authenticated Data Structure in DAS Paradigm 009 Eighth International Conference on Grid and Cooerative Comuting Ski List Based Authenticated Data Structure in DAS Paradigm Jieing Wang,, Xiaoyong Du,. Key Laboratory of Data Engineering and Knowledge

More information

Control plane and data plane. Computing systems now. Glacial process of innovation made worse by standards process. Computing systems once upon a time

Control plane and data plane. Computing systems now. Glacial process of innovation made worse by standards process. Computing systems once upon a time Classical work Architecture A A A Intro to SDN A A Oerating A Secialized Packet A A Oerating Secialized Packet A A A Oerating A Secialized Packet A A Oerating A Secialized Packet Oerating Secialized Packet

More information

Hebrew University. Jerusalem. Israel. Abstract. Transis is a high availability distributed system, being developed

Hebrew University. Jerusalem. Israel. Abstract. Transis is a high availability distributed system, being developed The Design of the Transis System??? Danny Dolev??? and Dalia Malki y Computer Science Institute Hebrew University Jerusalem Israel Abstract. Transis is a high availability distributed system, being developed

More information

Randomized algorithms: Two examples and Yao s Minimax Principle

Randomized algorithms: Two examples and Yao s Minimax Principle Randomized algorithms: Two examles and Yao s Minimax Princile Maximum Satisfiability Consider the roblem Maximum Satisfiability (MAX-SAT). Bring your knowledge u-to-date on the Satisfiability roblem. Maximum

More information

Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScript Objects

Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScript Objects Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScrit Objects Shiyi Wei and Barbara G. Ryder Deartment of Comuter Science, Virginia Tech, Blacksburg, VA, USA. {wei,ryder}@cs.vt.edu

More information

GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime

GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime GDP: Using Dataflow Proerties to Accurately Estimate Interference-Free Performance at Runtime Magnus Jahre Deartment of Comuter Science Norwegian University of Science and Technology (NTNU) Email: magnus.jahre@ntnu.no

More information

Extracting Optimal Paths from Roadmaps for Motion Planning

Extracting Optimal Paths from Roadmaps for Motion Planning Extracting Otimal Paths from Roadmas for Motion Planning Jinsuck Kim Roger A. Pearce Nancy M. Amato Deartment of Comuter Science Texas A&M University College Station, TX 843 jinsuckk,ra231,amato @cs.tamu.edu

More information

42. Crash Consistency: FSCK and Journaling

42. Crash Consistency: FSCK and Journaling 42. Crash Consistency: FSCK and Journaling Oerating System: Three Easy Pieces AOS@UC 1 Crash Consistency AOS@UC 2 Crash Consistency Unlike most data structure, file system data structures must ersist w

More information

To appear in IEEE TKDE Title: Efficient Skyline and Top-k Retrieval in Subspaces Keywords: Skyline, Top-k, Subspace, B-tree

To appear in IEEE TKDE Title: Efficient Skyline and Top-k Retrieval in Subspaces Keywords: Skyline, Top-k, Subspace, B-tree To aear in IEEE TKDE Title: Efficient Skyline and To-k Retrieval in Subsaces Keywords: Skyline, To-k, Subsace, B-tree Contact Author: Yufei Tao (taoyf@cse.cuhk.edu.hk) Deartment of Comuter Science and

More information

A Petri net-based Approach to QoS-aware Configuration for Web Services

A Petri net-based Approach to QoS-aware Configuration for Web Services A Petri net-based Aroach to QoS-aware Configuration for Web s PengCheng Xiong, YuShun Fan and MengChu Zhou, Fellow, IEEE Abstract With the develoment of enterrise-wide and cross-enterrise alication integration

More information

Modeling Reliable Broadcast of Data Buffer over Wireless Networks

Modeling Reliable Broadcast of Data Buffer over Wireless Networks JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 3, 799-810 (016) Short Paer Modeling Reliable Broadcast of Data Buffer over Wireless Networs Deartment of Comuter Engineering Yeditee University Kayışdağı,

More information

Distributed Systems Multicast & Group Communication Services

Distributed Systems Multicast & Group Communication Services Distributed Systems 600.437 Multicast & Group Communication Services Department of Computer Science The Johns Hopkins University 1 Multicast & Group Communication Services Lecture 3 Guide to Reliable Distributed

More information

Relations with Relation Names as Arguments: Algebra and Calculus. Kenneth A. Ross. Columbia University.

Relations with Relation Names as Arguments: Algebra and Calculus. Kenneth A. Ross. Columbia University. Relations with Relation Names as Arguments: Algebra and Calculus Kenneth A. Ross Columbia University kar@cs.columbia.edu Abstract We consider a version of the relational model in which relation names may

More information

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching Mitigating the Imact of Decomression Latency in L1 Comressed Data Caches via Prefetching by Sean Rea A thesis resented to Lakehead University in artial fulfillment of the requirement for the degree of

More information

Comparing IS-IS and OSPF

Comparing IS-IS and OSPF Comaring IS-IS and OSPF ISP Workshos These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (htt://creativecommons.org/licenses/by-nc/4.0/) Last udated

More information

A Mechanism for Sequential Consistency in a Distributed Objects System

A Mechanism for Sequential Consistency in a Distributed Objects System A Mechanism for Sequential Consistency in a Distributed Objects System Cristian Ţăpuş, Aleksey Nogin, Jason Hickey, and Jerome White California Institute of Technology Computer Science Department MC 256-80,

More information

has been retired This version of the software Sage Timberline Office Get Started Document Management 9.8 NOTICE

has been retired This version of the software Sage Timberline Office Get Started Document Management 9.8 NOTICE This version of the software has been retired Sage Timberline Office Get Started Document Management 9.8 NOTICE This document and the Sage Timberline Office software may be used only in accordance with

More information

Near-Optimal Routing Lookups with Bounded Worst Case Performance

Near-Optimal Routing Lookups with Bounded Worst Case Performance Near-Otimal Routing Lookus with Bounded Worst Case Performance Pankaj Guta Balaji Prabhakar Stehen Boyd Deartments of Electrical Engineering and Comuter Science Stanford University CA 9430 ankaj@stanfordedu

More information

Truth Trees. Truth Tree Fundamentals

Truth Trees. Truth Tree Fundamentals Truth Trees 1 True Tree Fundamentals 2 Testing Grous of Statements for Consistency 3 Testing Arguments in Proositional Logic 4 Proving Invalidity in Predicate Logic Answers to Selected Exercises Truth

More information

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs An emirical analysis of looy belief roagation in three toologies: grids, small-world networks and random grahs R. Santana, A. Mendiburu and J. A. Lozano Intelligent Systems Grou Deartment of Comuter Science

More information

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing Mikael Taveniku 2,3, Anders Åhlander 1,3, Magnus Jonsson 1 and Bertil Svensson 1,2

More information

Optimizing Dynamic Memory Management!

Optimizing Dynamic Memory Management! Otimizing Dynamic Memory Management! 1 Goals of this Lecture! Hel you learn about:" Details of K&R hea mgr" Hea mgr otimizations related to Assignment #6" Faster free() via doubly-linked list, redundant

More information

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal International Journal of Information and Electronics Engineering, Vol. 1, No. 1, July 011 An Efficient VLSI Architecture for Adative Rank Order Filter for Image Noise Removal M. C Hanumantharaju, M. Ravishankar,

More information

Continuous Visible k Nearest Neighbor Query on Moving Objects

Continuous Visible k Nearest Neighbor Query on Moving Objects Continuous Visible k Nearest Neighbor Query on Moving Objects Yaniu Wang a, Rui Zhang b, Chuanfei Xu a, Jianzhong Qi b, Yu Gu a, Ge Yu a, a Deartment of Comuter Software and Theory, Northeastern University,

More information

Distributed Estimation from Relative Measurements in Sensor Networks

Distributed Estimation from Relative Measurements in Sensor Networks Distributed Estimation from Relative Measurements in Sensor Networks #Prabir Barooah and João P. Hesanha Abstract We consider the roblem of estimating vectorvalued variables from noisy relative measurements.

More information

1.5 Case Study. dynamic connectivity quick find quick union improvements applications

1.5 Case Study. dynamic connectivity quick find quick union improvements applications . Case Study dynamic connectivity quick find quick union imrovements alications Subtext of today s lecture (and this course) Stes to develoing a usable algorithm. Model the roblem. Find an algorithm to

More information

Simulating Ocean Currents. Simulating Galaxy Evolution

Simulating Ocean Currents. Simulating Galaxy Evolution Simulating Ocean Currents (a) Cross sections (b) Satial discretization of a cross section Model as two-dimensional grids Discretize in sace and time finer satial and temoral resolution => greater accuracy

More information

12) United States Patent 10) Patent No.: US 6,321,328 B1

12) United States Patent 10) Patent No.: US 6,321,328 B1 USOO6321328B1 12) United States Patent 10) Patent No.: 9 9 Kar et al. (45) Date of Patent: Nov. 20, 2001 (54) PROCESSOR HAVING DATA FOR 5,961,615 10/1999 Zaid... 710/54 SPECULATIVE LOADS 6,006,317 * 12/1999

More information

Implementations of Partial Document Ranking Using. Inverted Files. Wai Yee Peter Wong. Dik Lun Lee

Implementations of Partial Document Ranking Using. Inverted Files. Wai Yee Peter Wong. Dik Lun Lee Imlementations of Partial Document Ranking Using Inverted Files Wai Yee Peter Wong Dik Lun Lee Deartment of Comuter and Information Science, Ohio State University, 36 Neil Ave, Columbus, Ohio 4321, U.S.A.

More information

Allowing the use of Multiple Ontologies for Discovery of Web Services in Federated Registry Environment

Allowing the use of Multiple Ontologies for Discovery of Web Services in Federated Registry Environment Allowing the use of Multile Ontologies for Discovery of Web Services in Federated Registry Environment Kunal Verma 1, Amit Sheth 2, Swana Oundhakar 3, Kaarthik Sivashanmugam 4, John Miller 3 1 Accenture

More information

A Novel Iris Segmentation Method for Hand-Held Capture Device

A Novel Iris Segmentation Method for Hand-Held Capture Device A Novel Iris Segmentation Method for Hand-Held Cature Device XiaoFu He and PengFei Shi Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, China {xfhe,

More information

CMSC 425: Lecture 16 Motion Planning: Basic Concepts

CMSC 425: Lecture 16 Motion Planning: Basic Concepts : Lecture 16 Motion lanning: Basic Concets eading: Today s material comes from various sources, including AI Game rogramming Wisdom 2 by S. abin and lanning Algorithms by S. M. LaValle (Chats. 4 and 5).

More information

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems Matlab Virtual Reality Simulations for otimizations and raid rototying of flexible lines systems VAMVU PETRE, BARBU CAMELIA, POP MARIA Deartment of Automation, Comuters, Electrical Engineering and Energetics

More information

Ad Hoc Networks. Latency-minimizing data aggregation in wireless sensor networks under physical interference model

Ad Hoc Networks. Latency-minimizing data aggregation in wireless sensor networks under physical interference model Ad Hoc Networks (4) 5 68 Contents lists available at SciVerse ScienceDirect Ad Hoc Networks journal homeage: www.elsevier.com/locate/adhoc Latency-minimizing data aggregation in wireless sensor networks

More information

Distributed Systems (ICE 601) Fault Tolerance

Distributed Systems (ICE 601) Fault Tolerance Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability

More information

Learning Robust Locality Preserving Projection via p-order Minimization

Learning Robust Locality Preserving Projection via p-order Minimization Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Learning Robust Locality Preserving Projection via -Order Minimization Hua Wang, Feiing Nie, Heng Huang Deartment of Electrical

More information

Building Polygonal Maps from Laser Range Data

Building Polygonal Maps from Laser Range Data ECAI Int. Cognitive Robotics Worksho, Valencia, Sain, August 2004 Building Polygonal Mas from Laser Range Data Longin Jan Latecki and Rolf Lakaemer and Xinyu Sun and Diedrich Wolter Abstract. This aer

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka Parallel Construction of Multidimensional Binary Search Trees Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka School of CIS and School of CISE Northeast Parallel Architectures Center Syracuse

More information

Reliable Distributed System Approaches

Reliable Distributed System Approaches Reliable Distributed System Approaches Manuel Graber Seminar of Distributed Computing WS 03/04 The Papers The Process Group Approach to Reliable Distributed Computing K. Birman; Communications of the ACM,

More information

This version of the software

This version of the software Sage Estimating (SQL) (formerly Sage Timberline Estimating) SQL Server Guide Version 16.11 This is a ublication of Sage Software, Inc. 2015 The Sage Grou lc or its licensors. All rights reserved. Sage,

More information

Comparing IS-IS and OSPF

Comparing IS-IS and OSPF Comaring IS-IS and OSPF ISP Workshos Last udated 8 th Setember 2016 1 Comaring IS-IS and OSPF Both are Link State Routing Protocols using the Dijkstra SPF Algorithm So what s the difference then? And why

More information

A Robust Implicit Access Protocol for Real-Time Wireless Collaboration

A Robust Implicit Access Protocol for Real-Time Wireless Collaboration A Robust Imlicit Access Protocol for Real-Time Wireless Collaboration Tanya L. Crenshaw, Ajay Tirumala, Sencer Hoke, Marco Caccamo Deartment of Comuter Science University of Illinois Urbana, IL 61801 {tcrensha,

More information

CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE

CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE Petra Surynková Charles University in Prague, Faculty of Mathematics and Physics, Sokolovská 83,

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Using Permuted States and Validated Simulation to Analyze Conflict Rates in Optimistic Replication

Using Permuted States and Validated Simulation to Analyze Conflict Rates in Optimistic Replication Using Permuted States and Validated Simulation to Analyze Conflict Rates in Otimistic Relication An-I A. Wang Comuter Science Deartment Florida State University Geoff H. Kuenning Comuter Science Deartment

More information

Sage Document Management Version 17.1

Sage Document Management Version 17.1 Sage Document Management Version 17.1 User's Guide This is a ublication of Sage Software, Inc. 2017 The Sage Grou lc or its licensors. All rights reserved. Sage, Sage logos, and Sage roduct and service

More information

Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries

Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries Qi Tian, Baback Moghaddam 2 and Thomas S. Huang Beckman Institute, University of Illinois, Urbana-Chamaign, IL 680,

More information

22. Swaping: Policies

22. Swaping: Policies 22. Swaing: Policies Oerating System: Three Easy Pieces 1 Beyond Physical Memory: Policies Memory ressure forces the OS to start aging out ages to make room for actively-used ages. Deciding which age to

More information

New Techniques for Making Transport Protocols Robust to Corruption-Based Loss

New Techniques for Making Transport Protocols Robust to Corruption-Based Loss New Techniques for Making Transort Protocols Robust to Corrution-Based Loss Wesley M. Eddy NASA GRC / Verizon weddy@grc.nasa.gov Shawn Ostermann Ohio University ostermann@eecs.ohiou.edu Mark Allman ICSI

More information

Implementation of Evolvable Fuzzy Hardware for Packet Scheduling Through Online Context Switching

Implementation of Evolvable Fuzzy Hardware for Packet Scheduling Through Online Context Switching Imlementation of Evolvable Fuzzy Hardware for Packet Scheduling Through Online Context Switching Ju Hui Li, eng Hiot Lim and Qi Cao School of EEE, Block S Nanyang Technological University Singaore 639798

More information

SIMULATION SYSTEM MODELING FOR MASS CUSTOMIZATION MANUFACTURING

SIMULATION SYSTEM MODELING FOR MASS CUSTOMIZATION MANUFACTURING Proceedings of the 2002 Winter Simulation Conference E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds.. SIMULATION SYSTEM MODELING FOR MASS CUSTOMIATION MANUFACTURING Guixiu Qiao Charles McLean

More information

Information Flow Based Event Distribution Middleware

Information Flow Based Event Distribution Middleware Information Flow Based Event Distribution Middleware Guruduth Banavar 1, Marc Kalan 1, Kelly Shaw 2, Robert E. Strom 1, Daniel C. Sturman 1, and Wei Tao 3 1 IBM T. J. Watson Research Center Hawthorne,

More information

in Distributed Systems Department of Computer Science, Keio University into four forms according to asynchrony and real-time properties.

in Distributed Systems Department of Computer Science, Keio University into four forms according to asynchrony and real-time properties. Asynchrony and Real-Time in Distributed Systems Mario Tokoro? and Ichiro Satoh?? Deartment of Comuter Science, Keio University 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, 223, Jaan Tel: +81-45-56-115 Fax: +81-45-56-1151

More information

search(i): Returns an element in the data structure associated with key i

search(i): Returns an element in the data structure associated with key i CS161 Lecture 7 inary Search Trees Scribes: Ilan Goodman, Vishnu Sundaresan (2015), Date: October 17, 2017 Virginia Williams (2016), and Wilbur Yang (2016), G. Valiant Adated From Virginia Williams lecture

More information