HARTs: High Availability Cluster Architecture with Redundant TCP Stacks*


Zhiyuan Shao, Hai Jin, Bin Chen, Jie Xu, and Jianhui Yue
Huazhong University of Science and Technology, Wuhan, China
hjin@hust.edu.cn

* This work is supported by a grant from the National Defense Advanced Research Project.

Abstract

Improving the availability of services is a key issue for the survivability of a cluster system. Many schemes have been proposed for this purpose, but most of them aim at enhancing only service-level availability or are application specific. In this paper, we propose a scheme called High Availability with Redundant TCP Stacks (HARTs), which provides connection-level availability by maintaining redundant TCP stacks for TCP connections at the server side. We present performance results measured on our HA cluster prototype. From these results, we find that the configuration of one primary server with one backup server running on separated 100Mbps Ethernet has acceptable performance to support server-side applications while delivering high availability.

1. Introduction

With the popularity of clusters built of COTS components instead of high-end supercomputers, more effort needs to be spent on the reliability problems of cluster systems. For a typical TCP connection with one client and one server [3], the failure of a TCP connection can be either a failure at the client end or a failure at the server end. The former can be addressed by increasing the reliability of the client's computer or by simply restarting the client. The TCP connections at the server side, however, are usually shared by multiple clients simultaneously, so recovering from a connection failure on a busy server is not as simple as restarting the server. The most direct way to achieve high availability of a TCP connection is socket migration: if the socket at the server side can be migrated when the server fails, the availability of the TCP connection is improved. Transparent migration of the sockets associated with a migrating process has been done only inefficiently in previous work, due to the complexity of the connection state and the huge cost of replacing the full-grown IPv4 architecture [7]. Running replicas of the same task can be easily implemented by placing a replica of the original task on redundant computer components and letting them run. To build replicas of a process that communicates, however, we face the problem of how to replicate the communications. Multicasting [19] is one choice, but the source code of the process must be largely modified, which is unacceptable if the cost is prohibitive or the source code is not available. A TCP connection is strictly a one-to-one connection, so making multiple replicas of a TCP connection means dealing with a one-to-multiple communication semantic. In this paper, we propose a scheme called High Availability with Redundant TCP Stacks (HARTs) to improve the availability of TCP connections. With HARTs, at least two server nodes work in active-active mode, maintaining redundant TCP connection stacks. A failure of any one of the server nodes will not interrupt the TCP connections. Since the servers work in active-active mode, there is no need to perform checkpointing during fail-free operation. In case of a server failure, there is no need to perform socket migration to rescue the TCP connection, as a redundant TCP connection can continue working without takeover time.
HARTs also makes the redundant TCP connections appear as a single connection to the network. There is no need to modify any user application code at either the client side or the server side. Furthermore, no modification of the operating system kernel is needed at the client side. In Section 2 we briefly review related work on enhancing the availability or fault tolerance of connections. The architecture of HARTs is described in Section 3. Details of our HARTs system are elaborated in Section 4. Experiment results are presented and discussed in Section 5. We end the paper with conclusions.

2. Background

FT-TCP [1] is a scheme for transparent recovery of a crashed process with open TCP connections. A wrapper around the TCP layer intercepts and logs reads by the process for replay during recovery, and shields the remote endpoint from failure. Although this scheme provides the possibility of recovery from a connection failure, the recovery requires backing up all the logs related to the connection in order to reinstate the server-side TCP stack. As an uncertain span of time is required for the replaying, FT-TCP is limited to providing fault tolerance rather than high availability. In a system with FT-TCP, each packet belonging to a connection must be backed up in a logger. To confirm the success of logging, responses cannot be sent out until a message indicating the completion of logging is received. In this way fault tolerance is guaranteed, but communication bandwidth is largely sacrificed. Also, the logger becomes a new single point of failure in the system.

Fault Resilience in web-server clusters [15] proposes a solution for achieving fault resilience in clusters dedicated to providing web services. In their system, web service requests are categorized as stateless requests for web surfing and stateful requests for session-based services, and corresponding methods for implementing fault resilience are proposed. This project realizes fault resilience in web-server clusters, but it is very specialized and limited to clusters designed to be web servers. The twin-server protocol proposed in their scheme introduces an unpredictable time span before the final result of a client's request is sent out: the longer the client's stateful request, the longer the client must wait before receiving the response.

Fine-grained failover using connection migration is a failover architecture that enables migrating HTTP connections within a pool of servers [13]. It replicates per-connection soft transport and application state. The connection migration is initiated only by the new server, in cases such as failure or overload of the origin server [14]. The architecture adds an HTTP-aware module at the transport layer that extracts information from the application data stream to be used for connection resumption. The design therefore depends on the information in the application data stream.

Socket Cloning [10] is a scheme proposed for load-balancing the workload of the server nodes in a cluster dedicated to web services. The key idea is to migrate the workload of a heavily loaded server node to a more lightly loaded server node, or to the node where the cached replica of the requested document resides, by cloning the socket. After cloning, the incoming packets of a request are first sent to the original server node and then to the server node where the cloned socket resides, by a mechanism resembling TCP splicing [6]. Responses generated by the cloned socket are sent out directly using TCP handoff [2][8]. It is the cloned socket that deals with the subsequent requests, while the original node keeps track of the status changes. With this scheme the workload can be balanced, and the communication overhead incurred by transferring cached blocks using cooperative caching is minimized. As the original node does nothing after socket cloning except staying aware of status changes, the cloned socket is also a single point of failure, although high reliability is not the key issue in web services.
3. Connection-based High Availability Cluster Architecture

In a classic cluster architecture, a portal server lies between the outside clients and the inside real servers. All connections from the clients are first sent to the portal rather than directly to the real server nodes that provide services. After receiving these incoming connections, the portal server distributes them among the real server nodes using some load-balancing algorithm. Software packages used on the portal node, such as LVS [16], are also an active research topic. The real server nodes deal with the incoming service requests. If a real server node crashes, the portal node learns this by periodically diagnosing the nodes' status, and subsequent requests will not be delivered to the crashed server node until it has recovered. By this scheme, a classic cluster provides high availability for its services, but the connections attached to the real server node before the crash are simply lost. The granularity of availability a classic cluster provides is the service, not the connection. Moreover, most schemes that use these clusters to achieve high reliability work in an active-standby mode, where the time cost of failover is always very high.

In order to provide connection-level availability for a cluster system and reduce the failover time, we propose a new high availability cluster architecture, illustrated in Figure 1. The key objective of our scheme is that the primary and the backups work in an active-active mode: if any component fails, an identical replica continues the work with little or no interruption. To achieve this objective, some server node (any of the Backup Servers in Fig. 1) must be able to take over from the server node that actually makes connections with the outside world (the Primary Server in Fig. 1) in case of its failure. This means the connection state of the Primary Server must be restored. Each of the computer nodes in the HA cluster has two NICs, eth0 and eth1.

There are also two logically independent networks, the private network and the public network. The eth1 interfaces of the computer nodes are connected to the private network, whereas all the eth0 interfaces are connected to the public one.

Figure 1 HA Cluster Architecture

The IP addresses bound to the eth1 interfaces are private IP addresses belonging to the same subnet. The major functionality of this private network is to provide a channel for the internal communications among cluster nodes. As for eth0, although they are all physically connected to the public network, only the eth0 of the primary server is activated. Its IP address is visible to the Internet and can be uniquely addressed. The eth0 interfaces of the other server nodes (the Backup Servers) are all deactivated during the fail-free run. The Backup Servers are nevertheless not isolated from the outside world: as they regard the Primary Server as their gateway, they can actively make their own connections with the outside world. When a connection between a client and the cluster is initiated, the packets belonging to this connection are delivered to the Backup Servers by the Primary Server via the private network. All the responses generated by the Backup Servers are sent back to the Primary Server. For the lifetime of the connection, the Primary Server filters all the unnecessary responses and guarantees that just one response leaves the cluster. If the Primary Server crashes while the connection is still in flight, one of the Backup Servers will be elected; without loss of generality, say it is Backup Server1 in Fig. 1. The elected Backup Server will establish its role as the new Primary Server. During the establishment, the eth1 IP address of Backup Server1 changes to the eth1 IP address of the crashed Primary Server. The eth0 of Backup Server1 is brought up, and its IP address also changes to that of the eth0 of the old Primary Server. This action is accomplished by IP faking. After IP faking, the new Primary Server is established as shown in Figure 2. To simplify our discussion, we assume that after receiving the data stream from the client, all of the servers in the cluster elicit identical responses. This assumption holds for most of the applications running on the Internet.

4. Principle of Redundant TCP Stacks

We now discuss the principle of redundant TCP stacks from two aspects. For the system in the fail-free state, we focus on how redundant TCP stacks maintain normal connections. For the failover situation, we discuss how the connections are retained and continue operating when any of the servers in the HA cluster fails.

4.1 Scenario 1: Fail-free

With redundant TCP stacks, multiple servers in the HA cluster system participate in one side of the same connection. The first thing we need to address is to maintain the connection so that it can be regarded as a normal connection from the viewpoints of both the backups and the primary. The connection status must be synchronized across these multiple units. This issue includes how to synchronize the sequence numbers and how to synchronize the operations.

Synchronizing the TCP sequence numbers. Consider a typical TCP connection: when a client wants to establish a TCP connection with a remote server, it sends out a SYN packet with a client-side Initial Sequence Number (ISN) J.
After receiving the SYN packet from the client, the server replies with a SYN/ACK packet that contains the server-side ISN K and acknowledgement number J+1. These are the first two steps of the 3-way handshake that establishes a TCP connection. After that, all the packets belonging to this connection are tagged with sequence numbers offset from K and J; otherwise the TCP stacks will discard the packet. In our HA cluster system, we use multiple independent server nodes instead of one autonomous server. After receiving the SYN packet, all the server nodes should respond to establish a connection, and they also need to be synchronized during the remaining communications. The first issue we need to address is how to synchronize the sequence numbers used by the server nodes for a specific connection.

That is, if the primary node crashes during the connection, the backup node that takes over the role of the primary node should send out packets tagged with sequence numbers originating from the ISN generated by the primary node. Only in this way can the packets sent out by the new primary node be recognized by the TCP stacks of the clients. An intuitive method is to record the ISN generated by the primary server as well as the offset from the sequence number of the latest packet to that ISN, and back up these values periodically to the backup servers. If the primary server crashes, its heir uses the backed-up record to modify its outgoing packets. Although this method fits the sequence number control logic of the TCP stacks, it introduces unnecessary communication overhead, and choosing the backup interval between the server nodes remains a big problem. We solve this problem by making some modifications to the TCP stacks of the servers. The SYN packet from a client initiating a connection contains just an ISN and either no acknowledgement number or a meaningless one. When this packet is first captured by the primary server, we fill the unused acknowledgement number field with a secure sequence number that does not conflict with other connections, and mark the reserved field of the packet with a specific flag. The modified SYN packet is then delivered to both the primary server and the backup servers, all running the modified TCP stack. After receiving this SYN packet, the modified TCP stacks use the secure sequence number as the ISN for the later communications. This procedure is illustrated in Figure 3.

Figure 3 Synchronizing TCP Sequence Numbers

Using this scheme, no further communication between the primary and backup servers is required for sequence number synchronization. The backup node elected as the new primary server node can simply send out its packets with correct TCP sequence numbers.

Synchronizing the operations. A simple solution is to permit all the packets created by the primary node and relay all the packets generated by the backup nodes without discrimination or processing. This solution has two problems. Firstly, the public network in our HA cluster system would be overloaded. Secondly, as the server nodes process their communication independently, a faster server node (due to a lighter workload or, for heterogeneous nodes, faster processing) would make the client run faster. The slower server node would then be choked, because it could not keep pace with the status changes of the connection. That is, in order to synchronize the TCP stacks, the communication speed must be determined by the slowest node, not the fastest. In our scheme, a synchronization layer is deployed between the TCP stacks and the outside world. After receiving the SYN packet from a client, the connection information is stored in a hash table item, indexed by a hash function over the client's IP address, the client's port number, and the server port number. Each server node taking part in this connection is added to this item in a linked list. The data structure denoting a server node in the linked list contains several fields that are critical to our algorithm: max-seq-num is the maximum sequence number of the packets sent by the node; max-ack-num is the maximum acknowledgement number the node has issued; window-size is the node's currently advertised window size.
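The following is a minimal user-space C sketch of the per-connection hash-table item and the per-node entry just described. The type and field names are hypothetical and follow the description above rather than the actual HARTs kernel code.

#include <stdint.h>

/* Hypothetical sketch of the synchronization-layer bookkeeping described
 * above; names mirror the paper's field descriptions, not actual HARTs code. */

struct server_node_state {
    uint32_t ip_addr;        /* private (eth1) address of this server node   */
    uint16_t port;           /* server-side port of the connection           */
    uint32_t max_seq_num;    /* highest sequence number this node has sent   */
    uint32_t max_ack_num;    /* highest acknowledgement number it has issued */
    uint16_t window_size;    /* its currently advertised receive window      */
    struct server_node_state *next;   /* linked list of participating nodes  */
};

struct connection_item {               /* one hash-table entry                */
    uint32_t client_ip;                /* hash key: client IP ...             */
    uint16_t client_port;              /* ... client port ...                 */
    uint16_t server_port;              /* ... and server port                 */
    uint32_t t_max_seq_num;            /* max sequence number over all nodes  */
    uint32_t t_min_seq_num;            /* min sequence number over all nodes  */
    uint32_t t_max_ack_num;            /* max acknowledgement number          */
    uint32_t t_min_ack_num;            /* min acknowledgement number          */
    uint16_t t_window_size;            /* smallest advertised window          */
    struct server_node_state *nodes;   /* list of server nodes in this conn.  */
};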
When the response of a server node reaches the synchronization layer, the connection it belongs to is looked up, and the server node structure is then found by matching the IP addresses and port numbers it contains. Whether this packet should be relayed or not is decided by an algorithm called send when minimum updated (SWMU). To implement the SWMU algorithm, the hash table item must contain the following fields: t-max-seq-num is the maximum sequence number among all the server nodes; t-min-seq-num is the minimum sequence number of all the nodes; t-max-ack-num is the maximum acknowledgement number; t-min-ack-num is the minimum acknowledgement number; t-window-size is the smallest window size the server nodes currently advertise. When a packet arrives from a server node, the max-seq-num field of that node's structure is updated. This action triggers a redetermination of the t-max-seq-num and t-min-seq-num of the hash table item. If and only if t-min-seq-num is updated, the packet can be sent out, with some modifications, as the sketch below illustrates.
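The relay decision itself can be sketched in C as follows. The types and names are illustrative, trimmed versions of the structures sketched earlier, not the actual HARTs implementation, and 32-bit sequence-number wrap-around is ignored for brevity.

#include <stdbool.h>
#include <stdint.h>

struct node_state {
    uint32_t max_seq_num;      /* highest sequence number this node has sent */
    struct node_state *next;
};

struct conn_state {
    uint32_t t_max_seq_num;    /* max over all participating nodes */
    uint32_t t_min_seq_num;    /* min over all participating nodes */
    struct node_state *nodes;  /* non-empty list of server nodes   */
};

/* Called when a response packet carrying sequence number `seq` arrives from
 * `node`.  Returns true if the packet should be relayed to the client; in
 * that case *relay_upto is the sequence number the outgoing packet must be
 * trimmed to (the newly updated minimum) before NAT rewriting and sending. */
static bool swmu_on_packet(struct conn_state *c, struct node_state *node,
                           uint32_t seq, uint32_t *relay_upto)
{
    uint32_t old_min = c->t_min_seq_num;
    uint32_t new_min;
    struct node_state *n;

    if (seq > node->max_seq_num)            /* ignoring wrap-around */
        node->max_seq_num = seq;

    /* Recompute the per-connection extremes over every server node. */
    new_min = c->nodes->max_seq_num;
    for (n = c->nodes; n != NULL; n = n->next) {
        if (n->max_seq_num > c->t_max_seq_num)
            c->t_max_seq_num = n->max_seq_num;
        if (n->max_seq_num < new_min)
            new_min = n->max_seq_num;
    }
    c->t_min_seq_num = new_min;

    /* Relay only when the slowest node has advanced the minimum. */
    if (new_min > old_min) {
        *relay_upto = new_min;
        return true;
    }
    return false;                           /* hold the packet back */
}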

Figure 4 gives an example of the SWMU algorithm with 4 server nodes. We assume that their initial sequence numbers are all K, which means t-max-seq-num = t-min-seq-num = K initially. Packet1 from node1 comes first, and it sets the max-seq-num field of node1 to K1. The t-max-seq-num field is also updated to K1. Then packet2 and packet4 arrive, and these two packets are longer (K2 = K4 > K1). Then t-max-seq-num is updated to K2. The t-min-seq-num of this connection is NOT updated when these 3 packets arrive, so none of them is relayed. But the arrival of packet3, which belongs to node3, makes things different: it updates t-min-seq-num from K to K1 (K2 > K3 > K1).

Figure 4 An Example of the SWMU Algorithm

From the arrival times we can see that t-min-seq-num is updated at the moment the slowest server node responds, and that is also the time to deliver the packet. In the above example, simply relaying the packet would still be improper: packet3 must first be trimmed to end at K1, and the outgoing packet must undergo NAT [17] before being relayed out. There remains an exception. When a server pulls data in, t-min-seq-num is not updated; instead, t-min-ack-num takes over the role of t-min-seq-num, and the outgoing packets carry the minimum acknowledgement number.

We have still omitted something in the discussion above. The HA cluster consists of many servers. Although receiving data from a client is their common requirement, which is already fulfilled by the redundant TCP stack technique, in some configurations different servers are also assigned different tasks. There are two ways for the servers to carry out their individual private communications. Although in our HA cluster only the primary server has its eth0 active and connected to the public network, the back-end servers can also communicate with the clients by using the primary server as their gateway. That means all the servers in the HA cluster can make connections with the clients to transfer data for their own tasks. If the primary server crashes during the lifetime of these private connections, these connections are not lost, as TCP is robust enough and the newly elected primary server changes the IP address of its eth1 to that of the dead primary server and brings its eth0 up. It is also possible for the servers to need to transfer data with some client over the same connection that is already synchronized by our synchronization layer. In this case, we can design additional rules in the synchronization layer to fit these requirements. For example, if the whole HA cluster runs in a pull mode on the synchronized connection with the client, we can synchronize the connection by simply controlling the ACK numbers of the outgoing packets and leaving the data in those outgoing packets unsynchronized.

4.2 Scenario 2: Failover

In this part we describe how the connections are retained when the primary server crashes, and how the heir of the primary restores all the information for the connections.

Fault detection. In order to achieve failover, the first thing we have to do is determine which server node is undergoing a failure. Heartbeating [9] is the most common way to detect failure in a cluster system, but this scheme is in most cases only useful for hardware failures. We build fault diagnosis modules into the software running on these servers. They are simple and efficient, so that when any software fault is detected, they can report to the synchronization layer, and the synchronization layer then decides what to do next. In our HA cluster system, if the dead unit is a backup server, there is no influence on the primary server.
If the primary server fails, a new primary server must be elected. To elect one live backup server from the multiple candidates, many election algorithms can be employed [4][11]. For simplicity, the election algorithm in our system picks as the new primary server node a live server having the minimal last byte of its IP address. As there are many proposals and implementations in the field of fault detection and this is not the main point of our paper, we will not discuss this topic in detail.

Connection failover. The traditional way to achieve failover is the checkpointing technique: the critical information of the primary server is stored on the backup servers, and when the primary fails, one of the backups continues the service from the latest checkpoint. In our scheme, to avoid harming the fail-free performance and to reduce failover time, we do not back up any information during normal operation. We use the following method to decide from where the elected back-end node continues. When the HA cluster system works in fail-free mode, the primary server knows the status of all the backup servers, but each backup server knows nothing except its own state. The only thing a back-end processing unit can record is its own connection status during operation. We use an example to elaborate how the connections fail over to a backup server. During the fail-free run, we assume there is a backup server, node2, that is eligible to become the new primary. We will describe what happens after the failure of the primary server and how node2 takes up the role of the new primary.

The only information node2 has is how long it has been running. When the primary server fails and node2 becomes the newly elected primary node, node2 has to learn the information of the other living nodes. By IP address faking, the IP addresses bound to the NICs of node2 are modified after the election. The packets automatically retransmitted by the TCP stacks of the other living nodes then arrive at node2 after the election. Thus, it is very easy for node2 to obtain the connection status information of the other living nodes. It uses the SWMU algorithm to decide which packets need to be relayed by comparing all the packets. The failure of a backup server does not cause any election or any other change in the cluster system. If a backup server fails, the remaining server nodes simply drop the dead server's connection information. But if the slowest backup server crashes during operation and its packet has not been sent out, there is still a problem: even though the other servers are still in operation, the whole cluster system enters a state of malfunction. This is caused by the SWMU algorithm, as the packet with the minimal sequence number will never arrive in this situation. To solve this problem, the primary server has to decide carefully when to drop the connection information structure of a dead server, especially when it belongs to the slowest server. In our scheme, when the information of a backup server is to be dropped, we first decide whether that server owns the minimal sequence number. If not, the connection information is simply dropped. Otherwise, the status must be recorded first, and the primary server then selects a retransmitted packet from the other nodes carrying the latest sequence number the deleted server had.

5. Performance Evaluation

We use Netpipe-2.4 [12] as our benchmark to test our HARTs system. Each node in our cluster system has a 450MHz Pentium CPU with 128MB memory and dual 100Mbps NICs. The operating system is Red Hat Linux. To test the performance of an individual TCP connection, both ends of a TCP connection in Netpipe work in a ping-pong model: one end receives after it sends, and vice versa at the other end of the connection. The size of the packets used in these experiments starts from an initial value. After a predefined number of loops of these experiments, the packet size is increased by an increment. The loops continue until the packet size reaches a predefined upper value. In our experiments, we set the initial value to 1 byte, the increment to 256 bytes, the upper value to 8193 bytes, and the loop count to 400. The throughput for each specific packet size is the mean throughput over all experiments, and the final result is obtained from 3 repeated tests. The number of backup servers in the HA cluster system can be varied, and with different numbers of backup servers, the performance results differ. We use 1P to denote the configuration of one primary server, 1P1B one primary server plus one backup server, 1P2B one primary server plus two backup servers, and 1P3B one primary server plus three backup servers. Our experiments are conducted under two different network configurations. First, we connect all the NICs of the servers and the client to a single 100Mbps hub; we call this configuration 100M-Shared. Second, we construct the private network with a 100Mbps hub and the public network with a 100Mbps switch; we call this configuration 100M-Separated.
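As an illustration of this measurement pattern, the following is a minimal user-space C sketch of the ping-pong sweep over an already-connected TCP socket. It mirrors the parameters described above (initial value 1, increment 256, upper value 8193, 400 loops) but is not the actual Netpipe-2.4 source.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Sender side of the ping-pong test: send a packet, then wait for the echo.
 * The peer does the opposite (receive, then send the same amount back). */
static void pingpong_sweep(int sock)
{
    const int initial = 1, increment = 256, upper = 8193, loops = 400;
    char buf[8193];
    memset(buf, 0, sizeof(buf));

    for (int size = initial; size <= upper; size += increment) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < loops; i++) {
            if (send(sock, buf, (size_t)size, 0) != size)
                return;                       /* error: stop the sweep */
            for (int got = 0; got < size; ) { /* wait for the full echo */
                ssize_t r = recv(sock, buf + got, (size_t)(size - got), 0);
                if (r <= 0)
                    return;                   /* connection closed or error */
                got += (int)r;
            }
        }
        gettimeofday(&t1, NULL);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        /* 2 * size bytes cross the wire per round trip */
        double mbps = (2.0 * size * loops * 8.0) / (secs * 1e6);
        printf("%5d bytes: %.2f Mbps\n", size, mbps);
    }
}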
On our experiment platform, about 101 units of CPU time correspond to 1 second of real time. This value applies to all figures related to CPU time in this paper.

5.1 100M-Shared Experiments and Results

Figure 5 shows the throughputs under different cluster configurations. In Fig. 5, there is a sudden decline in the throughput curves when the packet size is 1537 or 3073. This decline is introduced by the IP fragmentation and reassembly mechanisms. As our tests are conducted on Ethernet with an MTU of 1500, a packet sized either 1537 or 3073 yields a small IP fragment when fragmented, and it is this small fragment that degrades the throughput.

Figure 5 Throughput of HA Cluster under 100M-Shared Configuration

In the configuration of one primary and one backup, the throughput degrades from that of the one-server configuration. With an increasing number of server nodes in the cluster, the network throughput decreases. The performance loss is due to the latency introduced by sequence number synchronization and our SWMU algorithm. The saturation degree of the network also has a side effect on the performance.

With an increasing number of servers, these two causes of performance loss become substantial. For the configurations with multiple server nodes in our cluster, as the packet size increases the throughput improvement becomes smaller, and the final result is fairly flat compared with the one-server configuration. This is because with more server nodes the communication overhead becomes heavier and the saturation degree of the shared network increases. Using tcpdump, we also observed that the probability of packet retransmission from the backup servers increases, which puts a further burden on the network. As a result, the throughput gain from transferring larger packets is buried by the network overhead.

We measured the idle CPU time of the server nodes to judge whether the server nodes become busier or not. As the idle CPU time in a single ping-pong test is too small to compare, we summed all the idle CPU time over the loop of 400 ping-pong tests. Figure 6 illustrates the results of these experiments.

Figure 6 Servers Idle CPU Time under 100M-Shared Configuration

In our experiments, the status of the different backup servers in the same primary-backup configuration is identical; thus, only one curve is shown to represent the status of all the backup servers in each configuration. In Fig. 6, with an increasing number of server nodes, the idle CPU time of both the primary server and the backup servers increases. The throughput obtained by using multiple server nodes is much lower than that of a single autonomous server node. This means that a cluster of multiple server nodes needs more time than a single server to transfer the same amount of data. We must therefore consider both the total idle CPU time and the throughput rate. In the 1P1B configuration, the throughput is nearly half that of a single server, so the time spent transferring the same amount of data is doubled compared with one single server. At the same time, the CPU idle time of the primary node is less than twice the CPU idle time of a single server, while the CPU idle time of the backup node is very close to twice the CPU idle time of a single server. Thus, in the 1P1B configuration, the primary node is busier than one single server node, while the backup node works at almost the same CPU load as one single server node. By adding more server nodes to the cluster, the primary server node becomes busier, with no change for the backup nodes. It is the SWMU algorithm that incurs this heavier CPU workload.

5.2 100M-Separated Experiments and Results

Figure 7 shows the throughput results under different cluster configurations. The characteristics of the results in Fig. 5 still apply to Fig. 7, but the performance of the clusters with one or more backup servers under the 100M-Separated configuration is better than under the 100M-Shared configuration.

Figure 7 Throughput of HA Cluster under 100M-Separated Configuration

The performance gap between the 100M-Separated configurations and the 100M-Shared configurations tells us how much network saturation burdens the throughput, as it does not exist on the public network in the 100M-Separated configurations. The experimental results for server idle CPU time under different cluster configurations are presented in Figure 8. The same patterns apply as in the 100M-Shared configuration.
But as the throughput increases, both the primary servers and the backup servers become busier.

6. Conclusions

In this paper, we discussed redundant TCP stacks and the related cluster architecture. Using these techniques, the traditional one-to-one paradigm of a TCP connection is changed, and connection-granularity highly available services can be obtained. We also implemented a prototype, and experiments on this prototype were performed under various cluster configurations.

Figure 8 Servers Idle CPU Time under 100M-Separated Configuration

In the experiments, we observed that the retransmitted packets from the back-end processing units put a heavy burden on both the network and the CPU of the primary server. If we can filter out the meaningless retransmissions, the network throughput will be improved. The performance of our cluster system in its current state is not optimal. From careful measurement of the network throughput and the CPU utilization, we advocate that a specially designed router or a dedicated computer is needed as a portal, so that the software of the synchronization layer and SWMU can be placed on it. At the same time, the availability of the portal itself can be guaranteed in hardware or by another backup.

References

[1] L. Alvisi, T. C. Bressoud, A. El-Khashab, K. Marzullo, and D. Zagorodnov, "Wrapping Server-Side TCP to Mask Connection Failures", Proceedings of IEEE INFOCOM 2001.
[2] M. Aron, D. Sanders, P. Druschel, and W. Zwaenepoel, "Scalable content-aware request distribution in cluster-based network servers", Proceedings of the USENIX 2000 Annual Technical Conference, June 2000.
[3] K. Ghose, "A Comparative Study of Some Network Subsystem Organizations", Proceedings of the IEEE 1998 International Conference on High Performance Computing (HiPC'98).
[4] Y. Huang and P. K. McKinley, "Group leader election under link-state routing", Proceedings of the IEEE International Conference on Network Protocols, Atlanta, Georgia, October 1997.
[5] D. Maltz and P. Bhagwat, "MSOCKS: An architecture for transport layer mobility", Proceedings of IEEE INFOCOM '98, March 1998.
[6] D. Maltz and P. Bhagwat, "TCP splicing for application layer proxy performance", IBM Research Report 21139, IBM Research Division.
[7] R. Nasika and P. Dasgupta, "Transparent migration of distributed computing processes", Proceedings of the Thirteenth International Conference on Parallel and Distributed Computing Systems.
[8] V. S. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum, "Locality-aware request distribution in cluster-based network servers", ACM SIGPLAN Notices, Vol. 33, No. 11, November 1998.
[9] A. Robertson, "Linux-HA Heartbeat System Design", Proceedings of the 2000 ALS Conference.
[10] Y.-F. Sit, C.-L. Wang, and F. Lau, "Socket Cloning for Cluster-Based Web Server", Proceedings of the IEEE Fourth International Conference on Cluster Computing, Chicago, USA, September 23-26, 2002.
[11] S. Singh and J. F. Kurose, "Electing Good Leaders in Distributed Systems", Journal of Parallel and Distributed Systems, Vol. 23, 1994.
[12] Q. O. Snell, A. Mikler, and J. L. Gustafson, "Netpipe: A Network Protocol Independent Performance Evaluator", Proceedings of the IASTED International Conference on Intelligent Information Management and Systems, June.
[13] A. C. Snoeren, D. G. Andersen, and H. Balakrishnan, "Fine-Grained Failover Using Connection Migration", Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS), 2001.
[14] F. Sultan, K. Srinivasan, and L. Iftode, "Transport Layer Support for Highly-Available Network Services", Proceedings of HotOS-VIII, May 2001.
[15] C.-S. Yang and M.-Y. Luo, "Realizing Fault Resilience in Web-Server Cluster", Proceedings of the 13th ACM/IEEE Conference on High Performance Networking and Computing (SC 2000).
[16] W. Zhang, "Linux Virtual Server for Scalable Network Services", Proceedings of the Ottawa Linux Symposium.
[17] D. L. Herbert, S. S. Devgan, and C. Beane, "Application of network address translation in a local area network", Proceedings of the 33rd Southeastern Symposium on System Theory, 2001.
[18] Y. Rekhter and P. Gross, "Application of the Border Gateway Protocol in the Internet", RFC 1268, October 1991.
[19] S. Johnson and F. Jahanian, "Experiences with group communication middleware", Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000), 2000.


More information

Contributions to Session Aware Frameworks for Next Generation Internet Services

Contributions to Session Aware Frameworks for Next Generation Internet Services Contributions to Session Aware Frameworks for Next Generation Internet Services PhD Dissertation Defense by Narjess AYARI Phd. Prepared at Orange Labs (SIRP/ASF/INTL) and RESO, LIP Université de Lyon,

More information

GRE and DM VPNs. Understanding the GRE Modes Page CHAPTER

GRE and DM VPNs. Understanding the GRE Modes Page CHAPTER CHAPTER 23 You can configure Generic Routing Encapsulation (GRE) and Dynamic Multipoint (DM) VPNs that include GRE mode configurations. You can configure IPsec GRE VPNs for hub-and-spoke, point-to-point,

More information

Network-Adaptive Video Coding and Transmission

Network-Adaptive Video Coding and Transmission Header for SPIE use Network-Adaptive Video Coding and Transmission Kay Sripanidkulchai and Tsuhan Chen Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213

More information

ET4254 Communications and Networking 1

ET4254 Communications and Networking 1 Topic 9 Internet Protocols Aims:- basic protocol functions internetworking principles connectionless internetworking IP IPv6 IPSec 1 Protocol Functions have a small set of functions that form basis of

More information

9th Slide Set Computer Networks

9th Slide Set Computer Networks Prof. Dr. Christian Baun 9th Slide Set Computer Networks Frankfurt University of Applied Sciences WS1718 1/49 9th Slide Set Computer Networks Prof. Dr. Christian Baun Frankfurt University of Applied Sciences

More information

20: Networking (2) TCP Socket Buffers. Mark Handley. TCP Acks. TCP Data. Application. Application. Kernel. Kernel. Socket buffer.

20: Networking (2) TCP Socket Buffers. Mark Handley. TCP Acks. TCP Data. Application. Application. Kernel. Kernel. Socket buffer. 20: Networking (2) Mark Handley TCP Socket Buffers Application Application Kernel write Kernel read Socket buffer Socket buffer DMA DMA NIC TCP Acks NIC TCP Data 1 TCP Socket Buffers Send-side Socket Buffer

More information

Operating Systems Design Exam 3 Review: Spring Paul Krzyzanowski

Operating Systems Design Exam 3 Review: Spring Paul Krzyzanowski Operating Systems Design Exam 3 Review: Spring 2012 Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 An Ethernet device driver implements the: (a) Data Link layer. (b) Network layer. (c) Transport layer.

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 2016 Exam 1 Review Paul Krzyzanowski Rutgers University Fall 2016 Question 1 Why does it not make sense to use TCP (Transmission Control Protocol) for the Network Time Protocol (NTP)?

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

IBM InfoSphere Streams v4.0 Performance Best Practices

IBM InfoSphere Streams v4.0 Performance Best Practices Henry May IBM InfoSphere Streams v4.0 Performance Best Practices Abstract Streams v4.0 introduces powerful high availability features. Leveraging these requires careful consideration of performance related

More information

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS 28 CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS Introduction Measurement-based scheme, that constantly monitors the network, will incorporate the current network state in the

More information

Internet Layers. Physical Layer. Application. Application. Transport. Transport. Network. Network. Network. Network. Link. Link. Link.

Internet Layers. Physical Layer. Application. Application. Transport. Transport. Network. Network. Network. Network. Link. Link. Link. Internet Layers Application Application Transport Transport Network Network Network Network Link Link Link Link Ethernet Fiber Optics Physical Layer Wi-Fi ARP requests and responses IP: 192.168.1.1 MAC:

More information

Oracle E-Business Availability Options. Solution Series for Oracle: 2 of 5

Oracle E-Business Availability Options. Solution Series for Oracle: 2 of 5 Oracle E-Business Availability Options Solution Series for Oracle: 2 of 5 Table of Contents Coping with E-Business Hours Oracle E-Business Availability Options.....1 Understanding Challenges to Availability...........................2

More information

Network performance. slide 1 gaius. Network performance

Network performance. slide 1 gaius. Network performance slide 1 historically much network performance research was based on the assumption that network traffic was random apoisson distribution of traffic Paxson and Floyd 1994, Willinger 1995 found this assumption

More information

THE INTERNET PROTOCOL/1

THE INTERNET PROTOCOL/1 THE INTERNET PROTOCOL a (connectionless) network layer protocol designed for use in interconnected systems of packet-switched computer communication networks (store-and-forward paradigm) provides for transmitting

More information

Staged Refresh Timers for RSVP

Staged Refresh Timers for RSVP Staged Refresh Timers for RSVP Ping Pan and Henning Schulzrinne Abstract The current resource Reservation Protocol (RSVP) design has no reliability mechanism for the delivery of control messages. Instead,

More information

Client-Transparent Fault-Tolerant Web Service

Client-Transparent Fault-Tolerant Web Service Proceedings of the 20th IEEE International Performance, Computing, and Communications Conference, Phoenix, AZ, April 2001. Client-Transparent Fault-Tolerant Web Service Navid Aghdaie and Yuval Tamir UCLA

More information

Routing Overview. Information About Routing CHAPTER

Routing Overview. Information About Routing CHAPTER 21 CHAPTER This chapter describes underlying concepts of how routing behaves within the ASA, and the routing protocols that are supported. This chapter includes the following sections: Information About

More information

Outline Computer Networking. TCP slow start. TCP modeling. TCP details AIMD. Congestion Avoidance. Lecture 18 TCP Performance Peter Steenkiste

Outline Computer Networking. TCP slow start. TCP modeling. TCP details AIMD. Congestion Avoidance. Lecture 18 TCP Performance Peter Steenkiste Outline 15-441 Computer Networking Lecture 18 TCP Performance Peter Steenkiste Fall 2010 www.cs.cmu.edu/~prs/15-441-f10 TCP congestion avoidance TCP slow start TCP modeling TCP details 2 AIMD Distributed,

More information

Remote Procedure Call. Tom Anderson

Remote Procedure Call. Tom Anderson Remote Procedure Call Tom Anderson Why Are Distributed Systems Hard? Asynchrony Different nodes run at different speeds Messages can be unpredictably, arbitrarily delayed Failures (partial and ambiguous)

More information

NAT Router Performance Evaluation

NAT Router Performance Evaluation University of Aizu, Graduation Thesis. Mar, 22 17173 1 NAT Performance Evaluation HAYASHI yu-ichi 17173 Supervised by Atsushi Kara Abstract This thesis describes a quantitative analysis of NAT routers

More information

Communication Networks ( ) / Fall 2013 The Blavatnik School of Computer Science, Tel-Aviv University. Allon Wagner

Communication Networks ( ) / Fall 2013 The Blavatnik School of Computer Science, Tel-Aviv University. Allon Wagner Communication Networks (0368-3030) / Fall 2013 The Blavatnik School of Computer Science, Tel-Aviv University Allon Wagner Kurose & Ross, Chapter 4 (5 th ed.) Many slides adapted from: J. Kurose & K. Ross

More information

CSE 4215/5431: Mobile Communications Winter Suprakash Datta

CSE 4215/5431: Mobile Communications Winter Suprakash Datta CSE 4215/5431: Mobile Communications Winter 2013 Suprakash Datta datta@cse.yorku.ca Office: CSEB 3043 Phone: 416-736-2100 ext 77875 Course page: http://www.cse.yorku.ca/course/4215 Some slides are adapted

More information

A Multihoming based IPv4/IPv6 Transition Approach

A Multihoming based IPv4/IPv6 Transition Approach A Multihoming based IPv4/IPv6 Transition Approach Lizhong Xie, Jun Bi, and Jianping Wu Network Research Center, Tsinghua University, China Education and Research Network (CERNET) Beijing 100084, China

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems

An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems Gurasis Singh 1, Kamalpreet Kaur 2 1Assistant professor,department of Computer Science, Guru Nanak Dev

More information

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition,

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition, Chapter 17: Distributed-File Systems, Silberschatz, Galvin and Gagne 2009 Chapter 17 Distributed-File Systems Background Naming and Transparency Remote File Access Stateful versus Stateless Service File

More information

CSE/EE 461 Lecture 13 Connections and Fragmentation. TCP Connection Management

CSE/EE 461 Lecture 13 Connections and Fragmentation. TCP Connection Management CSE/EE 461 Lecture 13 Connections and Fragmentation Tom Anderson tom@cs.washington.edu Peterson, Chapter 5.2 TCP Connection Management Setup assymetric 3-way handshake Transfer sliding window; data and

More information

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication John Markus Bjørndalen, Otto J. Anshus, Brian Vinter, Tore Larsen Department of Computer Science University

More information

The Internet Protocol

The Internet Protocol The Internet Protocol Stefan D. Bruda Winter 2018 THE INTERNET PROTOCOL A (connectionless) network layer protocol Designed for use in interconnected systems of packet-switched computer communication networks

More information