Loaded: Server Load Balancing for IPv6

Sven Friedrich, Sebastian Krahmer, Lars Schneidenbach, Bettina Schnor
Institute of Computer Science, University Potsdam, Potsdam, Germany
{sfried, krahmer, lschneid, schnor}@cs.uni-potsdam.de

Abstract

With the next-generation Internet protocol IPv6 on the horizon, it is time to think about how applications can migrate to IPv6. Web traffic is currently one of the most important applications on the Internet. The increasing popularity of dynamically generated content on the World Wide Web has created the need for fast web servers. Server clustering together with server load balancing has emerged as a promising technique for building scalable web servers. This paper gives a short overview of the new features of IPv6 and of different server load balancing technologies. Further, we present and evaluate Loaded, a user-space server load balancer for IPv4 and IPv6 based on Linux.

1. Introduction

Clusters are a widespread platform for high-performance and high-availability computing. In most cases, so-called Beowulf clusters [3] are used, which consist of off-the-shelf personal computer (PC) technology running the Linux operating system. Linux has supported IPv6 since version 2.5 (also back-ported to the 2.4 and 2.2 kernels).

For high-performance clusters, several message-passing libraries have been developed to support parallel applications. The most prominent is the Message Passing Interface (MPI), a standard specification for message-passing libraries [14]. In [15] it was shown how MPICH, an MPI implementation from Argonne National Laboratory and Mississippi State University [10, 13], was ported to IPv6. Owing to the good design of MPICH, this work was more or less straightforward.

In this paper we present an effort to use IPv6 within high-availability clusters. In order to reliably serve dynamically generated content to an increasing number of clients, web servers have to be highly available and scalable.
Server clustering in conjunction with server load balancing is a promising technique for building high-availability clusters. Future web servers will be required to support IPv6, and therefore server load balancers need to support IPv6, too.

The paper is organized as follows. Section 2 gives an introduction to common server load balancing technologies. Sections 3 and 4 outline important features of IPv6 and the netfilter interface of Linux. Further sections present details of Loaded, our server load balancing software, and present evaluation results. The paper concludes with a discussion and future work.

2. Server Load Balancing Technologies

Server load balancing is a mechanism to make a service scalable, fast, and more reliable. Clients contact a service via a virtual IP (VIP) address. This VIP belongs to a front-end server that forwards the requests to the back-end servers using load balancing strategies. There are often dozens or even hundreds of servers operating behind a single VIP.

Figure 1 shows a server load balancer (SLB), sometimes also called a dispatcher in the literature, with two network connections, a so-called two-armed load balancer [5]. The SLB performs network address translation (NAT) from one network to the other. This enables, for example, the back-end servers to have only private IP addresses.

[Figure 1. Server Load Balancing Example. Diagram: the Internet reaches the dispatcher at VIP 1.2.3.4, which forwards to the back-end servers Server 1, Server 2, ..., Server 42 at 10.3.2.1, 10.3.2.2, ..., 10.3.2.42.]

The tasks of a server load balancer are:

• load balancing: distributing incoming packets onto the back-end servers,
• health checking: monitoring the back-end servers and taking a crashed server out of the distribution.

While the back end provides reliability through replicated services, the SLB is still a single point of failure. Therefore, the SLB itself is often replicated as well, in an active-standby or active-active scenario [5].

Several load balancing systems exist, both commercial and non-commercial [16]. A prominent example is the Linux Virtual Server (LVS) [1, 17]. LVS extends the TCP/IP stack of the Linux kernel to support three IP load balancing techniques: LVS/NAT, LVS/TUN, and LVS/DR. LVS supports four scheduling algorithms:

1. Round-Robin scheduling of connection requests.
2. Weighted Round-Robin performs like Round-Robin, but can account for servers with different processing capacities.
3. Least-Connection scheduling directs network connections to the server with the fewest active connections.
4. Weighted Least-Connection scheduling performs Least-Connection scheduling, but can account for servers with different processing capacities.

Bryhni et al. have analyzed different load balancing algorithms in a homogeneous system where all servers have the same processing capacity [6]. They compare four scheduling algorithms: Round-Robin, Least-Connection, and two more complex strategies. Their results show that Round-Robin, although a very simple strategy, has the best performance.

3. IPv6 Features

In this section we give a short overview of the new IPv6 features. More information can be found in the literature [8, 11, 7]. The main motivation for the new specification of the Internet protocol is the limited address space. The global Internet has experienced many years of sustained exponential growth, doubling in size every nine months or faster. In 1999, on average, a new host appeared on the Internet every two seconds [7]. However, the 32-bit address space of IPv4 is limited and, moreover, unfairly divided between different organizations. While network address translation seems to be a solution for some topologies, such as managing multiple LANs of one corporation, it cannot meet the needs of newly evolving Internet regions such as China.

The most important changes specified with IPv6 are:

1. Larger Addresses. The 128-bit addresses are the most noticeable change.
The address space is greater than 3.4 × 10^38 addresses, which results in over 10^24 addresses per square meter of the earth's surface [7].
2. New Datagram Format. In contrast to IPv4, the new protocol has a so-called base header of fixed length, namely 40 bytes. The base header may be followed by so-called extension headers for special purposes. Whether an extension header follows is specified in the NEXT-HEADER field of the base header. Extension headers that have already been specified include headers with routing, fragmentation, authentication, and encryption information.
3. Extension Capability. The concept of base and extension headers is a powerful instrument for adapting the protocol to future needs. If IPv6 needs to support a new feature, the only task is to specify a new extension header for it.
4. Security. The IPsec protocols [7] for secure Internet communication are mandatory.
5. End-to-End Fragmentation. IPv6 uses end-to-end fragmentation to avoid fragmentation and reassembly overhead in routers. The sending host can either use the guaranteed minimum MTU of 1280 bytes or perform Path MTU Discovery to determine the minimum MTU along the path to the destination host.

The new addressing scheme is one of the main new features of IPv6. Along with it comes a new class of addresses, the so-called anycast address, which is defined similarly to a multicast address, except that only one machine out of a group must receive the message. This address class is obviously useful for server load balancing. However, our investigation showed that currently this address class is not used for load balancing, nor does any server load balancer exist for IPv6 at all.
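To illustrate the new datagram format, the following Python sketch parses the 40-byte base header and follows the NEXT-HEADER chain to the upper-layer protocol. It is a minimal illustration and not part of Loaded: the function names are chosen for this example, the field layout follows RFC 2460 [8], and only the hop-by-hop, routing, destination options, and fragment extension headers are handled (authentication and encryption headers are left out).

```python
import ipaddress
import struct

BASE_HEADER_LEN = 40  # the IPv6 base header always has this length in bytes

# Next-header values for the extension headers handled in this sketch.
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
EXTENSION_HEADERS = {HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS}


def parse_base_header(packet: bytes) -> dict:
    """Decode the fixed-length base header of an IPv6 packet."""
    if len(packet) < BASE_HEADER_LEN:
        raise ValueError("packet shorter than the IPv6 base header")
    first_word, payload_len, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8]
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src": ipaddress.IPv6Address(packet[8:24]),
        "dst": ipaddress.IPv6Address(packet[24:40]),
    }


def find_upper_layer(packet: bytes) -> tuple:
    """Skip extension headers; return (protocol number, byte offset)."""
    next_header = packet[6]
    offset = BASE_HEADER_LEN
    while next_header in EXTENSION_HEADERS:
        if offset + 2 > len(packet):
            raise ValueError("truncated extension header")
        if next_header == FRAGMENT:
            length = 8  # the fragment header has a fixed size
        else:
            # Length field counts 8-byte units beyond the first 8 bytes.
            length = (packet[offset + 1] + 1) * 8
        next_header = packet[offset]
        offset += length
    return next_header, offset


# Example: base header + one 8-byte hop-by-hop header announcing TCP (6).
base = struct.pack("!IHBB", 6 << 28, 8, HOP_BY_HOP, 64)
base += ipaddress.IPv6Address("::1").packed
base += ipaddress.IPv6Address("2001:db8::1").packed
hop_by_hop = bytes([6, 0, 1, 4, 0, 0, 0, 0])  # next=TCP, len=0, PadN option
packet = base + hop_by_hop

print(parse_base_header(packet)["dst"])  # 2001:db8::1
print(find_upper_layer(packet))          # (6, 48)
```

A load balancer that rewrites addresses needs exactly this kind of walk to locate the TCP header whose checksum must be updated. For fragments other than the first, the returned offset points at fragment data rather than at an upper-layer header, which this sketch does not distinguish.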

4. The Linux Netfilter Interface

Since kernel 2.4, Linux has implemented the easy-to-handle netfilter interface for packet filtering [4]. Packets may be accepted, discarded, forwarded, or, in the case of NAT, rewritten. There are three chains by default: the input, forwarding, and output chains. Netfilter knows different targets, which tell it what to do with a packet that matches a rule, for example DROP, ACCEPT, or QUEUE. With the QUEUE target, the packet is queued for further processing in user space. Hence, the packet may be modified by a user program and then passed back to the kernel. Using the QUEUE target requires the libipq library.

5. Loaded

Since LVS is implemented within the networking part of the kernel, porting LVS to IPv6 is not straightforward. Instead, Loaded uses the netfilter interface to implement server load balancing in user space. The benefit of a user-space solution is that it is easy to port to different protocols. So far we support IPv4 and IPv6, and we are currently working on InfiniBand [9, 12]. The drawback compared with a kernel solution like LVS is the additional overhead for context switches between kernel and user mode.

5.1. Features

The main features of Loaded are:

• network address translation,
• server load balancing, and
• health checking.

Loaded is an easy-to-handle server load balancer developed for two-armed environments (see Figure 1). Running in a two-armed setup, Loaded can handle a private back end that is hidden from the public network for security reasons. In this environment, Loaded on the front-end server acts as router/gateway software for the back end and uses NAT to route traffic. Two scheduling algorithms are supported at the moment: Round-Robin and Weighted Round-Robin (see Section 2). These algorithms can easily be extended or modified. A neighbor discovery mechanism is implemented to ensure the availability of the services on the back-end machines.

5.2. Architecture

Loaded is implemented in C++ and runs as a user-space service [2]. It consists of three components, shown in Figure 2: the packet handling, a scheduler, and a neighbor discovery.

[Figure 2. Architecture of Loaded. Diagram: kernel/netfilter, libipq, packet handling, scheduler, and neighbor discovery.]

Incoming traffic that hits the virtual IP address of the load balancer is delivered to Loaded by the netfilter kernel component via the QUEUE target. The packet handling component is responsible for changing the IP packet header information (source, destination) and for recalculating the TCP checksum and, in the case of IPv4, the IP header checksum. An appropriate back-end server is chosen by the scheduler using the configured algorithm. The packet handling uses the chosen IP address to perform the network address translation. The packet is then handed back to the kernel and forwarded to the back end. For outgoing traffic, the packet handling replaces the source address of the back-end server with the virtual IP of the load balancer.

The load balancer has to distinguish between incoming and outgoing traffic. This decision is made by looking at the destination IP address: if it is the virtual IP of the load balancer, the packet belongs to incoming traffic.

Multiple back-end servers provide a robust environment. If a server fails, others can take over its load. Loaded has a built-in neighbor discovery that removes failed servers from the scheduler's list and adds newly installed servers to the list of known back-end servers. Running in a separate thread, the neighbor discovery periodically broadcasts a ping into the back-end network. For all replying machines, the availability of the balanced service(s) is checked. The neighbor discovery mechanism also provides scalability through automatic detection of newly added back-end servers.

Several services require a persistent connection. In this case, Round-Robin-scheduled requests would break the connection.
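The effect of plain Round-Robin on packets of one client, and a per-client table that keeps a client on one back-end server, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Loaded's C++ implementation: the class names RoundRobinScheduler and StickyDispatcher are hypothetical, the back-end addresses are those of Figure 1, and the client addresses are documentation addresses.

```python
from itertools import cycle


class RoundRobinScheduler:
    """Hand out back-end servers in a fixed, repeating order."""

    def __init__(self, backends):
        self._next = cycle(list(backends))

    def pick(self):
        return next(self._next)


class StickyDispatcher:
    """Remember the back-end server chosen for each client IP address."""

    def __init__(self, scheduler):
        self._scheduler = scheduler
        self._table = {}  # client IP -> back-end server IP

    def backend_for(self, client_ip):
        # Only the first packet of an unknown client consults the scheduler.
        if client_ip not in self._table:
            self._table[client_ip] = self._scheduler.pick()
        return self._table[client_ip]


backends = ["10.3.2.1", "10.3.2.2"]

# Plain Round-Robin: consecutive packets alternate between servers,
# so packets of one connection would reach different machines.
plain = RoundRobinScheduler(backends)
print([plain.pick() for _ in range(3)])
# ['10.3.2.1', '10.3.2.2', '10.3.2.1']

# With the table, every packet of a client reaches the same server.
sticky = StickyDispatcher(RoundRobinScheduler(backends))
print(sticky.backend_for("192.0.2.7"))     # 10.3.2.1
print(sticky.backend_for("198.51.100.9"))  # 10.3.2.2
print(sticky.backend_for("192.0.2.7"))     # 10.3.2.1
```

Entries are never removed in this sketch, so the table grows with the number of distinct client addresses.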
When the first packet from an unknown client is received, a back-end server is chosen and the tuple (client IP, back-end server IP) is stored. All following IP packets from the same source (client IP address) are forwarded to this server. Coupling client IP addresses to a back-end server has a drawback in conjunction with the use of network address translation in the Internet, since a NAT router turns many client IP addresses into one. This results in a kind of dedicated back-end server for a large number of clients. The reason for this is that a persistent TCP connection is invisible at the IP level, and therefore the load balancer cannot decide when to delete the saved tuple.

5.3. Challenges

Owing to the use of iptables and the netfilter interface, implementing IPv6 support in Loaded was about as complex as implementing IPv4 support. The main work was the handling of IPv6 headers and addresses. A minor difference is that there is no need to calculate a header checksum for IPv6 packets.

6. Performance

Since a real-world scenario with thousands of clients is hard to set up, SLBs are often evaluated in simulation studies (see for example [6]). The focus of our performance tests was to investigate the overhead added by the user-space approach of Loaded. This can only be done with real measurements. The testbed consists of one front-end server and two back-end servers, all Pentium III 800 MHz machines, connected via a Fast Ethernet switch. One machine was used to initiate client requests.

Loaded is evaluated by measuring the achievable bandwidth. To this end, the front end was first configured only as a router and then as an SLB with Loaded. For IPv4 we also compare it against LVS. The bandwidth is measured with a simple TCP ping-pong benchmark that sends and receives messages of a given size.

[Figure 3. Comparison for IPv4: bandwidth [MB/s] of LVS and Loaded versus data size [bytes].]

First, we compare the achievable bandwidth of Loaded and LVS in an IPv4 network (see Figure 3). The bandwidth of LVS was the same as the routing bandwidth; therefore the routing measurement is not presented. Figure 3 shows that Loaded performs nearly as well as LVS. With Loaded, a mode switch from kernel to user mode and back is necessary for each packet.
This results, for example, in a bandwidth reduction of 24% for small packets of 127 bytes and of 5.5% for 4 KByte packets. For small messages, the latency is increased by about 30 μs. Since a user application has no access to kernel memory, the packet has to be copied, leading to a slightly increasing latency for larger packets (up to 60 μs for 64 KByte packets).

The IPv6 results are similar to the IPv4 results, for the same reasons. Figure 4 shows the performance difference between the front-end machine running as a router and as a server load balancer in an IPv6 network. The figures imply that neither LVS nor Loaded is the performance bottleneck. The only bandwidth limitation was the network capacity.

7. Conclusion and Future Work

We have presented Loaded, a user-space server load balancer for IPv4 and IPv6 networks. The performance figures show that the performance loss due to additional user/kernel mode switches is acceptable. On the other hand, the decision to implement a user-space solution leads to higher portability and flexibility.

[Figure 4. Comparison for IPv6: bandwidth [MB/s] of routing only and Loaded balancing versus data size [bytes].]

Measurements have to be repeated using high-speed interconnects such as Gigabit Ethernet to evaluate the overhead under higher load. Currently, we are evaluating two different approaches to server load balancing in InfiniBand networks. The first approach uses Loaded and IP-over-InfiniBand. The other uses Loaded to inject IP packets into native InfiniBand networks.

References

[1] Linux Virtual Server homepage. http://linuxvirtualserver.org/.
[2] Loaded information and source code. http://www.cs.uni-potsdam.de/bs/research/cluster/loaded.
[3] D. J. Becker, T. Sterling, D. Savarese, J. E. Dorband, U. A. Ranawake, and C. V. Packer. Beowulf: A parallel workstation for scientific computation. In Proceedings of the International Conference on Parallel Processing (ICPP), pages 11-14, 1995.
[4] C. Bienvenuti. O'Reilly, 2005.
[5] T. Bourke. Server Load Balancing. O'Reilly, 2001.
[6] H. Bryhni, E. Klovning, and O. Kure. A comparison of load balancing techniques for scalable web servers. IEEE Network, 14(4):58-64, 2000.
[7] D. Comer. Internetworking with TCP/IP - Principles, Protocols, and Architectures. Prentice Hall, fourth edition, 2000.
[8] S. Deering and R. Hinden. Internet Protocol, version 6 (IPv6) specification. IETF RFC 2460, 1999.
[9] S. Friedrich, L. Schneidenbach, and B. Schnor. SLIBNet: Server Load Balancing for InfiniBand Networks. Technical Report TR-2005-12, University Potsdam, 2005.
[10] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard. Parallel Computing, 22(6):789-828, 1996.
[11] R. Hinden and S. Deering. Internet Protocol version 6 (IPv6) addressing architecture. IETF RFC 3513, Apr. 2003.
[12] InfiniBand Trade Association. InfiniBand Architecture Specification, Volumes 1 and 2, Release 1.1. http://www.infinibandta.org/specs, 2002.
[13] MPICH Home Page. http://www.mcs.anl.gov/mpi/mpich/.
[14] MPI: A Message Passing Interface Standard, June 1995. Message Passing Interface Forum.
[15] L. Schneidenbach and B. Schnor. Migration of MPI Applications to IPv6 Networks. In Proceedings of the PDCN 2005, pages 172-176, Innsbruck, Austria, Feb. 2005.
[16] T. Schroeder, S. Goddard, and B. Ramamurthy. Scalable web server clustering technologies. IEEE Network, 14(3):38-45, 2000.
[17] W. Zhang. Linux virtual server for scalable network services. In Ottawa Linux Symposium, 2000.