Design, Implementation and Performance of Resource Management Scheme for TCP Connections at Web Proxy Servers

Takuya Okamoto, Tatsuhiko Terai, Go Hasegawa, Masayuki Murata

Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan
Phone: +81-6-6850-6616, Fax: +81-6-6850-6589, E-mail: {tak-okmt, terai}@ics.es.osaka-u.ac.jp

Cybermedia Center, Osaka University, 1-30 Machikaneyama, Toyonaka, Osaka 560-0043, Japan
Phone: +81-6-6850-6616, Fax: +81-6-6850-6589, E-mail: {hasegawa, murata}@cmc.osaka-u.ac.jp

Abstract: A great deal of research has been devoted to tackling network congestion caused by the increase in Internet traffic. However, there has been little concern with improving the performance of Internet servers, in spite of projections that the bottleneck is shifting from the network to the endhosts. We previously proposed a scheme called SSBT (Scalable Socket Buffer Tuning) intended to improve the performance of Web servers by managing their resources for TCP connections effectively and fairly. On the current Internet, however, a significant number of Web document transfer requests are sent through Web proxy servers. Accordingly, in this paper, we propose a new resource/connection management scheme for Web proxy servers to improve their performance and reduce the Web document transfer time when proxy servers are used. We validate the effectiveness of the proposed scheme through simulation experiments, and confirm that it manages server resources effectively. Additionally, we implement the proposed scheme on an actual Web proxy server and examine its performance, in terms of proxy server throughput and document transfer delay, using benchmark tests that take the Web access model into account.
1 Introduction

The rapid increase of users on the Internet has been the impetus for many research efforts toward relieving network congestion caused by increasing network traffic. However, there has been little work on improving the performance of Internet hosts, in spite of the projected shift of the performance bottleneck from the network to the endhosts. There are already hints of this scenario's emergence, as evidenced by the proliferation of busy Web servers on the present-day Internet that receive hundreds of document transfer requests every second during peak periods. In [1], we proposed SSBT (Scalable Socket Buffer Tuning), which is intended to improve the performance of Web servers by managing their resources effectively and fairly. SSBT comprises two major components: the E-ATBT (Equation-based Automatic TCP Buffer Tuning) and SMR (Simple Memory-copy Reduction) schemes. In E-ATBT, we maintain an expected throughput value for each active TCP connection, determined by an analytic estimation of TCP throughput [2]. The expected value is characterized by the packet loss rate, RTT (Round Trip Time), and RTO (Retransmission Timeout), which are easily monitored by the sender host. As a result of this tuning process, a send socket buffer is assigned to each connection based on its expected throughput, with consideration of max-min fairness among the connections. The SMR scheme provides a set of socket system calls that reduce the number of memory-copy operations at the sender host in TCP data transfer tasks. SMR is similar to other schemes [3, 4], but it is simpler to implement. We have validated the effectiveness of our proposed mechanisms through simulation and implementation experiments. The results we obtained for SSBT indicated that it can reduce Web document transfer times by 1/5 compared with those of conventional Web servers.
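The max-min fair buffer assignment used by E-ATBT can be illustrated with a simple water-filling routine. This is only a sketch of the fairness idea under a fixed total-buffer budget; the function name, the budget-based formulation, and the numbers are ours, not the exact SSBT algorithm:

```python
def maxmin_allocate(demands, capacity):
    """Split `capacity` bytes of socket buffer across connections so that
    no connection gets more than its demand (expected throughput x RTT)
    and leftover space is shared max-min fairly among unsatisfied ones."""
    alloc = {conn: 0.0 for conn in demands}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining:
        share = cap / len(remaining)
        satisfied = {c: d for c, d in remaining.items() if d <= share}
        if not satisfied:
            # Nobody's remaining demand fits: split what is left equally.
            for conn in remaining:
                alloc[conn] = share
            break
        for conn, demand in satisfied.items():
            alloc[conn] = demand  # fully satisfy the smallest demands first
            cap -= demand
            del remaining[conn]
    return alloc
```

For example, three connections demanding 10, 50, and 100 KBytes under a 120 KByte budget would receive 10, 50, and 60 KBytes respectively: small demands are met exactly, and the remainder goes to the connection that can still use it.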
On the current Internet, many requests for Web document transfers are made via Web proxy servers [5]. Since proxy servers are usually deployed by ISPs (Internet Service Providers) for their customers, they must accommodate a large number of HTTP accesses simultaneously. Furthermore, a proxy server must handle both upward TCP connections (from the proxy server to Web servers) and downward TCP connections (from the client hosts to the proxy server). Hence, the proxy server is a likely spot for bottlenecks during Web document transfers, even when the network bandwidth and the Web server performance are sufficient. We contend that any effort to reduce Web document transfer times must consider improving proxy server performance. The bulk of past research on improving Web proxy server performance has focused on cache replacement algorithms [6, 7]. In [8], the authors evaluated the

performance of Web proxy servers through simulation experiments, focusing on the difference between HTTP/1.0 and HTTP/1.1, including the effects of cookies and of document transfers aborted by client hosts. However, there is little work on resource management at the proxy server, and no effective mechanism has been proposed. In this paper, we first discuss several problems that arise in the handling of TCP connections at a Web proxy server. One of these problems involves the assignment of socket buffers to TCP connections at the proxy server. When a TCP connection is not assigned a send/receive socket buffer sized according to its expected throughput, the assigned buffer may be left partly unused or may be too small for the intended task, which wastes socket buffer memory. Another problem is the management of persistent TCP connections, which waste the resources of a busy proxy server. When a proxy server must accommodate many persistent TCP connections without an effective management scheme, its resources remain assigned to these connections whether or not they are active. As a result, new TCP connections cannot be established because server resources run short. We propose a new resource management scheme for proxy servers that solves these problems. Our proposed scheme has the following two features. One is an enhanced E-ATBT, a revision of our previous E-ATBT adapted to proxy servers. Unlike a Web server, the proxy server must handle upward and downward TCP connections and behave as a TCP receiver host to obtain Web documents from Web servers. We therefore enhanced E-ATBT to handle the dependency between upward and downward TCP connections effectively and to assign the receive socket buffer dynamically.
The other is a connection management scheme that prevents newly arriving TCP connection requests from being rejected at the proxy server due to a lack of resources. The scheme manages the persistent TCP connections provided by HTTP/1.1, intentionally closing them when resources at the proxy server run short. We validate the effectiveness of our proposed scheme through simulation and implementation experiments. In the simulation experiments, we evaluate the essential performance and characteristics of the proposed scheme by comparing them with those of the original proxy server. We then show the results of the implementation experiments, and confirm the effectiveness of the connection management scheme in terms of proxy server throughput and document transfer delay. The rest of this paper is organized as follows. In Section 2, we outline current Web proxy servers and the resources they devote to TCP connections, and discuss the merits and demerits of persistent TCP connections under HTTP/1.1. In Section 3, we propose a new resource management scheme for proxy servers, and confirm its effectiveness through the simulation experiments described in Section 4. Additionally, we present some implementation issues of our proposed schemes on an actual proxy server, and confirm the effectiveness of the scheme with benchmark tests in Section 5. Finally, we present our concluding remarks in Section 6.

2 Background

In this section, we describe the background of our research on Web proxy servers in Subsection 2.1. We then discuss the potential of persistent connections to improve Web document transfer times; however, as will become apparent, they require careful treatment at the proxy server, which is described in Subsection 2.2.

2.1 Web Proxy Server

A Web proxy server works as an agent for Web client hosts that request Web documents. See Figure 1.
When it receives a Web document transfer request from a client host, it obtains the requested document from the original Web server on behalf of the client host and delivers it to the client host. It also caches the obtained Web documents. When other client hosts request the same document, it transfers the cached document, which results in a much reduced document transfer time. For example, it was reported in [8] that using Web proxy servers reduces document transfer times by up to 30%. Furthermore, when the cache is hit, the document transfer is performed without any connection to Web servers, so congestion within the network and at Web servers can be reduced. The proxy server accommodates a large number of connections both from Web client hosts and to Web servers, which distinguishes it from ordinary Web servers. The proxy server behaves as a sender host for downward TCP connections (between client hosts and the proxy server) and as a receiver host for upward TCP connections (between the proxy server and Web servers). Therefore, if resource management is not appropriately configured at the proxy server, the document transfer time increases even when the network is not congested and the Web server load is not high. That is, careful and effective resource management is a critical issue in improving the performance of a Web proxy server. On the current Internet, however, most proxy servers, including those in [9, 10], lack such considerations. The resources at the Web proxy server that we focus on in this paper are the mbuf, file descriptors, control blocks, and socket buffer. These are closely related to the performance of TCP connections when transferring Web documents. The mbuf, file descriptors, and control blocks are resources for TCP connections; the socket buffer is used for storing the documents being sent and received over the TCP connections. When these resources are exhausted, the proxy server cannot establish a new TCP connection.
Thus, the client host has to wait for existing TCP connections to be closed and their assigned resources to be released. If this cannot be accomplished, the proxy server rejects the request. In what follows, we introduce the resources of Web proxy servers as they relate to TCP connections. Although our discussion considers FreeBSD 4.0 [11], we believe its gist also applies to other OSs, such as Linux.

Figure 1: Web proxy server (client hosts send requests to the proxy server over downward TCP connections; on a cache miss, the proxy server requests the document from the original Web server over an upward TCP connection; on a cache hit, it delivers the cached document directly).

Mbuf: Each TCP connection is assigned an mbuf, which is located in the kernel memory space and used to move transmission data between the socket buffer and the network interface. When the data size is larger than the mbuf, the data is stored in another memory space, called an mbuf cluster, which is linked to the mbuf. Several mbuf clusters may be used to store data, according to its size. The number of mbufs prepared by the OS is configured when building the kernel; the default number is 4096 in FreeBSD [12]. Since each TCP connection is assigned at least one mbuf when established, the default number of connections a proxy server can establish simultaneously is 4096, which can be too small for busy proxy servers.

File descriptor: A file descriptor is assigned to each file in a file system so that the kernel and user applications can identify the file. One is also associated with each TCP connection when it is established; this is called a socket file descriptor. The number of connections that can be established simultaneously is limited by the number of file descriptors prepared by the OS; the default is 1064 in FreeBSD [12]. In contrast to the mbuf case, the number of file descriptors can be changed after the kernel boots. However, since user applications such as Squid [9] allocate memory according to the number of available file descriptors when they start, it is very difficult to inform applications of a change in the number of available file descriptors at run time. That is, the number of file descriptors used by the applications cannot be changed dynamically.

Control blocks: When establishing a new TCP connection, additional memory is needed for the data structures that store connection information, such as inpcb, tcpcb, and socket.
The inpcb structure stores the source and destination IP addresses, the port numbers, and so on. The tcpcb structure stores network information, such as the RTT (Round Trip Time), RTO (Retransmission Timeout), and congestion window size, which are used for TCP congestion control [13]. The socket structure stores information about the socket. The maximum number of these structures that can be built in the memory space is initially 1064. Since the memory space for these data structures is set when building the kernel and cannot be changed while the OS is running, a new TCP connection cannot be established once this memory space runs short.

Socket buffer: The socket buffer is used for data transfer between user applications and the sending/receiving TCP. When a user application transmits data using TCP, the data is copied to the send socket buffer and subsequently copied to mbufs (or mbuf clusters). The size of the assigned socket buffer is a key issue for effective data transfer by TCP. Suppose that a server host is sending TCP data to two client hosts, one behind a 64 Kbps dial-up link (say, client A) and the other on a 100 Mbps LAN (client B). If the server host assigns equally sized send socket buffers to both clients, the assigned buffer is likely to be too large for client A and too small for client B, because of the difference in the capacities (more strictly, the bandwidth-delay products) of their connections. For an effective buffer allocation to both client hosts, a compromise in buffer usage should be taken into account. In [1], we proposed the E-ATBT scheme, which dynamically assigns a send socket buffer to each TCP connection according to its expected throughput, estimated from observed network parameters: the packet loss probability, RTT, and RTO. That is, a sender host calculates the average throughput of each TCP connection from these three parameters, based on the analysis reported in [2].
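A commonly cited simplified closed form of the equation-based throughput estimate in [2] maps the three monitored parameters to an expected rate. The coding below is our own illustration of that formula, not the paper's implementation:

```python
from math import sqrt

def expected_throughput(mss, rtt, rto, p):
    """Approximate steady-state TCP throughput in bytes/sec, given the
    segment size (bytes), RTT and RTO (seconds), and packet loss rate p."""
    if p <= 0:
        return float("inf")  # no observed loss: the estimate is unbounded
    denom = (rtt * sqrt(2 * p / 3)
             + rto * min(1, 3 * sqrt(3 * p / 8)) * p * (1 + 32 * p ** 2))
    return mss / denom

# E-ATBT then sizes the send socket buffer as throughput x RTT,
# i.e. the bandwidth-delay product of the estimated rate.
buf = expected_throughput(1460, 0.1, 1.0, 0.01) * 0.1  # bytes
```

With a 1460-byte segment, a 100 ms RTT, a 1 s RTO, and 1% loss, the estimate comes out at roughly 146 KBytes/sec, so the connection would be assigned a send buffer of about 15 KBytes.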
We then calculate the required send socket buffer size as the product of the estimated throughput and the RTT of the TCP connection. By taking the observed network parameters into account, the resources at the Web server are appropriately allocated to connections in various network environments. E-ATBT is also applicable to Web proxy servers, since proxy servers accommodate many TCP connections issued by client hosts in various environments. However, since a dependency exists between a proxy server's upward and downward TCP connections, a simple application of E-ATBT is insufficient. Furthermore, since the proxy server behaves as a receiver host for the upward TCP connections to Web servers, we also have to consider a management scheme for the receive socket buffer, which was not considered in the original E-ATBT.

2.2 Persistent TCP Connection of HTTP/1.1

In recent years, many Web servers and client hosts (namely, Web browsers) support the persistent connection option, one of the most important functions of HTTP/1.1 [14]. In the older version of HTTP (HTTP/1.0), the TCP connection between server and client hosts is closed immediately when the document transfer is completed. However, since Web documents contain many in-line images, HTTP/1.0 must establish TCP connections many times to download them. This results in a significant increase in document transfer time, since the average size of the Web documents at typical Web servers is about 10 KBytes [15, 16]. The three-way handshake in each TCP connection establishment makes the situation worse. With an HTTP/1.1 persistent TCP connection, the server preserves the status of the TCP connection, including the congestion window size, RTT, RTO, ssthresh, and so on, when it finishes the document transfer. It then re-uses the connection and its status when other documents are transferred in the same HTTP session. In this way, the three-way handshake can be avoided.

Figure 2: Analysis model (a client host, an HTTP proxy server with cache hit ratio h, and a Web server; the client-proxy path has RTT rtt_c, RTO rto_c, and loss probability p_c, and the proxy-server path has rtt_s, rto_s, and p_s).

However, the proxy server keeps the TCP connection established irrespective of whether the connection is active (in use for packet transfer) or not. That is, the resources at the server are wasted while the TCP connection is inactive. Accordingly, a proxy server that accommodates many persistent TCP connections may waste a significant portion of its resources in maintaining them. In what follows, we introduce a simple estimation of how many TCP connections are active or idle, to show that many resources are wasted at a typical Web proxy server when native persistent TCP connections are used. For this purpose, we use the network topology shown in Figure 2, where a Web client host and a Web server are connected via a proxy server, and derive the probability that a persistent TCP connection is active, that is, in use sending TCP packets. In the figure, p_c, rtt_c, and rto_c are the packet loss ratio, RTT, and RTO between the client host and the proxy server, respectively. Similarly, p_s, rtt_s, and rto_s are those between the proxy server and the Web server. The packet size is fixed at m. The mean throughput of the TCP connection between the proxy server and the client host, and that between the Web server and the proxy server, denoted as ρ_c and ρ_s respectively, can be obtained by using the analysis results presented in our previous work [17]. Note that [17] provides a more accurate analysis of TCP throughput than those in [2] and [18], especially for small file transfers. Using the results presented in [17], we can derive the time for a Web document transfer via a proxy server.
For this purpose, we also introduce the parameters h and f, which represent the cache hit ratio of the document at the Web proxy server and the size of the transferred document, respectively. Note that the proxy server is likely to cache a Web page as a whole, including the main document and its in-line images. That is, when the main document is found in the proxy server's cache, the following in-line images are likely to be cached as well, and vice versa. Thus, h is not a fully adequate metric when we examine the effect of persistent connections, but the following observation also applies to that case. When the requested document is cached by the proxy server, no request to the original Web server is required, and the document is delivered directly from the proxy server to the client host. When the proxy server does not have the requested document, on the other hand, it must be transferred from the appropriate Web server to the client host via the proxy server. Thus, the document transfer time, T(f), can be determined as follows:

T(f) = h (S_c + f/ρ_c) + (1 − h) (S_c + S_s + f/ρ_c + f/ρ_s)

where S_c and S_s represent the setup time of a TCP connection between the client host and the proxy server and between the proxy server and the Web server, respectively. To derive S_c and S_s, we must consider the effect of the persistent connection provided by HTTP/1.1, which omits the three-way handshake. Here, we define X_c as the probability that the TCP connection between the client host and the proxy server is kept connected by the persistent connection, and X_s as the corresponding probability for the TCP connection between the proxy server and the Web server. Then S_c and S_s can be described as follows:

S_c = X_c (1/2) rtt_c + (1 − X_c) (3/2) rtt_c    (1)
S_s = X_s (1/2) rtt_s + (1 − X_s) (3/2) rtt_s    (2)

X_c and X_s depend on the length of the persistent timer, T_p.
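The expressions for T(f), S_c, and S_s can be evaluated directly. The sketch below uses illustrative parameter values; the throughputs ρ_c, ρ_s and the probabilities X_c, X_s are treated as given inputs here rather than derived:

```python
def transfer_time(f, h, rtt_c, rtt_s, rho_c, rho_s, x_c, x_s):
    """Average document transfer time T(f) through the proxy.

    f       : document size in bytes
    h       : cache hit ratio at the proxy
    rho_c/s : throughput of the client-proxy / proxy-server leg (bytes/sec)
    x_c/x_s : probability that the persistent connection is still open
    """
    # Eqs. (1)-(2): half an RTT if the persistent connection is reused,
    # 3/2 RTT if a new three-way handshake is required.
    s_c = x_c * rtt_c / 2 + (1 - x_c) * 3 * rtt_c / 2
    s_s = x_s * rtt_s / 2 + (1 - x_s) * 3 * rtt_s / 2
    # Cache hit (probability h): only the client-proxy leg is involved;
    # otherwise the document also traverses the proxy-server leg.
    return h * (s_c + f / rho_c) + (1 - h) * (s_c + s_s + f / rho_c + f / rho_s)

# Illustrative values: 12 KByte document, 50% hit ratio.
t = transfer_time(12000, 0.5, 0.02, 0.2, 100e3, 50e3, 1.0, 0.5)  # ~0.35 sec
```

As expected, a higher hit ratio or a higher reuse probability on either leg shortens the average transfer time, since both the server-side leg and the extra handshake RTT are avoided more often.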
That is, if the idle time between two successive document transfers is smaller than T_p, the TCP connection can be used for the second document transfer. If the idle time is larger than T_p, on the other hand, the TCP connection has been closed and a new TCP connection must be established. According to the results in [15], where the authors modeled the access pattern of a Web client host, the idle time between Web document transfers follows a Pareto distribution whose probability density function is given as:

p(x) = α k^α / x^(α+1)    (3)

where α = 1.5 and k = 1. Then we can calculate X_c and X_s as follows:

X_c = d(T_p)    (4)
X_s = (1 − h) d(T_p)    (5)

where d(x) is the cumulative distribution function of p(x). From Eqs. (1)-(5), the average document transfer time, T(f), can be determined. Finally, we can derive U, the utilization of the persistent TCP connection, as follows:

U = ∫_0^{T_p} p(x) T(f)/(T(f) + x) dx + (1 − d(T_p)) T(f)/(T(f) + T_p)

Figure 3 plots the probability that a TCP connection is active, as a function of the length of the persistent timer T_p, for various parameter sets of rtt_c, p_c, rtt_s, and p_s. Here we set h to 0.5, the packet size to 1460 Bytes, and f to 12 KBytes, according to the average size of Web documents reported in [15]. We can see from these figures that the utilization of the TCP connection is very low regardless of the network condition (the RTTs and packet loss ratios on the links between the proxy server and the client host/Web server). Thus, if idle TCP connections are kept established at the proxy server, a large part of the proxy server's resources is wasted. Furthermore, we can observe that the utilization becomes larger when the persistent timer is small (< 5 sec), because a smaller T_p prevents situations in which the proxy server's resources sit idle. One solution to this problem is simply to discard HTTP/1.1 and use HTTP/1.0, as the latter closes the TCP connection immediately upon the completion of a document transfer. However, HTTP/1.1 has other elegant mechanisms, such as pipelining and

content negotiation [14]. We should therefore develop an effective resource management scheme for use under HTTP/1.1. Our solution is that, as resources become short, the proxy server intentionally closes persistent TCP connections that are unnecessarily wasting its resources. We will describe our scheme in detail in the next section.

Figure 3: Probability that a TCP connection is active, as a function of the persistent timer [sec], for (a) rtt_c = 0.02 sec, p_c = 0.001 and (b) rtt_c = 0.2 sec, p_c = 0.01, with curves for rtt_s ∈ {0.02, 0.2} sec and p_s ∈ {0.001, 0.01, 0.1}.

3 Algorithm

In this section, we propose a new resource management scheme suitable for Web proxy servers, which solves the problems pointed out in the previous section.

3.1 New Socket Buffer Management Method

As described in the previous section, a proxy server behaves as a TCP receiver host when it obtains a requested document from the Web server. We thus have to incorporate a receive socket buffer management algorithm, which was not considered in the original E-ATBT [1]. We also have to consider the dependence between the upward and downward TCP connections. In the following subsections, we propose E^2-ATBT, a revision of our original E-ATBT with two additional schemes that eliminate these two problems.

3.1.1 Handling the Relation between Upward and Downward TCP Connections

A Web proxy server relays a document transfer request to a Web server on behalf of a Web client host. Thus, there is a close relation between an upward TCP connection (from the proxy server to the Web server) and the corresponding downward TCP connection (from the client host to the proxy server).
That is, the difference in the expected throughput of the two connections should be taken into account when socket buffers are assigned to them. For example, when the throughput of a certain downward TCP connection is larger than that of other concurrent downward TCP connections, E-ATBT assigns it a larger socket buffer. However, if the throughput of the corresponding upward TCP connection is low, the send socket buffer assigned to the downward TCP connection is likely not to be fully utilized. In this case, the unused send socket buffer should be reassigned to the other concurrent TCP connections having smaller socket buffers, so that their throughput can be improved.

There is one problem that must be overcome to realize this method. TCP connections are identified by the kernel through their control block, called tcpcb; however, the relation between the upward and downward connections cannot be determined from it. Two possible ways to overcome this problem can be considered:

- The proxy server monitors the utilization of the send socket buffer of each downward TCP connection, and decreases the assigned buffer size of connections whose send socket buffer is not fully utilized.
- When the proxy server sends the document transfer request to the Web server, it attaches information about the relation to the packet header.

The former algorithm can be realized by modifying only the proxy server, whereas the latter requires changes to the HTTP exchange. At a high level of abstraction the two algorithms have a similar effect, but the latter is harder to implement despite allowing more precise control. In the simulation experiments described in the next section, we assume that the proxy server knows the dependency between the downward and upward TCP connections.
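As an illustration, the buffer assignment just described can be sketched as follows. This is a minimal model rather than the actual E²-ATBT code: the throughput estimate uses the well-known simplified Mathis approximation MSS/RTT * sqrt(3/(2p)), and the function names, the proportional-scaling rule, and the per-pair input format are our assumptions.

```python
import math

MSS = 1460  # bytes; assumed maximum segment size

def expected_throughput(rtt, loss):
    """Steady-state TCP throughput estimate (simplified Mathis formula),
    in bytes/sec, from round-trip time (sec) and packet loss probability."""
    return (MSS / rtt) * math.sqrt(1.5 / loss)

def assign_buffers(pairs, total_buf):
    """pairs: one (down_rtt, down_loss, up_rtt, up_loss) tuple per relayed
    transfer.  Returns the send socket buffer (bytes) assigned to each
    downward connection.

    A downward connection can be fed no faster than its upward counterpart
    delivers, so its demand is the smaller of the two expected throughputs
    times the downward RTT (a bandwidth-delay product).  If total demand
    exceeds the available buffer, shares are scaled down proportionally."""
    demands = []
    for d_rtt, d_p, u_rtt, u_p in pairs:
        rate = min(expected_throughput(d_rtt, d_p),
                   expected_throughput(u_rtt, u_p))
        demands.append(rate * d_rtt)
    total_demand = sum(demands)
    if total_demand <= total_buf:
        return demands                    # every connection gets its demand
    scale = total_buf / total_demand      # otherwise share proportionally
    return [d * scale for d in demands]
```

The min() captures the dependency handling: a downward connection is never given more buffer than its upward counterpart can keep filled, and the freed buffer flows to the other connections.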
We use this assumption to confirm the essential performance of the proposed algorithm.

3.1.2 Control of Receive Socket Buffer

In most past research, it was assumed that a receiver host has a sufficiently large receive socket buffer, based on the consideration that the performance bottleneck of a data transfer is not at the endhosts but within the network. Accordingly, many operating systems assign a small receive socket buffer to each TCP connection; for example, the default receive socket buffer size in the FreeBSD system is 16 KBytes. As reported in [9], however, this is now regarded as very small, because the network bandwidth

has dramatically increased in the current Internet and the performance of Internet servers keeps growing. To avoid the performance limitation introduced by the receive socket buffer, the receiver host should adjust its receive socket buffer size to the congestion window size of the sender host. This can be done by monitoring the utilization of the receive socket buffer, or by adding information about the window size to the data packet header, similarly to the method described in the previous subsection. In the simulation experiments of the next section, we suppose that the proxy server can obtain complete information about the required receive socket buffer sizes of the upward TCP connections and control them accordingly.

3.2 Connection Management Method

As explained in Subsection 2.2, careful treatment of persistent TCP connections at the proxy server is necessary for efficient usage of its resources. We propose a new management scheme for persistent TCP connections at the proxy server that considers the amount of remaining resources. The key idea is as follows: when the load of the proxy server is low and the remaining resources are sufficient, it tries to keep as many TCP connections open as possible; when the resources are about to run short, the proxy server closes persistent TCP connections and frees their resources, so that the released resources can be used for new TCP connections.

To realize this control, the remaining resources of the proxy server must be monitored. We also have to track the persistent TCP connections at the proxy server so that they can be kept or closed according to the monitored resource utilization. The implementation issues for the management of persistent TCP connections will be discussed in Section 5.
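The decision rule can be sketched as follows. The sketch counts connections rather than individual resources (mbuf, file descriptors, control blocks); the default limit and threshold are taken from the experimental setting of Section 5 (400 and 300 connections), and how many idle connections to close at once is our assumption.

```python
def connections_to_close(n_established, idle_oldest_first,
                         limit=400, threshold=300):
    """Return the idle persistent connections the proxy should close now.

    n_established: current number of TCP connections at the proxy.
    idle_oldest_first: idle persistent connections, oldest first.
    While utilization stays below the threshold, every persistent
    connection is kept open; once the threshold is reached, enough of
    the oldest idle connections are closed to make room for new ones."""
    if n_established < threshold:
        return []                          # resources plentiful: keep all
    excess = n_established - threshold + 1
    return idle_oldest_first[:excess]      # close oldest idle conns first
```

The same rule generalizes to per-resource thresholds by triggering the close when any monitored resource reaches its own threshold.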
For further effective resource usage, we also add a mechanism that gradually decreases the amount of resources assigned to a persistent TCP connection after the connection becomes inactive. The socket buffer is not needed at all while the TCP connection is idle. We therefore gradually decrease the send/receive socket buffer of persistent TCP connections, taking into account the fact that the longer a connection stays idle, the larger the possibility that it has ceased. In the next section, we evaluate the effect of the above algorithms through simulation experiments.

4 Simulation Experiments

In this section, we evaluate the performance of our proposed scheme through simulation experiments using ns-2 [20]. Figure 4 shows the simulation model. In the model, the bandwidths of the links between the client hosts and the proxy server and those between the proxy server and the Web servers are all set to 100 Mbps. To examine the effect of various network conditions, the packet loss probability of each link is randomly selected from 0.0001, 0.0005, 0.001, 0.005 and 0.01. The propagation delay of each link between the client hosts and the proxy server is varied from 10 msec to 100 msec, and that between the proxy server and the Web servers from 10 msec to 200 msec.

[Figure 4: Network Model for Simulation Experiments. 50, 100, 200 or 500 client hosts (propagation delay 10-100 msec, loss probability 0.0001-0.01) connect through the Web proxy server (cache hit ratio 0.5) to 50 Web servers (propagation delay 10-200 msec, loss probability 0.0001-0.01).]

The number of Web servers is fixed at 50, and the number of client hosts is varied over 50, 100, 200 and 500. We ran a 1000 sec simulation in each experiment. In the simulation experiments, each client host randomly selects one of the Web servers and generates a document transfer request via the proxy server.
The distribution of the requested document size follows [5]: it is given by the combination of a log-normal distribution for small documents and a Pareto distribution for large ones. The access model of the client hosts also follows [5]: a client host first requests a main document, then requests the in-line images included in the document after a short interval (following [5], we call it the active off time), and then requests the next document after a somewhat longer interval (the inactive off time). Note that since we focus on the resource and connection management of proxy servers, we do not model the detailed caching behavior of the proxy server, including the cache replacement algorithm. Instead, we set the cache hit ratio at the proxy server, h, to 0.5. Using h, the proxy server decides whether to transfer the requested document to the client host directly or to deliver it after downloading it from the Web server. The proxy server has 3200 KBytes of socket buffer, which it assigns as send/receive socket buffers to the TCP connections. This means that the original scheme can establish at most 200 TCP connections concurrently, since it statically assigns 16 KBytes of send/receive socket buffer to each TCP connection. In what follows, we compare the performance of the following four schemes:

- scheme (1), which does not use any enhanced algorithm presented in this paper;
- scheme (2), which uses E²-ATBT;
- scheme (3), which uses E²-ATBT and the connection management scheme described in Subsection 3.2, but does not gradually decrease the socket buffer, i.e., the socket buffer size remains unchanged after documents are transferred;
- scheme (4), which uses all of the proposed algorithms, i.e., E²-ATBT and the connection management scheme including the algorithm that gradually decreases the size of the socket buffer assigned to persistent TCP connections.
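The gradual decrease used by scheme (4) admits a simple sketch: halve the idle connection's assigned buffer once per idle second, with a 1 KByte floor (the parameters used in our simulations). The function name is ours.

```python
def decay_idle_buffer(size_bytes, idle_seconds, floor=1024):
    """Socket buffer left assigned to a persistent connection that has
    been idle for idle_seconds: halved once per second, never below
    floor (1 KByte).  The longer a connection stays idle, the more
    likely it will never be used again, so less buffer is reserved."""
    for _ in range(idle_seconds):
        size_bytes = max(floor, size_bytes // 2)
    return size_bytes
```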

Note that for schemes (3) and (4), we do not explicitly consider the amount and the threshold value of each resource described in Subsection 2.1 (mbuf, file descriptors, control blocks, and socket buffer). Instead, we introduce N_max (= 200), the maximum number of connections that can be established simultaneously, to model the limitation of the proxy server resources. In schemes (1) and (2), newly arriving requests are rejected when the number of TCP connections at the proxy server equals N_max. Schemes (3) and (4), on the other hand, forcibly terminate some of the persistent TCP connections that are inactive and establish new TCP connections in their place. For scheme (4), we exclude persistent TCP connections from the calculation process of the E²-ATBT algorithm, and halve their assigned socket buffer every second; the minimum assigned socket buffer size is 1 KByte.

4.1 Comparison of HTTP/1.0 and HTTP/1.1

Before evaluating the proposed method, we first compare the effects of HTTP/1.0 and HTTP/1.1 on proxy server performance, to confirm the necessity of the connection management scheme proposed in Subsection 3.2. In Figure 5, we plot the performance of the proxy server as a function of the number of client hosts for schemes (1) and (2). Here we define the performance of the proxy server as the total size of the documents transferred in both directions by the proxy server during the 1000 sec simulation time. It can clearly be observed from this figure that scheme (2) improves on scheme (1) for both HTTP/1.0 and HTTP/1.1, because E²-ATBT assigns each TCP connection a properly sized socket buffer. The figure also shows that the performance of the proxy server with HTTP/1.1 is higher than with HTTP/1.0 when the number of client hosts is small (50 or 100). The reason is that a document transfer with HTTP/1.1
uses a persistent TCP connection, which avoids the three-way handshake for establishing a new TCP connection for successive document transfer requests. On the other hand, the proxy server performance with HTTP/1.1 becomes even worse than with HTTP/1.0 when the number of client hosts is 200, i.e., when the proxy server load is high. This is because many persistent TCP connections at the proxy server are not in actual use and waste its resources, with the result that TCP connections for new document transfer requests cannot be established. Since HTTP/1.0 closes the TCP connection immediately after transferring the requested document, new TCP connections can be established quickly even when the load of the proxy server is high. Thus, although persistent TCP connections can improve proxy server performance when the load is low, as the load becomes high they significantly decrease it. This result explicitly indicates the need for careful management of persistent TCP connections at a proxy server, since the utilization of persistent TCP connections is very low, as shown in Subsection 2.2.

4.2 Evaluation of Proxy Server Performance

We first investigate the performance observed at the proxy server.

[Figure 5: Simulation Results: Comparison of HTTP/1.0 and HTTP/1.1. Total transfer size [MB] versus the number of client hosts for schemes (1) and (2) under each protocol version.]

In Figure 6, we show the performance of the proxy server as a function of the number of client hosts, where the persistent timer of the persistent TCP connections at the proxy server is set to 5 sec, 15 sec and 30 sec in Figures 6(a), 6(b), and 6(c), respectively. It is clear from Figure 6 that the performance of the original scheme (scheme (1)) decreases as the number of client hosts grows beyond 200.
This is because when the number of client hosts is larger than N_max, the proxy server begins to reject some document transfer requests, even though most of the N_max TCP connections are idle, as shown analytically in Subsection 2.2; the idle connections do nothing but waste the resources of the proxy server. The results for scheme (2) in Figure 6 show that E²-ATBT improves proxy server performance regardless of the number of client hosts. However, performance still degrades when the number of client hosts is large. This means that E²-ATBT alone cannot solve the problem of idle persistent TCP connections, and that a connection management scheme is necessary to overcome it. We can also see that scheme (3) significantly improves the performance of the proxy server, especially when the number of client hosts is large. When the proxy server cannot accept all the incoming connections from the client hosts (the cases with more than 200 client hosts in Figure 6), scheme (3) closes idle TCP connections so that newly arriving TCP connections can be established. As a result, the number of TCP connections that actually transfer documents increases considerably. Scheme (4) also improves the performance of the proxy server, especially when the number of client hosts is 100 or 200. With a larger number of client hosts (500), however, the degree of improvement shrinks slightly. This can be explained as follows. When the number of client hosts is small, most of the persistent TCP connections at the proxy server are kept established; therefore, the socket buffer assigned to the persistent TCP connections can be effectively re-assigned to other active TCP connections by scheme (4).
When the number of client hosts is large, on the other hand, the persistent TCP connections are likely to be closed before scheme (4) begins to decrease their assigned socket buffer. In that case, scheme (4) can do almost nothing for the persistent TCP connections.

[Figure 6: Simulation result. Total transfer size [MB] versus the number of client hosts (50-500) for schemes (1)-(4), with persistent timer (a) 5 sec, (b) 15 sec, (c) 30 sec.]

Next, we evaluate the effect of the length of the persistent timer. We first focus on schemes (1) and (2) in Figure 6. In the case of 50 client hosts, performance increases with the persistent timer, because with a longer timer successive document transfers can be performed over persistent connections and the connection setup time is removed. With a larger number of client hosts, however, performance degrades when the persistent timer is large. This is caused by the waste of proxy server resources resulting from idle TCP connections. It can also be observed from Figures 6(a) through 6(c) that schemes (3) and (4) provide much higher performance than schemes (1) and (2), regardless of the length of the persistent timer, especially when the number of client hosts is large. This is because schemes (3) and (4) manage persistent TCP connections as intended: they exploit the merits of persistent TCP connections when the number of client hosts is small, and they keep the proxy server resources from being wasted and assign them effectively to active TCP connections when the number of client hosts is large.

4.3 Evaluation of Response Time

We next show the evaluation results for the response time of document transfers. We define the response time as the time from when a client host issues a document transfer request to when it finishes receiving the requested document. Figure 7 shows the simulation results.
We plot the response time as a function of the document size for the four schemes. From this figure, we can clearly observe that the response time is much improved when our proposed scheme is applied, especially for a large number of client hosts (Figures 7(b)-(d)). When the number of client hosts is 50, however, the proposed scheme does not help improve the response time: in this case the server resources are sufficient to accommodate all 50 client hosts, all TCP connections are immediately established at the proxy server, and so the response time cannot be improved much. Note that since E²-ATBT improves the throughput of TCP data transfers to some degree, the proxy server performance can still be improved, as was shown in the previous subsection. Although schemes (3) and (4) improve the response time considerably, there is little difference between the two. This can be explained as follows. Scheme (4) decreases the socket buffer assigned to persistent TCP connections and re-assigns it to

other active TCP connections. Although the throughput of the active TCP connections is thereby improved, the effect on the response time is very small compared with the effect of introducing scheme (3). Scheme (4) is nevertheless worth using at the proxy server, since it has a positive effect on proxy server performance, as shown in Figure 6.

[Figure 7: Simulation Result: Response Time. Response time [sec] versus document size [Byte] for schemes (1)-(4), with (a) 50, (b) 100, (c) 200 and (d) 500 client hosts.]

5 Implementation and Experiments

In this section, we first give an overview of the implementation of our proposed scheme on an actual machine running FreeBSD 4.0. We then discuss some experimental results in order to confirm the effectiveness of the proposed scheme.

5.1 Implementation Issues

Our proposed scheme consists of two algorithms: the enhanced E-ATBT (E²-ATBT) proposed in Subsection 3.1, and the connection management scheme described in Subsection 3.2. In [], we confirmed the effectiveness of the original E-ATBT algorithm through experiments. To realize E²-ATBT, we need to implement new mechanisms that account for the connection dependency and control the receive socket buffer at the proxy server. As described in Subsection 3.1, two methods can be considered for this purpose: monitoring the utilization of the send/receive socket buffer, and adding information to data/ACK packets.
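The former, monitoring-based method might look like the following sketch. The 50% utilization threshold and the factor-of-two headroom are our assumptions, not values from the implementation:

```python
def adjust_by_utilization(assigned, used_peak, min_buf=1024):
    """Shrink the send socket buffer of a downward connection whose
    buffer is not being kept filled by its upward counterpart.

    assigned: current send buffer size (bytes).
    used_peak: peak buffer occupancy observed in the last interval.
    If less than half the buffer was ever used, shrink the assignment
    to twice the observed peak (keeping some headroom), freeing the
    remainder for other connections."""
    if used_peak / assigned < 0.5:           # buffer mostly idle
        return max(min_buf, used_peak * 2)   # keep 2x-peak headroom
    return assigned                          # well utilized: leave as is
```

Because it relies only on locally observable state, this variant needs no cooperation from the Web server or client hosts, at the cost of reacting one measurement interval late.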
As a first step, we are now implementing the latter method, using additional bits to identify the connection dependency and the send socket buffer size at the Web server. Since this method requires cooperation between the server and client hosts, it is not a realistic solution; however, it lets us determine an upper limit on the performance that can be expected from our

proposed method, by comparing it with the former, monitoring-based method. We will include additional results for the above-mentioned algorithms in the final paper.

[Figure 9: Implementation Experiment System. A client host (50, 100, 200 or 500 users) accesses a WWW server through the Web proxy server; upper limit 400 connections, threshold 300 connections, persistent timer 5 seconds, cache hit ratio 0.5.]

To implement the connection management scheme, we have to monitor the utilization of resources at the proxy server and maintain an adequate number of persistent TCP connections. Monitoring the resources at the proxy server is done as follows. The resources needed for establishing TCP connections are, in our case, mbufs, file descriptors, and control blocks, as described in Subsection 2.1. These resources cannot be resized dynamically once the kernel has booted, but their total and remaining amounts can be observed in the kernel. We therefore introduce threshold values for the utilization of these resources; if the utilization of any of them reaches its threshold, the proxy server starts closing persistent TCP connections and releases the resources assigned to them.

[Figure 8: Connection Management Scheme. A time scheduling list of (sfd, proc) entries, with new entries inserted at the tail and entries deleted from the head or upon reactivation.]

Figure 8 sketches our mechanism for managing persistent TCP connections at the proxy server. When a TCP connection finishes transmitting a requested document and becomes idle, the proxy server records the socket file descriptor and the process number as a new entry in the time scheduling list, which the kernel uses to handle persistent TCP connections. A new entry is added to the end of the time scheduling list; when the proxy server decides to close a persistent TCP connection, it selects the connection at the top of the list. In this way, the proxy server closes the oldest persistent connection first.
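A user-space sketch of the time scheduling list follows (the kernel implementation manipulates pointers in a linked list; this Python version is only illustrative). Entries are keyed by (sfd, proc), inserted at the tail, closed from the head, and removed when a connection becomes active again or its persistent timer expires.

```python
from collections import OrderedDict

class TimeSchedulingList:
    """FIFO list of idle persistent connections, keyed by (sfd, proc)."""

    def __init__(self):
        self._entries = OrderedDict()       # preserves insertion order

    def insert(self, sfd, proc):
        """A connection finished a transfer and became idle: append at tail."""
        self._entries[(sfd, proc)] = True

    def remove(self, sfd, proc):
        """Connection became active again, or its persistent timer expired."""
        self._entries.pop((sfd, proc), None)

    def pop_oldest(self):
        """Next connection to close when resources run short, or None."""
        if not self._entries:
            return None
        key, _ = self._entries.popitem(last=False)   # head = oldest entry
        return key
```

All operations are O(1), matching the simple pointer manipulations of the kernel version.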
When a certain persistent TCP connection in the list becomes active again before being closed, or when it is closed upon expiration of the persistent timer, the proxy server removes the corresponding entry from the list. All operations on the time scheduling list can be performed with simple pointer manipulations.

5.2 Implementation Experiments

We now demonstrate the effectiveness of our implemented scheme through experiments. Figure 9 shows our experimental system, in which our proposed scheme is implemented at the proxy server, running the Squid proxy server [9] on FreeBSD 4.0. The amount of proxy server resources is set so that the proxy server can accommodate up to 400 TCP connections simultaneously. The threshold value at which the proxy server begins to close persistent TCP connections is set to 300 connections, and the proxy server monitors the utilization of its resources every second. We intentionally set the size of the proxy server cache to 1024 KBytes, so that the cache hit ratio becomes 0.5. The length of the persistent timer is set to 5 seconds. The client host uses httperf [2] to generate document transfer requests. As in the simulation experiments, the access model of each user at the client host (the distribution of requested document sizes and the think time between successive requests) follows that reported in [5]. When a request is rejected by the proxy server due to lack of resources, the client host resends it immediately. We compare the proposed scheme with the original scheme, in which none of the mechanisms proposed in this paper is used. Each experimental result presented below is the average over ten runs.

5.2.1 Evaluation of Proxy Server Performance

We first evaluate the proxy server performance. In Figure 10, we compare the throughput of the proposed scheme and the original scheme as a function of the number of users at the client host.
Here we define the throughput of the proxy server as the total number of documents sent/received by the proxy server, divided by the experiment duration (10 minutes). Clearly, when the number of users is large, the throughput of the proposed scheme exceeds that of the original scheme, whereas both provide almost the same throughput when the number of users is small. When the number of users is small, the proxy server can accommodate all users' TCP connections without running out of resources. When the number of users becomes large, the original scheme cannot accept all document transfer requests, since many TCP connections established at the proxy server are idle and occupy server resources, as described by the analysis in Subsection 2.2. The proposed scheme, on the other hand, can accept a larger number of document transfer requests, since the proxy server forces idle TCP connections to close when its resources become short and assigns the released resources to newly arriving TCP connections. Table 1 summarizes the number of TCP connections established at the proxy server for document transfer during the experimental run. When the number of users becomes large, the proposed scheme increases the number of established TCP connections, which results in the increased server throughput explained above.