Power and Locality Aware Request Distribution
Technical Report
Heungki Lee, Gopinath Vageesan and Eun Jung Kim
Texas A&M University, College Station

Abstract

With the growing use of cluster systems for file distribution, web serving and database transactions, efficiency and power optimization have gained considerable significance. Distributor-based systems, which forward client requests to a set of load-balanced backend servers in complete transparency to the clients, have been widely adopted. The policy employed in forwarding requests from the front-end distributor to the backend servers plays a very important role in the overall performance of the system. In this paper, we use power- and locality-based request distribution, which aims to provide optimum energy conservation while maintaining the required QoS of the system. We use a basic locality policy, which distributes the incoming requests to the backend servers based on the partitioning of the data set among the backend servers' memory. It aims to generate more hits in the backend servers' memory through data-specific distribution, and provides the required efficiency. We then implement an optimum on-off power policy on top of this locality-based distribution to achieve considerable energy conservation. The whole system works under a standard memory management policy, which improves performance further. We back the idea with simulation results and future implementation possibilities.

1. Introduction

Cluster systems are increasingly used in web server management, file distribution and database transactions. The main reason for the large-scale deployment of cluster systems is the well-established distributor-based request management technique. A distributor-based system has a front-end server (the distributor), which receives all requests from the clients. The requests are then forwarded, based on various policies, to a set of backend servers that hold the actual content for the clients. The forwarding of requests from the distributor to the backend servers is carried out in complete transparency to the clients; a handoff protocol is employed in most cases to make the transition smooth and transparent. The operational power budget for maintaining a large cluster may run into millions of dollars, so any viable approach for energy saving in a cluster should be considered seriously. Hence this paper tries to achieve a balance between high efficiency and optimum power conservation. Of the various policies used to forward requests from the distributor to the backend servers, weighted round robin, locality-aware request distribution (LARD) [1] and power-aware request distribution (PARD) [2] are the most common and successful. Each has its own set of pros and cons. Using these policies in such a way that the system is most efficient would form a high-performance policy for the distributor. We strive to achieve such a policy.

In this paper, we take advantage of locality-aware request distribution (LARD) to achieve high efficiency in terms of throughput (number of requests served per second) and average response time (service time). We then implement a simple and optimum power policy over LARD to make it both power efficient and content efficient, and present simulation results to show that our algorithm works efficiently. The rest of the paper is organized as follows. Section 2 addresses previous related work and gives a brief description of existing policies for the distributor. Section 3 explains our policy in detail. Section 4 presents the details of the simulations that have been carried out and the pseudo code representing our algorithm. Section 5 enumerates the results by comparing the performance of our system against the other policies. We conclude the paper in Section 6.

2. Related Work

Of the various policies that are employed at the distributor, those that provide better load balancing among the backend servers, better efficiency and considerable power conservation are the most preferred. In this section, we review the following policies:

- Weighted Round Robin (WRR)
- Locality Aware Request Distribution (LARD)
- Power Aware Request Distribution (PARD)

2.1 Weighted Round Robin

The choice of policy is critical for the efficient operation of the system. The weighted round robin policy is applied at the distributor, where requests are forwarded to the backend servers based on their current load. The distributor maintains a record of the current load at each backend server and forwards each client request to the least loaded server in the set. The forwarding is thus weighted by the current load on the servers: the most loaded server is relieved of further load by forwarding requests to the least loaded server. So, at any given point in time, the load is evenly balanced among all available servers, providing very good load balancing.
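The following Python sketch illustrates this least-loaded dispatch; the Server class and its fields are our own illustrative assumptions, not part of any particular WRR implementation.

    # Minimal sketch of weighted round-robin (least-loaded) dispatch.
    class Server:
        def __init__(self, name):
            self.name = name
            self.current_load = 0          # number of active connections

        def serve(self, request):
            self.current_load += 1         # decremented when the request completes

    def wrr_dispatch(servers, request):
        """Forward the request to the least loaded backend server."""
        target = min(servers, key=lambda s: s.current_load)
        target.serve(request)
        return target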

Figure 1: Weighted Round-Robin

The main drawback of this scheme is that it considers neither the locality of the requests nor power conservation among the servers. In large deployments of cluster systems, the power consumed becomes a very significant factor. Since all servers are turned ON during the entire period of operation, the system conserves no power at all. Also, because the system does not consider the locality of data among the backend servers, requests for the same data land on different servers and incur large disk latencies. This increases the response time (service time) of the servers and hence reduces the throughput. Considering this, power- and locality-based request distribution policies have more significance.

2.2 Locality Aware Request Distribution

The major drawback of the weighted round robin policy is that it incurs a large amount of disk latency by not considering the locality of data in the backend servers' memory. Our simulation of the weighted round robin policy shows that, in the worst case, this can generate an unacceptable amount of disk latency at the backend servers, leading to increased response time and reduced throughput. This causes large delays for the clients and brings down the performance of the whole system. To overcome this, locality-aware request distribution [1] employs a locality-based distribution policy at the distributor and strives to turn disk accesses at the backend servers into memory hits.

Figure 2: LARD

The distributor maintains a table of the data types available in the backend servers' memory. The data types are assigned to the backend servers based on an initial server/data partitioning and are initially distributed evenly across the servers. When a new request arrives at the distributor, its data type is looked up in the distributor table and the corresponding server is identified; requests for that data type are always forwarded to that server. With this assignment, a request incurs disk latency only on the first assignment of its data type to a backend server. Subsequent requests of the same data type end up as server memory hits, since the data has already been fetched from disk and now resides in memory. Once requests start overflowing at one of the servers, one of the least loaded servers is added to serve that data type, and the server set for that data type grows. Similarly, when a server becomes underutilized, a server is removed from the server set. Additionally, when there is no change in the target server set for a given K seconds, the most loaded server is removed from the server set. This ensures load balancing to some extent (a sketch of this server-set maintenance appears just before Section 2.5). But the major drawback of the LARD system is that it, too, does not take the power factor into consideration and keeps all backend servers running. This makes its power conservation zero, the least power efficient possible. Also, the load balancing of the LARD system is not as good as that of the weighted round robin policy.

2.3 Power Conservation by Multi-speed Disks

E. V. Carrera et al. [4] have proposed a technique to conserve energy in network servers using multi-speed disk technology. The idea is to use two disks with different speeds to emulate a multi-speed disk. When the load rises above a pre-defined threshold, the high-speed disk serves it, and vice versa. This approach can reduce energy consumption by up to 23% compared to conventional servers, and the authors argue that the performance degradation is negligible. A major setback to this approach is that it requires multi-speed disks, which are not very common nowadays.

2.4 Dynamic Cluster Re-configuration

E. Pinheiro et al. [5] have proposed a dynamic cluster reconfiguration technique to bring down the energy consumption of servers. In this technique, a cluster node is dynamically added to or removed from the cluster system based on the following constraints:

- Efficiency and performance of the system
- Other power implications

The cluster is thus dynamically re-configurable and intelligent enough to reconfigure itself based on the load and other system-efficiency parameters. The paper shows the results of implementing this algorithm on a real cluster system: power and energy consumption are reduced by up to 71% and 45%, respectively, compared to traditional network servers.
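Following up the forward reference in Section 2.2, the Python sketch below illustrates LARD-style server-set maintenance. The container names are our own, the thresholds are the T_low/T_high values used in our simulations (Section 4), and the actual algorithm is given by Pai et al. [1].

    T_LOW, T_HIGH = 12, 17         # per-server connection thresholds (Section 4)

    server_sets = {}               # data type -> list of servers assigned to it

    def lard_dispatch(servers, load, data_type):
        targets = server_sets.setdefault(data_type, [])
        if not targets:
            # First request for this data type: assign the least loaded server.
            targets.append(min(servers, key=lambda s: load[s]))
        target = min(targets, key=lambda s: load[s])
        if load[target] > T_HIGH:
            # Requests are overflowing: grow the set with the least loaded server.
            spare = min(servers, key=lambda s: load[s])
            if spare not in targets:
                targets.append(spare)
                target = spare
        elif load[target] < T_LOW and len(targets) > 1:
            targets.pop()          # underutilized: shrink the server set
        load[target] += 1
        return target

Here servers is the full backend set and load maps each server to its active connection count.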

2.5 Power Aware Request Distribution

The drawback of WRR and LARD regarding power consumption is that both keep all their servers turned ON even when some of them serve no requests; therefore, they cannot conserve any power. Unlike WRR and LARD, PARD [2] gives great consideration to reducing the power consumption of the cluster system, and hence is the most power-efficient of the three policies.

Figure 3: PARD

PARD employs an ON-OFF model: any backend server that is idle is turned off, and backend servers are turned on whenever they are required to serve requests. In [2], a server's power draw is assumed to be its maximum power when it is ON and simply zero when it is OFF; turning off unused servers is considered the best way to save power.

Figure 4: PARD Policy

However, this policy incurs more disk latency than LARD. Whenever a backend server is turned OFF, the data cached in its memory is lost, so when the server is later turned back ON, the requests the distributor forwards to it incur a lot of disk latency. Furthermore, there are startup and shutdown delays when backend servers are turned ON and OFF, respectively. The startup delay results from booting the operating system, while the shutdown delay is the period between pruning an idle backend server from service mode and shutting it down. These delays lead to worse QoS than LARD. Overall, there are trade-offs between QoS and power savings.
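A minimal Python sketch of such an ON-OFF adjustment step follows; the delay constants, the Server class and the capacity field are our own illustrative assumptions rather than values from [2].

    STARTUP_DELAY = 30     # seconds to boot a powered-off server (assumed value)
    SHUTDOWN_DELAY = 5     # seconds from pruning to power-off (assumed value)

    class Server:
        def __init__(self, name, max_connections=30):
            self.name = name
            self.is_on = True
            self.max_connections = max_connections

        def power_on(self):        # service resumes only after STARTUP_DELAY
            self.is_on = True      # memory starts cold, so first hits go to disk

        def power_off(self):       # takes effect after SHUTDOWN_DELAY
            self.is_on = False     # cached data in memory is lost

    def adjust_power(servers, load):
        """Turn off idle servers; wake one when every running server is full."""
        on = [s for s in servers if s.is_on]
        off = [s for s in servers if not s.is_on]
        for s in on[:]:
            if load[s] == 0 and len(on) > 1:
                s.power_off()
                on.remove(s)
        if off and all(load[s] >= s.max_connections for s in on):
            off[0].power_on()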

3. Power and Locality-Aware Request Distribution (PLARD)

The basic PLARD system is a combination of LARD and PARD, so that locality improves performance while the power policy provides optimum power conservation. The power policy is implemented on top of the locality-based policy: the distributor forwards requests to the backend servers that are turned ON, based on the locality of the incoming data.

Figure 5: PLARD

In PLARD, the distributor sends requests only to the powered-on backend servers, using content-based distribution. While the distributor transmits requests to the backend servers, PLARD checks whether any backend servers are idle. If some of the backend servers are idle (which is quite possible in traditional cluster-based systems), the idle servers are turned OFF for power conservation. This simple ON-OFF policy achieves very good power conservation while, thanks to the content-based distribution, still maintaining high QoS like the LARD policy. However, this basic PLARD policy still suffers from the same startup and shutdown delays as PARD whenever a backend server is turned ON or OFF. Therefore, we propose an improvement in the form of PLARD with Prediction. This policy predicts oncoming congestion on the turned-ON servers and starts turning ON a turned-OFF server well in advance of the impending congestion. This prevents overflow at the other backend servers and hides the startup and shutdown delays that would otherwise be imposed on the incoming requests. One other feature of the power policy is that at least one server is always kept ON. This ensures that no startup or shutdown delays affect future requests after all current requests have drained and the other servers have been turned off.

3.1 Memory Management Policy

One other important and unique feature of the PLARD policy is the pin-down memory. The pin-down memory is the part of main memory exclusively reserved for the web, mail or file server running on the backend server. The number of data types supported by a server at a given point in time is directly proportional to the amount of pin-down memory available. This ensures that no other program running on the server can occupy the memory completely and evict the data types under request from memory, which enhances the performance of the system many fold.
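A minimal Python sketch of this pin-down bookkeeping follows; the class, the default size (the 1024-byte test value from Section 4) and the method names are our own illustrative assumptions.

    class PinDownMemory:
        """Main-memory area reserved for the server process; pinned data
        types cannot be evicted by other programs on the node."""

        def __init__(self, capacity_bytes=1024):
            self.capacity = capacity_bytes
            self.resident = {}             # data type -> size in bytes

        def can_admit(self, data_type, size):
            used = sum(self.resident.values())
            return data_type in self.resident or used + size <= self.capacity

        def admit(self, data_type, size):
            if not self.can_admit(data_type, size):
                # The caller should migrate the data type to another server.
                raise MemoryError("pin-down area full")
            self.resident[data_type] = size

        def resize(self, new_capacity):
            # The pin-down area is dynamically adjustable at run time.
            self.capacity = new_capacity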

The memory management policy works as follows:

- Pick a server based on the PLARD policy described later in the paper.
- If the data type is new and no server is yet associated with it, select a server that has enough pin-down memory to support the new data type.
- If a server is about to reach its maximum capacity, migrate the data type to a new server capable of accommodating it in its pin-down memory.
- The pin-down memory area is dynamically adjustable and can be varied at any time, depending on the load on the system.

With these performance-enhancement features, PLARD with Prediction and memory management shows a considerable drop in power conservation. The PLARD policy alone produced a power conservation of 67%, but with the above enhancements included, power conservation dropped to 50%. This is because the powered-off backend server is turned on earlier to compensate for the startup delay, and so consumes more energy by running before it actually serves requests. Nevertheless, PLARD with Prediction and pin-down memory proves to be the most nearly optimal distribution policy of those considered.

4. Simulation

We have built simulators and run simulations to back our idea. Our work consists of building the WRR, LARD, PLARD and PLARD with Prediction simulators and running extensive simulations on them. The simulation model consists of 100 random requests served from a trace file, which emulates the client requests; one front-end distributor server that collects the requests and forwards them to the backend servers; and 6 backend servers that provide the service. The policies are implemented at the front-end distributor. The backend servers are assumed to be capable of serving a maximum of 30 connections at any given time. The pin-down memory size is fixed to 1024 bytes (for testing purposes). The values of T_low and T_high are fixed at 12 and 17, respectively.

4.1 Pseudo-Algorithm

    1. Read a request from the trace file.
    2. Check for the least loaded server that is turned ON.
    3. Apply the memory management policy.
    4. Forward the request to that backend server.
    5. Update the distributor_table at the front end with the locality of the data.
    6. If any server in the backend cluster has load > threshold, turn ON a new
       server; otherwise, go to step 1.
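A runnable Python rendering of this pseudo-algorithm is sketched below, reusing the illustrative Server and PinDownMemory classes from the earlier sketches and assuming each Server additionally carries a PinDownMemory instance as its pin_down attribute. The trace-record format and the threshold choice are likewise our own assumptions.

    THRESHOLD = 17           # T_high used as the wake-up threshold (Section 4)

    distributor_table = {}   # data type -> backend server (locality record)

    def plard_dispatch(request, servers, load):
        data_type, size = request["type"], request["size"]
        on = [s for s in servers if s.is_on]
        # Steps 2-3: prefer the server already holding this data type;
        # otherwise apply the memory management policy.
        target = distributor_table.get(data_type)
        if target is None or not target.is_on:
            fits = [s for s in on if s.pin_down.can_admit(data_type, size)]
            target = min(fits or on, key=lambda s: load[s])
            if target.pin_down.can_admit(data_type, size):
                target.pin_down.admit(data_type, size)
        # Steps 4-5: forward the request and record the locality.
        load[target] += 1
        distributor_table[data_type] = target
        # Step 6 (prediction): if any running server exceeds the threshold,
        # wake a powered-off server before congestion hits.
        if any(load[s] > THRESHOLD for s in on):
            for s in servers:
                if not s.is_on:
                    s.power_on()
                    break
        return target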

5. Results

The simulations were run for all four policies with the same trace file, and various performance metrics were calculated.

Figure 6: Average Response Time (bar chart over WRR, LARD, LARD-M and PLARD)

The bar chart makes it clear that WRR suffers the worst average response time, as it incurs many disk latencies due to its lack of locality awareness. The remaining policies perform almost to the same mark. PLARD has a slightly higher average response time, as it imposes the 30-second startup delay on some of the incoming requests.

Figure 7: Overall Throughput

The overall throughput is a direct reflection of the same effect.

Figure 8: Memory Hits (WRR, LARD, LARD-M and PLARD)

Memory hits, too, are far better under the locality-based policies, which adopt a highly efficient locality-based algorithm. The load balancing, however, is better with the WRR policy.

Figure 9: Power Conservation (WRR, LARD, LARD-M and PLARD)

Power conservation is high in the PLARD simulators, which use the ON-OFF mechanism to conserve power. The servers are always turned ON under WRR and LARD, which therefore provide zero power conservation.

Figure 10: Memory Management (available memory after simulation on Servers 1-6, for each of WRR, LARD, LARD-M and PLARD)

The histogram shows the available memory after simulation on each server of every system. A negative value means the server discarded memory contents to accommodate new requests; a very large negative value implies a poor memory management scheme, which explains the poor performance of those systems.

6. Conclusion

As the use of cluster systems increases, conserving power has become a critical issue. We address a variety of policies for forwarding requests from the distributor to the backend servers, since they vitally affect the overall performance of a cluster system. In this paper, we compare four different policies, WRR, LARD, PLARD and PLARD with Prediction, to determine which is best for both power conservation and performance. WRR has good load balance, but its locality is so poor that it increases miss rates. LARD reduces the miss rates and improves secondary storage scalability. However, WRR and LARD save zero power in the cluster system. Thus, we propose PLARD, which employs not only content-based request distribution but also an On-Off policy, maintaining the minimum required QoS while saving power. Its content-based request distribution provides locality and high hit rates; meanwhile, it turns off any idle backend servers to achieve significant power conservation. Even though PLARD with Prediction, introduced to reduce the startup and shutdown delays of PLARD, consumes more energy than PLARD, as shown in our results, there is also a significant

difference in average response time, overall throughput and memory hits between PLARD and PLARD with Prediction (when the 30-second startup delay is considered). Therefore, PLARD with Prediction proves to be the best of these policies in terms of combined QoS and power conservation. In the future, we will implement a real cluster system and apply PLARD to it. We will also implement the pin-down memory configuration in the backend servers.

References

[1] Vivek S. Pai, Mohit Aron, Gaurav Banga, Michael Svendsen, Peter Druschel, Willy Zwaenepoel and Erich Nahum, "Locality-Aware Request Distribution in Cluster-Based Network Servers," in Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pp. 205-216, San Jose, California, October 1998.

[2] K. Rajamani and C. Lefurgy, "On Evaluating Request-Distribution Schemes for Saving Energy in Server Clusters," in Proceedings of the International Symposium on Performance Analysis of Systems and Software, March 2003.

[3] Mohit Aron, Darren Sanders, Peter Druschel and Willy Zwaenepoel, "Scalable Content-Aware Request Distribution in Cluster-Based Network Servers," in Proceedings of the USENIX 2000 Annual Technical Conference, San Diego, California, June 2000.

[4] E. V. Carrera, E. Pinheiro and R. Bianchini, "Conserving Disk Energy in Network Servers," in Proceedings of the 17th Annual International Conference on Supercomputing, pp. 86-97, June 2003.

[5] E. Pinheiro, R. Bianchini, E. V. Carrera and T. Heath, "Dynamic Cluster Reconfiguration for Power and Performance," Kluwer Academic Publishers, 2002.

[6] E. V. Carrera and R. Bianchini, "Improving Disk Throughput in Data-Intensive Servers," in Proceedings of the 10th International Symposium on High-Performance Computer Architecture (HPCA-10), February 2004.

[7] E. V. Carrera, S. Rao, L. Iftode and R. Bianchini, "User-Level Communication in Cluster-Based Servers," in Proceedings of the 8th IEEE International Symposium on High-Performance Computer Architecture (HPCA-8), February 2002.