Association for Information Systems, AIS Electronic Library (AISeL)
AMCIS 1996 Proceedings, Americas Conference on Information Systems (AMCIS), 8-16-1996

Recommended Citation: Gupta, Samir and Chaturvedi, Alok, "Introducing Network Delays in a Distributed Real-Time Transaction Processing System" (1996). AMCIS 1996 Proceedings. 188. http://aisel.aisnet.org/amcis1996/188
Introducing Network Delays in a Distributed Real-Time Transaction Processing System

Authors:
Samir Gupta, Ph.D., Visiting Assistant Professor, Krannert Graduate School of Management, Purdue University, West Lafayette, IN 47907 (guptas@veda.mgmt.purdue.edu)
Alok Chaturvedi, Ph.D., Associate Professor, Krannert Graduate School of Management, Purdue University, West Lafayette, IN 47907 (alokrc@mgmt.purdue.edu)

1.0 Introduction

Many transactions performed by database applications are critically time dependent: tasks must be completed within a stipulated deadline. Once the deadline has elapsed, some transactions may have zero or even negative economic value for the system. These are called transactions with real-time constraints, or real-time transactions for short. Consider program trading, a computer-initiated stock trading operation. Here a computer is programmed to initiate a trade when it detects a difference in the price of the same stock in two different markets. The window within which the difference exists lasts for a short interval of time; the trading operation is useful only if it is completed within this window, after which the trade is worthless. Computer Integrated Manufacturing is another example of real-time transaction processing (TP). It involves controlling CNC machines by detecting operational anomalies and rectifying them in real time. In a Flexible Manufacturing System, these machines are required to perform a series of complex transactions during the time the product is on the machine. A common design objective for a hard real-time transaction processing system is to minimize the number of transactions that complete after their deadlines have expired. In contrast, conventional DBMS design tries to maximize the average throughput for a set of transactions, and an important feature of conventional database design is the requirement that every transaction eventually be executed.
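The notion of zero or negative post-deadline value can be made concrete with a simple value function. This is only an illustration of the concept, not a construct from the paper; the function name and the penalty parameter are hypothetical.

```python
def transaction_value(completion_time, deadline, value, penalty=0.0):
    """Economic value of a real-time transaction.

    A transaction completed by its deadline yields its full value. After
    the deadline the value drops to zero, or to a negative penalty, e.g.
    a program trade executed after the price window has closed.
    """
    return value if completion_time <= deadline else -penalty

print(transaction_value(0.8, 1.0, 100.0))        # 100.0 (met the deadline)
print(transaction_value(1.2, 1.0, 100.0, 5.0))   # -5.0 (tardy, with penalty)
```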
In a real-time database, however, some transactions are deliberately aborted if they cannot meet their deadlines. In this situation, detecting tardy transactions early and aborting them before they consume excessive computational resources greatly improves system performance. A large number of factors govern the behavior of such systems, and the performance of a Real-Time Distributed Transaction Processing System (RDTPS) depends on a complex interplay among them. The present research develops a general-purpose simulation system and uses it to analyze the performance of an RDTPS under different design choices. In particular, this paper is devoted to analyzing the impact of transmission delays within a network on the performance of an RDTPS. Intuitively, it appears that system performance should decrease monotonically as network delays increase; the paper aims to confirm or reject that hypothesis. The rest of the paper is organized as follows: Section 2 describes related work in the areas of distributed computing, real-time systems, and transaction processing systems. Section 3 describes the simulation model, including the design of experiments. Section 4 concludes with the results and discussion.
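Early detection of tardy transactions amounts to a feasibility test of the following kind. This is a hedged sketch, not the paper's implementation; the availability of an estimate of remaining service time is an assumption.

```python
def is_feasible(now, deadline, remaining_service_time):
    """Return True if the transaction can still finish by its deadline.

    A transaction whose estimated remaining work would push its
    completion past its deadline is tardy; aborting it early frees CPU
    and IO resources for transactions that can still succeed.
    """
    return now + remaining_service_time <= deadline

# Hypothetical numbers: 50 ms of work left vs. 40 ms (then 60 ms) to go.
print(is_feasible(0.0, deadline=0.040, remaining_service_time=0.050))  # False
print(is_feasible(0.0, deadline=0.060, remaining_service_time=0.050))  # True
```

A scheduler would run this test before dispatching each transaction and abort (rather than queue) any transaction for which it returns False.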
2.0 Related Work

A significant amount of research has been conducted both on real-time systems and on distributed transaction processing systems. Research in these areas has followed one of three approaches.

Analytic, wherein the researchers have created a mathematical model of the system and drawn inferences from it. [RN94] studied the problem of data allocation in a distributed database (DDB). [RC89] proposed a high-level architecture for DDB design. [W87] studied the overheads of locking and commit protocols in a DDB.

Mechanistic, wherein the researchers have proposed new mechanisms for one or more components of the system and argued the advantages of different mechanisms under different 'environmental' conditions. A very large body of such literature exists for distributed transaction processing design. [SR94] surveyed the literature on real-time computing. [RS94] and [YWS94] discussed issues pertaining to scheduling algorithms under real-time constraints. Many others have studied different protocols for single-site TP systems with real-time constraints.

Simulation-based, wherein the researchers have evaluated the performance of DDB and real-time systems under different design policies. [AG88], [AG88A], [AG90], and [AG92] discuss the performance of different scheduling policies in a single-site TP system with real-time constraints.

The survey above indicates that most research has addressed either distributed systems without real-time constraints or single-site real-time systems. Real-time distributed database systems have not been studied extensively so far, and this paper attempts to fill that lacuna. It builds on a simulation environment described in [CG95] to study the performance of an RDTPS.
3.0 Database Model

The system simulates a multi-site, disk-based database system, assumed to consist of several data servers, or nodes. Each node has both computing and storage resources. The data at each node is logically arranged as tables. Data is fully replicated at all sites, and we assume that a transaction reads data from the local node and writes to all nodes. Transactions may arrive at any participating node. Each transaction has a given release time and a deadline before which it must be completed. The simulation environment maintains three queues for each node: the CPU queue, the IO queue, and the network queue. When a transaction needs to use a resource (e.g., the CPU), it enters the corresponding queue and waits to be processed. The processing completion time and the selection of which transaction to schedule next are determined by the design policies.

Table 1: System Configuration
  Concurrency Control Mechanism:   Centralized Locking
  Priority Assignment:             FIFO
  Concurrency Conflict Resolution: Blocking
  Scheduling Algorithm:            All
  Update Ratio:                    25%
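The per-node structure described above can be sketched as follows. This is a minimal illustration; the class and field names are ours, not the authors' simulator.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Transaction:
    tid: int
    release_time: float   # earliest time the transaction may start
    deadline: float       # time by which it must be completed

@dataclass
class Node:
    """One data server: computing and storage resources plus three queues."""
    node_id: int
    cpu_queue: deque = field(default_factory=deque)
    io_queue: deque = field(default_factory=deque)
    net_queue: deque = field(default_factory=deque)

    def submit(self, txn: Transaction) -> None:
        # An arriving transaction first waits for the CPU; the scheduling
        # policy decides which queued transaction is served next.
        self.cpu_queue.append(txn)

# Four fully replicated nodes; a transaction may arrive at any of them.
nodes = [Node(i) for i in range(4)]
nodes[0].submit(Transaction(tid=1, release_time=0.0, deadline=0.5))
print(len(nodes[0].cpu_queue))  # 1
```

With full replication, a committed write would be propagated through each remote node's network queue, which is where the network delays studied in this paper enter the model.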
The performance of such a system is measured through two parameters: the number of OK transactions, defined as the number of transactions completed before their deadlines; and throughput, defined as the number of OK transactions divided by the simulation time. Each experiment set consists of 20 different experiments with different seed values. In each experiment the system is monitored for a total of 600 incoming transactions, of which the first 200 are ignored to bring the system to a steady state.

4.0 Results and Discussion

In the preliminary set of experiments, the performance of a 4-node system was observed under no-network-delay and LAN-delay conditions. Network delays have been observed to vary significantly with message size. For the entire exercise, LAN delays were assumed to be drawn from an identical distribution, i.e., all message delays had the same distribution. Figures 1 and 2 depict the number of OK transactions and the throughput observed for a 4-node configuration under the conditions in Table 1. (For a detailed description of the design policies, refer to [AG88].)

Figures 1 & 2: Impact of Network Delays for small/large messages

The charts suggest that system performance improves when network delays increase from zero to the delays of a large message (average of 8 milliseconds). Furthermore, the gap between the performances widens as the arrival rate of incoming transactions increases. This is an unexpected result that needs to be explored further. In order to understand this anomalous behavior, a set of experiments was conducted to observe the impact of various network delays on a 4-node system with a high arrival rate
(1.6 transactions per second). The network delay was varied from 10^-4 to 10^-2 seconds per message, and performance was observed under all scheduling algorithms. Figures 3 and 4 present the results of these experiments.

Figures 3 & 4: Performance of the system under different conditions of Network Delay and Scheduling Algorithms

Table 2: Average utilization of CPU and IO under different Scheduling Policies
  Scheduling Algorithm   CPU Queue   IO Queue
  All                    0.49        0.66
  Feasible               0.03        0.27
  No Tardy               0.04        0.37

The graphs of both throughput and the number of OK transactions exhibit a 'bell-shaped' curve: when all transactions are scheduled, performance rises for a small increase in network delay before dropping. For the other scheduling policies, performance is steady within simulation error. These findings suggest that network delays appear to offset the load on system resources such as the CPU and the IO. Several phenomena could account for this behavior: instability in the system, 'temporary instability' caused by high variance in the interarrival times of transactions, the mix of incoming transactions, and so on. To rule out instability, the average 'work' in the system was computed for a set of conditions. 'Work' is defined as the time for which a process was busy during the entire duration of the experiment. Table 2 presents the maximum work values at an arrival rate of 1.6 transactions per second for different network delays. Each sub-action in the CPU and the IO queue requires 26 ms and 121.5 ms, respectively. (These figures were chosen to correspond with the parameters in Abbott and Garcia-Molina, 1992.) Since the utilization of the system processes is less than 1, the system is likely to be stable. To confirm the results, this experiment was conducted for different numbers of
transactions. The lengths of the CPU and IO queues remained constant within simulation bounds, confirming the stability analysis. Temporary instability can be caused by high variance in arrival rates, which causes queues of transactions waiting for system processes to build up from time to time. Since each transaction has a time window within which it must complete, temporary instability can significantly reduce system performance. The next set of experiments was conducted with interarrival times following a uniform distribution with the same mean (corresponding to 1.6 transactions per second) but different coefficients of variation (standard deviation / mean). The results showed that increasing variability has a significant negative impact on system performance. Increasing variability in arrival times translates directly into increasing variability in queue lengths, which creates pockets of 'temporary instability'. This causes transactions to be delayed beyond their deadlines and therefore reduces system performance.

(References available on request)
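The stability check and the variability experiment can be sketched as follows. The 1.6 txn/s arrival rate and the 26 ms / 121.5 ms per-sub-action service times come from the paper; the sub-action counts per transaction are assumptions chosen only to illustrate the utilization-below-1 criterion.

```python
import math
import random

def utilization(arrival_rate, subactions_per_txn, service_time):
    """rho = lambda * E[S]: long-run fraction of time a resource is busy.
    The queue is stable (does not grow without bound) when rho < 1."""
    return arrival_rate * subactions_per_txn * service_time

def uniform_interarrivals(n, mean, cv, rng):
    """Draw n interarrival times from a uniform distribution with a given
    mean and coefficient of variation (std/mean). For Uniform(a, b),
    mean = (a + b) / 2 and std = (b - a) / sqrt(12), so the half-width
    of the interval is cv * mean * sqrt(3)."""
    half_width = cv * mean * math.sqrt(3)
    return [rng.uniform(mean - half_width, mean + half_width)
            for _ in range(n)]

# Stability check at 1.6 txn/s; sub-action counts per transaction assumed.
rho_cpu = utilization(1.6, subactions_per_txn=10, service_time=0.026)
rho_io = utilization(1.6, subactions_per_txn=3, service_time=0.1215)
assert rho_cpu < 1 and rho_io < 1   # both resources below saturation

# Same mean interarrival time (1/1.6 s) at increasing coefficients of
# variation, as in the temporary-instability experiment.
rng = random.Random(1996)
mean = 1 / 1.6
for cv in (0.1, 0.3, 0.5):
    xs = uniform_interarrivals(100_000, mean, cv, rng)
    sample_mean = sum(xs) / len(xs)
    print(f"cv={cv}: sample mean interarrival = {sample_mean:.3f} s")
```

Feeding these interarrival streams into the simulator keeps the average load fixed while widening the swings in queue length, which is the mechanism the paper identifies behind the performance loss.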