
CISC 322, Fall 2010
Assignment 3: Architecture Enhancement for PostgreSQL
PopSQL

Andrew Heard (andrew.heard@queensu.ca)
Daniel Basilio (djdb@queensu.ca)
Eril Berkok (ejb@queensu.ca)
Julia Canella (jcc5@queensu.ca)
Mark Fischer (mark.fischer@queensu.ca)
Misiu Godfrey (mg5@queensu.ca)

Contents

Abstract
Intro
Changes to Conceptual Architecture
Changes to the Server Processes
Load Balancer Subsystem
States of Machines
Changes to Data Access
Software Architecture Analysis Method (SAAM)
Use Cases
Testing
Lessons Learned
Limitations
Conclusions
References

Abstract

This report proposes a modification to the PostgreSQL architecture to enable distributed processing of queries. Two major changes would be needed to realize this new system: the introduction of a Load Balancing subsystem to distribute the processing workload effectively, and a Data Sync subsystem to distribute data access and keep data-location information synchronized.

Intro

One commonly cited shortcoming of PostgreSQL is its inability to scale out query processing. When a database is set up, the postmaster, which is the main point of contact for the user, must be on the same machine as all query processors. This means that whether one user or a thousand users are querying the databases associated with that cluster, all of their queries are processed on the same machine. As a result, a PostgreSQL database has a limit on how many users it can service, determined by how many connections that single computer can handle. This is a major problem that must be examined in detail in order to propose a solution.

A curious fact is that the database can already scale out its database cluster [5]. The data the user is trying to access can be spread across as many hard drives as are available. As long as the access subsystem on the main computer can find all of the drives, the data can be scaled out horizontally without limit. Using the same practices PostgreSQL uses for scaling out the database, query processing could also be scaled out. However, this change raises several issues that need addressing.

The first major change to the system would require scaling out how PostgreSQL assigns incoming connections. While the Postmaster currently hands off connections to local processes, it would now need to forward client connections to different machines. It would also have to deal with the fact that it cannot simply pass the connection to the processing machine: the user would have to first disconnect from the Postmaster machine. In addition, it would need some way to determine which machine to send incoming workloads to, how to monitor machine usage, and how to deal with a machine dying. To accomplish these goals, a new Load Balancing subsystem was created, which will be examined in detail.

The second major change is to maintain data synchronicity among all machines. Currently, PostgreSQL supports storing data on multiple machines through a single, local access subsystem. Keeping this design with distributed Backend Instances would require all processing machines to go through a single machine for all data accesses, which would cause an unacceptable networking bottleneck. The solution is to distribute the access subsystem among multiple machines, with each machine having access to only its local data. Access to non-local data requires that a machine contact a separate machine's access subsystem remotely and request the data stored there. This distribution greatly reduces concurrency and replication issues at the expense of keeping all machines aware of data location.

There is also the issue of maintaining a location listing for all data, not only on the local machine, but across the entire database. This requires a new subsystem to deal with synchronicity issues. To this end, a Data Synchronization (or Data Sync) subsystem was created.

Changes to Conceptual Architecture

The following diagram shows the changes to the conceptual architecture that would be necessary to implement the proposed enhancement. Because a conceptual architecture does not show whether a subsystem is duplicated, more diagrams will be needed later on to demonstrate how the changes would take place at a structural level.

Figure 1: New Conceptual Architecture

Impact on Conceptual Architecture

Figure 1 outlines changed subsystems in red and new subsystems in yellow. The impact the changes have on the architecture is kept to a minimum through separation of functionality. Backend Instances (responsible for parsing, rewriting, and executing queries) are not affected by the proposed improvement and should remain identical.

The Data Synchronization and Load Balancer systems are both new to the conceptual architecture. Data Synchronization is responsible for updating the data-access catalog and for getting statistical updates from other machines. To this end, the Data Synchronization system must interact with the Access Manager and the Administration and Monitoring systems. The Load Balancer is responsible for redirecting connections to different machines. It must also be able to tell the Postmaster to create Backend Instances.

The Remote Client Interface requires added functionality to be able to force clients to reconnect to a new machine. The Postmaster does not change internally, but must now receive instruction from the Load Balancer instead of directly from the Client Communication Manager. The Access Manager contains the largest change. It must check the appropriate catalog table to determine where data lives, detect when data might be out of date and tell the Data Synchronization system to do an extra data refresh, and communicate with Access Managers on other machines. The Administration and Monitoring subsystems must be updated to take account of the new parallelism across machines, and new functions that allow the Data Synchronization subsystem to update statistics must be added.

Changes to the Server Processes

Currently, PostgreSQL handles incoming connections through the Postmaster, which creates a new Backend Instance (one per client) on the local machine. In the new implementation, the client communications manager first contacts the Load Balancer and asks it to connect the client to a Backend Instance, whether on the local machine or on a scaled-out machine. To do this, the Load Balancer takes over some of the Postmaster's responsibilities: listening for connections, deciding which machine should handle a new user, and connecting the user to that machine. There is one complication with this process: a user cannot be connected to a different machine without first disconnecting from the Load Balancer. This functionality has been written into libpq. Specifically, the client first connects to the Load Balancer and awaits an address, either for another machine or for a Backend Instance on the local machine. Once the address is given, libpq disconnects from the Load Balancer and connects to the new address. The user never has to know that this is occurring. Listening for user connections works in much the same way as in the original system; other steps, such as establishing connections, would need to be rewritten to accommodate the changes.
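To make the handoff concrete, the following sketch shows the client-side redirect logic described above. It is illustrative only: the message format and the helper names are assumptions, not part of the existing libpq API.

```python
import socket
import json

def connect_with_redirect(balancer_host: str, balancer_port: int) -> socket.socket:
    """Connect to the Load Balancer, receive the address of the chosen
    Backend Instance, then disconnect and reconnect to that address.
    The user never needs to know the redirect happened."""
    # 1. Ask the Load Balancer for a Backend Instance.
    with socket.create_connection((balancer_host, balancer_port)) as lb:
        lb.sendall(b'{"request": "backend"}\n')      # hypothetical message format
        reply = json.loads(lb.makefile().readline())  # e.g. {"host": "...", "port": 5432}

    # 2. The Load Balancer connection is now closed; connect to the Backend
    #    Instance it assigned. From here on, the normal client/Backend
    #    protocol is unchanged.
    return socket.create_connection((reply["host"], reply["port"]))

# Usage (assumed addresses):
# conn = connect_with_redirect("balancer.example.com", 6543)
```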

Load Balancer Subsystem

Machine Selection

The most important factor in deciding which machine to assign a request to is the current workload of each machine. Workload can be measured in several ways, such as by the number of user connections to a machine. However, it is possible to have a thousand users connected to one machine who each process one query an hour, and therefore represent quite a small workload, while another machine with a single user connected processes over a thousand queries an hour. A better measure of workload is the CPU usage of each machine, which more accurately identifies the kind of stress a machine is currently under.

In order to decide which machine to assign Backend Instances to, the Load Balancer requires state information on all available machines. The state of a machine shows whether it is a processing or a data machine, and what its current CPU usage is. Depending on current workload and configured settings, the Load Balancer can hand requests off to other machines or, if need be, transform a data machine into a processing machine or vice versa.

The Load Balancing subsystem has to gather usage information in order to make optimal choices. Each machine capable of processing queries must periodically send its CPU usage data via new functions in the Data Sync subsystem. This information does not have to be one hundred percent up to date, but should be refreshed frequently enough that the Load Balancer can make informed decisions about the load on its other machines. Based on the state information, the Load Balancer distributes work among the available machines with the lowest current workload. If the Load Balancer finds that all available machines are overworked, it can signal a data machine to change its state to a processing machine to help manage the load. When this happens, the new processing machine streams all the current state information from any machine that is already a query-processing machine. Once it has done this, it signals the Load Balancer that it is ready to process queries; from that point forward, the machine keeps itself up to date using the Data Sync subsystem. Conversely, if the load is particularly small, a machine can be reverted to storage only, at which point the Load Balancer will no longer send it new users to service and it no longer needs to keep its shared information up to date.

Thresholds

If a machine is critically overworked, beyond the point of being able to handle its load, it can ask the Load Balancer to move assigned Backend Instances from that machine to another, less stressed machine. This is implemented as a fall-back, because the system should never reach this point thanks to the thresholds involved. If a machine reaches a certain percentage of CPU usage, the Load Balancer stops handing it requests. Unless the currently connected users have a spike in use, the machine will never become critically overworked. The thresholds defining when a machine is at maximum capacity and when it is overworked should be set far enough apart to keep the chances of this slim. For example, a maximum workload of 75% CPU usage and a critical workload of 95% would keep the probability of this occurring small.
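As an illustration of the selection policy described above, the sketch below picks the least-loaded processor under the maximum threshold and falls back to promoting a data machine when every processor is above it. The threshold values and the MachineInfo fields are assumptions taken from the example figures (75%/95%); they are not identifiers from the PostgreSQL code base.

```python
from dataclasses import dataclass
from typing import Optional

MAXED_THRESHOLD = 0.75     # stop assigning new connections above this CPU usage
CRITICAL_THRESHOLD = 0.95  # consider moving Backend Instances above this

@dataclass
class MachineInfo:
    name: str
    is_processor: bool
    cpu_usage: float  # 0.0 .. 1.0, reported periodically via the Data Sync subsystem

def choose_machine(machines: list[MachineInfo]) -> Optional[MachineInfo]:
    """Return the machine that should receive the next Backend Instance."""
    candidates = [m for m in machines
                  if m.is_processor and m.cpu_usage < MAXED_THRESHOLD]
    if candidates:
        # Prefer the processor with the lowest current workload.
        return min(candidates, key=lambda m: m.cpu_usage)

    # Every processor is maxed: promote a data-only machine, if one exists.
    storage_only = [m for m in machines if not m.is_processor]
    if storage_only:
        promoted = storage_only[0]
        promoted.is_processor = True  # it must still stream state before serving queries
        return promoted
    return None  # nothing available; the request must wait
```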

While necessary, the process of transferring a Backend Instance to another machine is cumbersome, as it requires signalling the client to reconnect to the new machine, and it should be avoided when possible.

Creating a Backend Instance and Establishing a Connection

Once the Load Balancer has determined the most appropriate machine on which to create a Backend Instance, it communicates with the Postmaster on that remote computer. The Postmaster performs the creation operation using the information forwarded from the client through the Load Balancer; this functionality remains unchanged from the original PostgreSQL source code, with the Load Balancer appearing as a client to the remote Postmaster. The Postmaster then returns the connection information to the Load Balancer, which passes it back to the client. The Load Balancer can then disconnect from the Postmaster, and the client can establish a connection with the Backend Instance, as shown in Figure 2. The client-to-Backend-Instance interface remains unchanged from the original PostgreSQL source, which means that client software does not need to be aware of the new distributed system.

Figure 2: Load Balancer interaction with Server Processes on other machines
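The Load Balancer's side of this handshake might look like the sketch below, which reuses choose_machine from the selection sketch above. It is a hypothetical illustration: in the real system the Load Balancer would speak the existing frontend/backend startup protocol to the remote Postmaster, whereas the helpers here (create_remote_backend, ConnectionInfo) are invented for readability.

```python
from dataclasses import dataclass

@dataclass
class ConnectionInfo:
    host: str
    port: int

def create_remote_backend(postmaster_host: str, postmaster_port: int,
                          startup_packet: bytes) -> ConnectionInfo:
    """Hypothetical helper: connect to the Postmaster on the chosen machine,
    forward the client's startup information (so the Load Balancer looks like
    an ordinary client to it), and read back where the new Backend Instance
    is listening."""
    raise NotImplementedError("illustrative stub only")

def handle_connection_request(startup_packet: bytes,
                              machines: list) -> ConnectionInfo:
    """Load Balancer side of the handshake: pick the least-loaded machine,
    have its Postmaster create a Backend Instance, then return the address
    for the client to reconnect to."""
    target = choose_machine(machines)  # from the machine-selection sketch above
    if target is None:
        raise RuntimeError("no machine can accept a new Backend Instance")

    backend = create_remote_backend(target.name, 5432, startup_packet)

    # After returning this, the Load Balancer drops its connection to the
    # Postmaster; the client disconnects from the Load Balancer and connects
    # directly to backend.host:backend.port. The client/Backend protocol
    # itself is unchanged.
    return backend
```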

States of Machines

Distributing the processing workload amongst multiple machines has the added effect of giving machines discrete states. Originally, there was one machine that could process queries, and every other machine in the system was dedicated exclusively to data storage. With the addition of the Load Balancer, machines are now in one of a set of states and can change state over time. This design has the benefit that fewer machines act as single points of failure: if a machine dedicated exclusively to processing dies, it can be detected and replaced by the Load Balancer. On the downside, only processing-exclusive machines are replaceable, as the main machine must act as the single initial connection point for all clients, and the lack of redundancy among storage machines means that data cannot be accessed if it lived on a now-dead machine. These drawbacks come from being built on the legacy Postgres system, which contains the same vulnerabilities. The new feature, distribution of processing power, can fail and the system will be able to continue, although the failure may not be fully transparent to the user.

Specific States

Dead: A machine that has died. The system will fail if this is the entry point to the system. If this is a data-storing machine, its data will be inaccessible. If this is a processing machine, the system can continue running as normal until it is recovered.

Data Storage Only: This machine will not be assigned queries to process. A machine in this state may be asked by the Load Balancer to become a Processor, at which point it enters the Becoming a Processor state.

Becoming a Processor: This machine is initializing the processes it needs to process queries and connect to clients. Once done, it enters the Processor (not Maxed) state and begins sending status updates to the Load Balancer. To facilitate this, the Data Synchronization subsystem does a batch update in which it gets all necessary information from a machine already in a processor state. Once the machine has been brought up to date via the batch process, Data Synchronization starts using the updates found on the bulletin board.

Processor: This machine can process queries and its CPU consumption is less than the user-configured Maxed threshold.

Processor at maximum: This machine can process queries, but the Load Balancer will not send it additional client connections until it is no longer Maxed.

Processor at critical: This machine can process queries and its CPU consumption is greater than the user-configured critical threshold. The Load Balancer may move Backend Instances from this machine to another, less busy machine.

These states and the transitions between them are sketched below.
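A minimal sketch of the machine states, with the transitions implied by the descriptions above, follows. The enum names and the transition table are one reading of this section, not identifiers from the actual design.

```python
from enum import Enum, auto

class MachineState(Enum):
    DEAD = auto()
    DATA_STORAGE_ONLY = auto()
    BECOMING_A_PROCESSOR = auto()
    PROCESSOR = auto()               # below the Maxed threshold
    PROCESSOR_AT_MAXIMUM = auto()    # no new connections assigned
    PROCESSOR_AT_CRITICAL = auto()   # Backend Instances may be moved away

# Transitions implied by the section above; any state can also drop to DEAD.
ALLOWED_TRANSITIONS = {
    MachineState.DATA_STORAGE_ONLY: {MachineState.BECOMING_A_PROCESSOR},
    MachineState.BECOMING_A_PROCESSOR: {MachineState.PROCESSOR},
    MachineState.PROCESSOR: {MachineState.PROCESSOR_AT_MAXIMUM,
                             MachineState.DATA_STORAGE_ONLY},
    MachineState.PROCESSOR_AT_MAXIMUM: {MachineState.PROCESSOR,
                                        MachineState.PROCESSOR_AT_CRITICAL},
    MachineState.PROCESSOR_AT_CRITICAL: {MachineState.PROCESSOR_AT_MAXIMUM},
}

def can_transition(current: MachineState, target: MachineState) -> bool:
    """True if the state change is one the Load Balancer would accept."""
    if target is MachineState.DEAD:
        return True  # a machine can die from any state
    return target in ALLOWED_TRANSITIONS.get(current, set())
```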

Changes to Data Access

In its current implementation, Postgres is able to distribute stored data among multiple machines using a common Access Manager. The problem with incorporating this functionality into the new, distributed system is letting each machine know where any piece of data is placed. Connecting all Backend Instances through a single Access Manager creates a bottleneck that begins to limit scalability again. PostgreSQL's native data replication functionality could not meet our reliability standards, since it cannot guarantee that data remain up to date. To address this issue, two alternatives were suggested and researched.

First Alternative: Forwarding Pointers

The first approach avoids any shared memory at all. Rather than attempting to replicate data over multiple Access Managers and having to deal with replication issues, the data would be split into discrete parts allocated to separate machines. Each machine would have access only to the data stored within its local resources. This means that any time a machine needs non-local data, it must request it from another machine, and therefore must maintain a map of which machine to ask. A system of forwarding pointers was suggested to keep these maps up to date: each machine would, in addition to its local data, contain a map of which machines held other data. An example illustrates how this would work. When Machine A receives a request for data, it checks its own map to see if it already knows where the data is. If it does not, it queries Machine B to see if it has the data or knows where it is. If Machine B does not have it or know where it is, it queries Machine C, and so on. In this way, whenever the location of data is unknown, or a machine holds an outdated pointer, the next machine returns a new pointer to the location of the data, which it either knows itself or obtains by asking the next machine. The upside of such a system, as with Laganà's decentralized inter-agent message forwarding (Laganà 377), is that there is no shared memory and the process is infinitely scalable. The downside is that if few machines know where the data is, or if the data does not live in the system at all, the chain of queries can become excessively long. In the worst case, a user would wait an exceptionally long time for information simply because the machine that held the data was the last machine queried. It was decided that constant background updates would be much preferable to updating only on such events.

Chosen Alternative: Bulletin Board (A Repository)

Because keeping the data-access catalog in a central place requires too much network activity, a repository that stores only changes to the data-access catalog was suggested. Using this repository, called our Bulletin Board, each machine keeps its own data-access catalog up to date. Each query to the data-access catalog is local, while updates to the catalog, which run constantly in the background, require queries across the network. The Bulletin Board functionality is implemented using two subsystems: the Access Manager and Data Synchronization. Each machine creates a data-access catalog as a new catalog table. This catalog is changed only by the Data Synchronization subsystem and is read by the Access Manager. The repository itself is stored as a table on a single machine and is accessed by the Data Synchronization subsystem on every machine. The Data Synchronization subsystem also spawns a process that constantly polls the repository for changes. If it finds an update, it applies it to the data-access catalog and increments a counter in the repository.
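The background synchronization loop described above could look roughly like the following sketch. The object interfaces (changes_since, apply, acknowledge) and the polling interval are illustrative assumptions, not the actual catalog or subsystem definitions.

```python
import time

POLL_INTERVAL_SECONDS = 1.0  # assumed; statistics updates could use a slower cycle

def sync_loop(local_catalog, bulletin_board, machine_id: str) -> None:
    """Background process run by the Data Synchronization subsystem on each
    machine: poll the central Bulletin Board for catalog changes, apply them
    locally, and acknowledge them so fully-seen entries can be pruned."""
    last_seen = 0  # id of the most recent bulletin entry already applied here
    while True:
        # Fetch only the changes this machine has not yet applied.
        for entry in bulletin_board.changes_since(last_seen):  # hypothetical API
            local_catalog.apply(entry)                          # update data-access catalog
            bulletin_board.acknowledge(entry.id, machine_id)    # bump the seen-counter
            last_seen = entry.id
        # Once every machine has acknowledged an entry, the Bulletin Board
        # machine removes it, keeping the repository small.
        time.sleep(POLL_INTERVAL_SECONDS)
```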

The Access Manager has logic to detect when data may be out of date. If the Access Manager suspects its information is incorrect, it can tell the Data Synchronization subsystem to check the repository again. This is a fail-safe that should rarely be needed, since the Data Synchronization subsystem keeps the catalog up to date whenever possible. The repository is kept small by the fact that once every machine has seen an update, that update is removed from the repository. Any delays in this system are not visible to the user, since the work is all done by processes running in the background.

The same Bulletin Board functionality is used to keep statistics up to date across all machines. A separate catalog is created, and the update cycle for statistics can be slower than for data-positioning updates, since statistics are not critical to keeping the database functioning properly.

Figure 3: How the altered Database Control subsystem works over multiple machines.
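To tie the pieces together, here is a hedged sketch of the lookup path the Access Manager follows when a Backend Instance asks for data: consult the local data-access catalog, fall back to an extra Data Sync refresh, then fetch from the remote Access Manager. All helper and parameter names are assumed for illustration.

```python
def fetch_data(key: str, local_catalog, data_sync, local_store, remote_access) -> bytes:
    """Resolve where `key` lives and return its data, refreshing the local
    data-access catalog once if the location is unknown or stale."""
    location = local_catalog.lookup(key)  # hypothetical catalog lookup
    if location is None:
        # Fail-safe path: ask Data Synchronization for any bulletin-board
        # updates we have not applied yet, then look again.
        data_sync.refresh()
        location = local_catalog.lookup(key)
    if location is None:
        raise KeyError(f"{key!r} is not known anywhere in the cluster")

    if location.is_local:
        return local_store.read(key)  # data lives on this machine
    # Otherwise ask the Access Manager on the machine that holds the data.
    return remote_access.request(location.machine, key)
```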

Software Architecture Analysis Method (SAAM)

The Software Architecture Analysis Method considers the non-functional properties of a software system, such as maintainability and reliability, under various architectural choices [3]. These properties are considered from the different points of view of those interested in the software system, the stakeholders [3].

Stakeholders

In order to determine which of the two approaches was ideal, it was necessary to identify the stakeholders of the system and the non-functional requirements that matter most to them.

The PostgreSQL Global Development Group. The people in this group would be interested in having PostgreSQL be a system that is: easily maintainable during development; scalable, to handle the varying demands of their clients' services; portable across a variety of development and application environments (operating systems), in order to appeal to a wide variety of clients; and manageable, so that they can have a smooth development process.

Companies that use PostgreSQL in their software projects, and stakeholders in those companies. Examples include Yahoo.com [2], which makes use of PostgreSQL, but also companies such as doubleclick.com that depend on such a site for advertising. These companies would want the PostgreSQL system to be: reliable, so that they can offer their services with as little downtime as possible; easily scalable, to handle average loads with their desired performance; secure from malicious attacks, to maintain the privacy of their clients' information; and usable, so that their web developers can integrate it into their services.

Users of PostgreSQL-powered software, such as those using Skype for web conferencing or Yahoo for web searches [2]. These users are most concerned with the reliability of the software, its performance in responding to their actions, and the security of the database. Users may be concerned with security because of the need for data privacy; for example, Skype can store personal data such as addresses and phone numbers that could be used maliciously in the wrong hands. These users are not directly interested in technical aspects of the software, such as how easy it is to maintain, or in the usability of the client-software-to-database interface.

Evaluation of Approaches

Approach #1: Forwarding Pointers

With regards to the Forwarding Pointers solution, it was determined that:

Performance would be a major downfall. In the worst case, the computer that is queried first is the farthest from the computer that actually knows where the data is stored, so a single query could end up running through every machine in the system.

As far as reliability goes, the Load Balancer subsystem prevents machines from overloading; therefore, there should be no concern in this area.

By separating Backend Instances from the Postmaster and distributing them amongst different machines, the system is able to scale out horizontally. This goes a long way towards increasing the scalability of the system.

Compared to the Bulletin Board approach, there is no shared memory pool that would need to reference data locations, so this particular issue is not a concern.

Given that scalability has been improved, the manageability of the system becomes more complex, since troubleshooting a larger network of machines is more difficult.

The security of the system has not changed, since the number of entry points is the same as before. The main entry point used in the original implementation has not been altered or compromised in any way to support this enhancement approach.

Forwarding Pointers is a relatively simple enhancement to implement; as such, it would be a relatively cheap and affordable way to enhance the system.

Usability for web developers may be slightly less intuitive than the alternative, since Forwarding Pointers introduces a linear style of referencing that is not entirely elegant.

Approach #2: Bulletin Board

With regards to the Bulletin Board approach, it was determined that:

Performance is significantly faster compared to the Forwarding Pointers approach. With a central repository for all of the data addresses, the worst case merely involves a computer checking the bulletin board for a datum it cannot find within itself.

Reliability is no different from the other approach, since none of the differences in this approach affect it.

Scalability is very slightly more limited compared to the Forwarding Pointers approach. This approach still distributes Backend Instances amongst all machines in the system; what changes is the addition of the bulletin board. Since the bulletin board is meant to be a central repository (there is only one across the entire network), it can only be scaled up, not out. This should not be a significant concern, since data-address updates are removed from the bulletin board once they have propagated throughout the system.

As with reliability, manageability does not differ in this approach compared to the other.

Security also follows the trend of manageability and reliability in that it is no different from the other approach.

Affordability is impacted significantly, since this enhancement approach calls for the creation of a new component within the Data Sync subsystem. A repository-style component is introduced to reference data across a widely scaled-out database, which would demand considerable investment to implement.

Usability for web developers is better than the alternative, since the repository architecture style suits this sort of data-address referencing better than a series of linear pointers.

Chosen Approach

Of the two considered approaches, the Bulletin Board enhancement option was chosen. It was decided that although the cost of this implementation would be higher than the alternative, the worst-case scenario of the Forwarding Pointers approach was too costly for it to be a desirable option.

Potential Risks

There are risks to incorporating our new design into the existing Postgres system. Adding new synchronization aspects to the system risks slowing down certain processes in order to make sure everything is up to date. For instance, processing a query may be significantly slower due to the added complexity of accessing data. This is a trade-off between scaling out the ability to process queries concurrently and the average time it takes to process each one.

Use Cases

Figure 4 shows the first use case of the new, enhanced system: the user, working through the Client Application, establishing a connection with the system. First, the Client Application tells the Interface Library that it wants to log on to the system. The Interface Library then asks the Load Balancer to set up a Backend Instance. The Load Balancer, using the CPU usage statistics it periodically receives from each machine's Data Synchronization subsystem, tells the Postmaster on the machine with the lowest CPU usage to create a Backend Instance. Once the Backend Instance is created, its connection information is returned to the Client Application, which then proceeds to establish the connection. With the connection set up, the Client Application can send queries straight to the Backend Instance.

Figure 4: Use case one: Client connecting to a Backend Instance

Figure 5 shows the second use case of the system: the Client sending a query to its connected Backend Instance, and how that machine finds the data it is looking for. First, the Client Application, using libpq, sends the query to the Backend Instance. The Backend Instance then requests the data from the Access Manager on that machine. The Access Manager checks whether it knows the location of the data. If the data is on the same machine, it simply returns it. If it is on another machine, it talks to the other machine's Access Manager to receive the data.

In this use case, however, the Access Manager does not have the location of the data in its data-access catalog. It therefore asks the Data Synchronization subsystem to use its bulletin-board capabilities and send any new updates it has not yet received. It updates the indices in its data-access catalog and then checks again whether it knows where the data is. In this case, it now knows that the data rests on machine two. It talks to machine two's Access Manager and has it return the data values. Finally, the Access Manager returns the values to the Backend Instance, which returns them to the Client.

Figure 5: Client sending a query to the Backend Instance.

Testing

In order to test the implementation of the proposed enhancements, three methods were selected: regression testing, stress testing, and a test to ensure that any given Access Manager can work in conjunction with the Bulletin Board to find data not housed on its own machine. A regression test suite is needed to ensure that the enhancement implementation does not break any previous functionality or introduce integration defects. Stress testing was chosen to make sure that the new Load Balancer subsystem can handle the high loads of a massively scaled-out system, by monitoring the CPU usage of each machine. Finally, testing the Access Manager and Bulletin Board pairing ensures that the scaled-out data addressing works to specification.
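As an example of the third test, a unit-style check of the Access Manager / Bulletin Board pairing might look like the sketch below. It is written against the hypothetical fetch_data helper from the earlier Access Manager sketch and is not part of PostgreSQL's regression suite.

```python
import unittest
from unittest import mock
# fetch_data is the hypothetical lookup helper from the earlier sketch.

class AccessManagerBulletinBoardTest(unittest.TestCase):
    """The Access Manager must find data that is not housed on its own machine
    after a single extra refresh from the Bulletin Board."""

    def test_remote_data_found_after_refresh(self):
        # The local catalog knows nothing at first; after one refresh it learns
        # that "orders" lives on machine two.
        location = mock.Mock(is_local=False, machine="machine-two")
        local_catalog = mock.Mock()
        local_catalog.lookup.side_effect = [None, location]

        data_sync = mock.Mock()        # stands in for the Data Sync subsystem
        local_store = mock.Mock()
        remote_access = mock.Mock()
        remote_access.request.return_value = b"row data"

        result = fetch_data("orders", local_catalog, data_sync,
                            local_store, remote_access)

        data_sync.refresh.assert_called_once()  # the fail-safe refresh happened
        remote_access.request.assert_called_once_with("machine-two", "orders")
        self.assertEqual(result, b"row data")

if __name__ == "__main__":
    unittest.main()
```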

Lessons Learned

It was learned that there are many different ways to architecturally implement one high-level change, and that choosing an exact implementation takes time, since so many possibilities and trade-offs between performance and ease of implementation need to be considered. It was also learned, through the analysis of forwarding pointers, that worst-case scenarios must be considered for any implementation. The biggest lesson learned, however, was that effective distribution is difficult to build into a legacy system and should typically be planned from the start.

Limitations

A limitation of creating a new implementation for an already-built system is that it is difficult to know whether new enhancements will cause integration issues among subsystems. It also had to be assumed that subsystems could have certain functionality; for example, it had to be assumed that Access Manager subsystems could talk to each other across machines. A limitation of the design itself is that components such as the Bulletin Board and the Load Balancer will still need to scale up as the project scales out, because they contain an amount of information proportional to the number of machines in the system.

Conclusions

It can be concluded that while high-level ideas for implementation developed quickly, deciding on the low-level implementation proved more difficult and ideas changed rapidly. It was also discovered that performance was the largest differentiator between the two implementation alternatives. Overall, this improvement would be invaluable to PostgreSQL users, as the system would now be able to scale to handle extremely large loads.

References

[1] Laganà, Antonio. Computational Science and Its Applications: ICCSA. Part 3. Berlin: Springer-Verlag. Print.
[2] PostgreSQL Development Group. Featured Users [of PostgreSQL]. Accessed December 3, 2010.
[3] De Simone, M., and Kazman, R. Software Architectural Analysis: An Experience Report. Accessed December 3, 2010.
[4] Cordy, James. Continuous Testing. Accessed December 3, 2010.
[5] PostgreSQL Development Group. PostgreSQL Documentation: CLUSTER. Accessed December 3, 2010.


More information

Data Sheet: Storage Management Veritas Storage Foundation for Oracle RAC from Symantec Manageability and availability for Oracle RAC databases

Data Sheet: Storage Management Veritas Storage Foundation for Oracle RAC from Symantec Manageability and availability for Oracle RAC databases Manageability and availability for Oracle RAC databases Overview Veritas Storage Foundation for Oracle RAC from Symantec offers a proven solution to help customers implement and manage highly available

More information

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process Hyper-converged Secondary Storage for Backup with Deduplication Q & A The impact of data deduplication on the backup process Table of Contents Introduction... 3 What is data deduplication?... 3 Is all

More information

Resolving Security s Biggest Productivity Killer

Resolving Security s Biggest Productivity Killer cybereason Resolving Security s Biggest Productivity Killer How Automated Detection Reduces Alert Fatigue and Cuts Response Time 2016 Cybereason. All rights reserved. 1 In today s security environment,

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment By Michael Noel Contents Data Management with SharePoint and Its Challenges...2 Addressing Infrastructure Sprawl

More information

Unit 2 : Computer and Operating System Structure

Unit 2 : Computer and Operating System Structure Unit 2 : Computer and Operating System Structure Lesson 1 : Interrupts and I/O Structure 1.1. Learning Objectives On completion of this lesson you will know : what interrupt is the causes of occurring

More information

Perfect Timing. Alejandra Pardo : Manager Andrew Emrazian : Testing Brant Nielsen : Design Eric Budd : Documentation

Perfect Timing. Alejandra Pardo : Manager Andrew Emrazian : Testing Brant Nielsen : Design Eric Budd : Documentation Perfect Timing Alejandra Pardo : Manager Andrew Emrazian : Testing Brant Nielsen : Design Eric Budd : Documentation Problem & Solution College students do their best to plan out their daily tasks, but

More information

Marking Scheme for the Sample Exam Paper

Marking Scheme for the Sample Exam Paper INFORMATION SYSTEMS EXAMINATION BOARD BUSINESS SYSTEMS DEVELOPMENT CERTIFICATE IN INFORMATION AND COMMUNICATION TECHNOLOGY for the Sample Exam Paper As this is a sample paper, both the paper and the marking

More information

VMware vcloud Architecture Toolkit Cloud Bursting

VMware vcloud Architecture Toolkit Cloud Bursting VMware vcloud Architecture Toolkit VMware vcloud Architecture Toolkit Version 3.0 September 2012 Version 2.0.1 This product is protected by U.S. and international copyright and intellectual property laws.

More information

Continuous Processing versus Oracle RAC: An Analyst s Review

Continuous Processing versus Oracle RAC: An Analyst s Review Continuous Processing versus Oracle RAC: An Analyst s Review EXECUTIVE SUMMARY By Dan Kusnetzky, Distinguished Analyst Most organizations have become so totally reliant on information technology solutions

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID System Upgrade Teaches RAID In the growing computer industry we often find it difficult to keep track of the everyday changes in technology. At System Upgrade, Inc it is our goal and mission to provide

More information

The Total Network Volume chart shows the total traffic volume for the group of elements in the report.

The Total Network Volume chart shows the total traffic volume for the group of elements in the report. Tjänst: Network Health Total Network Volume and Total Call Volume Charts Public The Total Network Volume chart shows the total traffic volume for the group of elements in the report. Chart Description

More information

Chapter 9: Virtual-Memory

Chapter 9: Virtual-Memory Chapter 9: Virtual-Memory Management Chapter 9: Virtual-Memory Management Background Demand Paging Page Replacement Allocation of Frames Thrashing Other Considerations Silberschatz, Galvin and Gagne 2013

More information