Understanding high availability with WebSphere MQ


Mark Hiscock, Software Engineer, IBM Hursley Park Lab, United Kingdom
Simon Gormley, Software Engineer, IBM Hursley Park Lab, United Kingdom

May 11, 2005

Copyright International Business Machines Corporation 2005. All rights reserved.

This whitepaper explains how you can easily configure and achieve high availability using IBM's enterprise messaging product, WebSphere MQ V5.3 and later. This paper is intended for:

o Systems architects who make design and purchase decisions for the IT infrastructure and may need to broaden their designs to incorporate HA.
o System administrators who wish to implement and configure HA for their WebSphere MQ environment.

Table of Contents

1. Introduction
2. High availability
3. Implementing high availability with WebSphere MQ
3.1. General WebSphere MQ recovery techniques
3.2. Standby machine - shared disks
3.2.1. HA clustering software
3.2.2. When to use standby machine - shared disks
3.2.3. When not to use standby machine - shared disks
3.2.4. HA clustering active-standby configuration
3.2.5. HA clustering active-active configuration
3.2.6. HA clustering benefits
3.3. z/OS high availability options
3.3.1. Shared queues (z/OS only)
3.4. WebSphere MQ queue manager clusters
3.4.1. Extending the standby machine - shared disk approach
3.4.2. When to use HA WebSphere MQ queue manager clusters

3.4.3. When not to use HA WebSphere MQ queue manager clusters
3.4.4. Considerations for implementation of HA WebSphere MQ queue manager clusters
3.5. HA capable client applications
3.5.1. When to use HA capable client applications
3.5.2. When not to use HA capable client applications
4. Considerations for WebSphere MQ restart performance
4.1. Long running transactions
4.2. Persistent message use
4.3. Automation
4.4. File systems
5. Comparison of generic versus specific failover technology
6. Conclusion
Appendix A. Available SupportPacs
Resources
About the authors

1. Introduction

With an ever-increasing dependence on IT infrastructure to perform critical business processes, the availability of this infrastructure is becoming more important. The failure of an IT infrastructure results in large financial losses, which increase with the length of the outage [5]. The solution to this problem is careful planning to ensure that the IT system is resilient to any hardware, software, local, or system-wide failure. This capability is termed resilience computing, which addresses the following topics:

o High availability
o Fault tolerance
o Disaster recovery
o Scalability
o Reliability
o Workload balancing and stress

This whitepaper addresses the most fundamental concept of resilience computing: high availability (HA). That is, "An application environment is highly available if it possesses the ability to recover automatically within a prescribed minimal outage window" [7]. Therefore, an IT infrastructure that recovers from a software or hardware failure, and continues to process existing and new requests, is highly available.

2. High availability

The HA nature of an IT system is its ability to withstand software or hardware failures so that it is available as much of the time as possible. Ideally, despite any failure that may occur, this would be 100% of the time. However, there are factors, both planned and unplanned, that prevent this from being a reality for most production IT infrastructures. These factors lead to the unavailability of the infrastructure, so availability is usually measured as the percentage of the year for which the system was available. For example:

Figure 1. Number of 9's availability per year

Availability %    Downtime per year
99                3.65 days
99.9              8.76 hours
99.99             52.6 minutes
99.999            5.26 minutes
99.9999           31.5 seconds

Figure 1 shows that an outage of roughly 30 seconds per year is called six 9's availability because of the percentage of the year for which the system was available. (A worked calculation follows at the end of this section.) Factors that cause a system outage, and so reduce the number of 9's of uptime, fall into two categories: those that are planned and those that are unplanned. Planned disruptions are either systems management (upgrading software or applying patches) or data management (backup, retrieval, or reorganization of data). Conversely, unplanned disruptions are system failures (hardware or software failures) or data failures (data loss or corruption).

Maximizing the availability of an IT system means minimizing the impact of these failures on the system. The primary method is the removal of any single point of failure (SPOF), so that should a component fail, a redundant or backup component is ready to take over. Also, to ensure enterprise messaging solutions are made highly available, the software's state and data must be preserved in the event of a failure and made available again as soon as possible. The preservation and restoration of this data removes it as a single point of failure in the system.

Some messaging solutions remove single points of failure, and make software state and data available, by using replication technologies. These may take the form of asynchronous or synchronous replication of data between instances of the software in a network. However, these approaches are not ideal: asynchronous replication can cause duplicated or lost data, and synchronous replication incurs a significant performance cost because data is being backed up in real time. It is for these reasons that WebSphere MQ does not use replication technologies to achieve high availability.

The next section describes methods for making a WebSphere MQ queue manager highly available. Each method describes a technique for HA and when you should and should not consider it as a solution.
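To make the downtime figures in Figure 1 concrete, the allowance follows directly from the availability percentage A and the length of a year. This is a worked calculation rather than anything specific to WebSphere MQ, and it assumes a 365-day year:

    downtime per year = (1 - A) x 365 x 24 x 3600 seconds

    A = 99.9%    : 0.001    x 31,536,000 s = 31,536 s  (about 8.76 hours)
    A = 99.999%  : 0.00001  x 31,536,000 s = 315.4 s   (about 5.26 minutes)
    A = 99.9999% : 0.000001 x 31,536,000 s = 31.5 s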

3. Implementing high availability with WebSphere MQ

This section discusses the various methods of implementing high availability in WebSphere MQ, with examples of when you can and cannot use each one. Standby machine - shared disks and z/OS high availability options describe HA techniques for distributed and z/OS queue managers, respectively. WebSphere MQ queue manager clusters describes a technique available to queue managers on all platforms. HA capable client applications describes a client-side technique applicable on all platforms. By reading each section, you can select the best HA methodology for your scenario.

This paper uses the following terminology:

Machine - A computer running an operating system.

Queue manager - A WebSphere MQ queue manager that contains queue and log data.

Server - A machine that runs a queue manager and other third-party services.

Private message queues - Queues owned by a particular queue manager and only accessible, via WebSphere MQ applications, when the owning queue manager is running. These queues are to be contrasted with shared message queues (explained below), which are a particular type of queue only available on z/OS.

Shared message queues - Queues that reside in a Coupling Facility and are accessible by a number of queue managers that are part of a Queue Sharing Group. These are only available on z/OS and are discussed later.

3.1. General WebSphere MQ recovery techniques

On all platforms, WebSphere MQ uses the same general techniques for recovering private message queues after a queue manager failure. With the exception of shared message queues (see Shared queues), messages are cached in memory and backed by disk storage if the volume of message data exceeds the available memory cache. When persistent messaging is used, WebSphere MQ logs messages to disk storage. Therefore, in the event of a failure, the combination of the message data on disk plus the queue manager logs can be used to reconstruct the message queues. This restores the queue manager to a consistent state at the time just before the failure occurred. This recovery involves completing normal unit of work resolution, with in-flight messages being rolled back, in-commit messages being completed, and in-doubt messages waiting for coordinator resolution.

The following sections describe how this general restart process is used in conjunction with platform-specific facilities, such as HACMP on AIX or ARM on z/OS, to quickly restore message availability after failures.
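To illustrate the unit-of-work resolution just described, the following sketch uses the WebSphere MQ JMS classes to put a persistent message inside a transacted session. The host, port, channel, queue manager, and queue names are placeholders, not values from this paper. Until commit() is called the message is in-flight and would be rolled back if the queue manager failed; once committed, it is hardened to the queue manager's log and is restored by the recovery processing described above.

    import javax.jms.DeliveryMode;
    import javax.jms.QueueConnection;
    import javax.jms.QueueSender;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import com.ibm.mq.jms.JMSC;
    import com.ibm.mq.jms.MQQueueConnectionFactory;

    public class PersistentPut {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for an MQ client connection.
            MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
            cf.setTransportType(JMSC.MQJMS_TP_CLIENT_MQ_TCPIP);
            cf.setHostName("mqhost.example.com");
            cf.setPort(1414);
            cf.setChannel("SYSTEM.DEF.SVRCONN");
            cf.setQueueManager("QM1");

            QueueConnection conn = cf.createQueueConnection();
            conn.start();

            // A transacted session: the puts form a single unit of work.
            QueueSession session = conn.createQueueSession(true, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(session.createQueue("ORDERS.IN"));
            sender.setDeliveryMode(DeliveryMode.PERSISTENT); // logged to disk by the queue manager

            TextMessage msg = session.createTextMessage("order 42");
            sender.send(msg);   // in-flight: not yet visible to getters

            session.commit();   // unit of work completes; the message is recoverable after a failure

            sender.close();
            session.close();
            conn.close();
        }
    }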

WebSphere MQ also provides a mechanism for improving the availability of new messages by routing messages around a failed queue manager transparently to the application producing the messages. This is called WebSphere MQ clustering and is covered in WebSphere MQ queue manager clusters.

Finally, on z/OS, WebSphere MQ supports shared message queues that are accessible to a number of queue managers. Failure of one queue manager still allows the messages to be accessed by other queue managers. These are covered in z/OS high availability options.

3.2. Standby machine - shared disks

As described above, when a queue manager fails, a restart is required to make the private message queues available again. Until then, the messages stored on the queue manager are stranded: you cannot access them until the machine and queue manager are returned to normal operation. To avoid the stranded messages problem, stored messages need to be made accessible even if the hosting queue manager or machine is inoperable.

In the standby machine solution, a second machine hosts a second queue manager that is activated when the original machine or queue manager fails. The standby machine needs to be an exact replica of the master machine at any given point in time, so that when a failure occurs, the standby machine can start the queue manager correctly. That is, the WebSphere MQ code on the standby machine should be at the same level, and the standby machine should have the same security privileges as the primary machine.

A common method for implementing the standby machine approach is to store the queue manager data files and logs on an external disk system that is accessible to both the master and standby machines. WebSphere MQ writes its data synchronously to disk, which means a shared disk always contains the most recent data for the queue manager. Therefore, if the primary machine fails, the secondary machine can start the queue manager and resume from its last known good state.

Figure 2. An active-standby setup. The standby machine is ready to read the queue manager data and logs from the shared disk and to assume the IP address of the primary machine [3].

A shared external disk device is used to provide a resilient store for queue data and queue manager logs so that replication of messages is avoided. This preserves the once-and-once-only delivery characteristic of persistent messages. If the data were replicated to a different system, the messages stored on the queues would be duplicated on the other system, and once-and-once-only delivery could not be guaranteed. For instance, if data were replicated to a standby server and the connection between the two servers failed, the standby would assume that the master had failed, take over the master server's role, and start processing messages. However, as the master is still operational, messages are processed twice, so duplicated messages occur. This is avoided when using a shared hard disk because the data only exists in one physical location and concurrent access is not allowed.

The external disk used to store queue manager data should also use a RAID configuration, such as mirroring, to protect against data loss and prevent the disk from being a single point of failure (SPOF) [8]. The disk device may also have multiple disk controllers and multiple physical connections to each of the machines, to provide redundant access channels to the data.

In normal operation, the shared disk is mounted by the master machine, which uses the storage to run the queue manager in the same way as if it were a local disk, storing both the queues and the WebSphere MQ log files on it. The standby machine cannot mount the shared disk and therefore cannot start the queue manager, because the queue manager data is not accessible. When a failure is detected, the standby machine automatically takes on the master machine's role and, as part of that process, mounts the shared disk and starts the queue manager. The standby queue manager replays the logs stored on the shared disk to return the queue manager to the correct state, and resumes normal operations. Note that messages on queues that are failed over to another queue manager retain their order on the queue.

This failover operation can also be performed without the intervention of a server administrator. It does require external software, known as HA clustering, to detect the failure and initiate the failover process. Only one machine has access to the shared disk partition at a time (a more accurate name would be switchable disks), and only one instance of the queue manager runs at any one time, to protect the data integrity of messages. The objective of the shared disk is to move the storage of important data (for example, queue data and queue manager logs) to a location external to the machine, so that when the master machine fails, another machine may use the data.

3.2.1. HA clustering software

Much of the functionality in the standby machine configuration is provided by external software, often termed HA clustering software [4]. This software addresses high availability issues using a more holistic approach than single applications, such as WebSphere MQ, can provide. It also recognizes that a business application may consist of many software packages and other resources, all of which need to be highly available. A further complication is introduced when a solution consists of several applications that depend on each other. For example, an application may need access to both WebSphere MQ and a database, and may need to run on the same physical machine as these services.

HA clustering provides the concept of resource groups, where applications are grouped together. When a failure occurs in one of the applications in the group, the entire group is moved to a standby server, satisfying the dependencies of the applications. However, this only occurs if the HA clustering software fails to restart the application on its current machine. It is also possible to move the network address and any other operating system resources with the group so that the failover is transparent to the client. If an individual software package were responsible for its own availability, it might not be able to transfer to another physical machine and would not be able to move any other resources on which it depends. By using HA clustering to cope with these low-level considerations, such as network address takeover, disk access, and application dependencies, the higher-level applications are relieved of this complexity.

Although there are several vendors providing HA clustering, each package tends to follow the same basic principles and provide a similar set of basic functionality. Some solutions, such as Veritas Cluster Server and SteelEye LifeKeeper, are also available on multiple platforms to provide a similar solution in heterogeneous environments. In the same way that WebSphere MQ removed the complexity of application connectivity from the programmer, HA clustering techniques help provide a simple, generic solution for HA. This means applications, such as messaging and data management, can focus on their core competencies, leaving HA clustering to provide a more reliable availability solution than resource-specific monitors. HA clustering also covers both hardware and software resources, and is a proven, recognized technology used in many other HA situations. HA clustering products are designed to be scalable and extensible to cope with changing requirements. IBM's HACMP product for AIX, SteelEye LifeKeeper, and Veritas Cluster Server scale up to 32 servers. HACMP, LifeKeeper, and Cluster Server have extensions available to allow replication of disks to a remote site for disaster recovery purposes.

3.2.2. When to use standby machine - shared disks

The standby machine solution is ideal for messages that must be delivered once and only once. For example, in billing and ordering systems, it is essential that messages are not duplicated so that customers are not billed twice, or sent two shipments instead of one.

As HA clustering software is a separate product that sits alongside existing applications, this methodology is also suited to converting an existing server, or set of servers, to be highly available, and it is possible to do so gradually. In large installations where there are many servers, HA clustering is a cost-effective choice through the use of an n+1 configuration. In this approach, a single machine is used as a backup for a number of live servers. Hardware redundancy is reduced and therefore cost is reduced, as only one extra machine is required to provide high availability to a number of active servers.

As already shown, HA clustering software is capable of converting an existing application and its dependent resources to be highly available. It is therefore suited to situations where there are several applications or services that need to be made highly available. If those applications are dependent on each other, and rely on operating system resources such as network addresses to function correctly, HA clustering is ideally suited.

3.2.3. When not to use standby machine - shared disks

HA clustering is not always necessary when considering an HA solution. Although the examples given below can be served by an HA clustering method, other solutions would serve just as well, and it would be possible to adopt HA clustering at a later date if required.

If the trapped messages problem is not applicable, that is, there is no need to restart a failed queue manager with its messages intact, then shared disks are not necessary. This is the case if the system is only used for event messages that are re-transmitted regularly, for messages that expire in a relatively short time, or for non-persistent messages (where an application is not relying on WebSphere MQ for assured delivery). For these situations, you can make a system highly available by using WebSphere MQ queue manager clustering only. This technology load balances messages and routes around failed servers. See WebSphere MQ queue manager clusters for more information on queue manager clusters.
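As an illustration of the kind of traffic that does not need shared disks, the sketch below uses the WebSphere MQ classes for Java to put a non-persistent event message with a short expiry interval; the queue manager and queue names are placeholders. Because the message is neither logged nor expected to outlive a short window, losing it in a queue manager failure is acceptable, and clustering alone can provide the required availability.

    import com.ibm.mq.MQC;
    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQPutMessageOptions;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;

    public class EventPut {
        public static void main(String[] args) throws Exception {
            MQQueueManager qmgr = new MQQueueManager("QM1");   // placeholder; bindings connection
            MQQueue queue = qmgr.accessQueue("EVENTS", MQC.MQOO_OUTPUT);

            MQMessage event = new MQMessage();
            event.persistence = MQC.MQPER_NOT_PERSISTENT; // not logged; will not survive a restart
            event.expiry = 30 * 10;                       // expiry is in tenths of a second: 30 seconds
            event.writeString("sensor reading 17.3");

            queue.put(event, new MQPutMessageOptions());

            queue.close();
            qmgr.disconnect();
        }
    }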

In situations where it is not important to process the messages as soon as possible, HA clustering may provide too much availability at too much expense. For example, if trapped messages can wait until an administrator restarts the machine, and hence the queue manager (using an internal RAID disk to protect the queue manager data), then HA clustering is too comprehensive a solution. In this situation, it is possible to allow access for new messages using WebSphere MQ queue manager clustering, as in the case above.

The shared disk solution requires the machines to be physically close to each other, as the distance from the shared disk device needs to be small. This makes it unsuitable for use in a disaster recovery solution. However, some HA clustering software can provide disaster recovery functionality. For example, IBM's HACMP package has an extension called HAGEO, which provides data replication to remote sites. By backing up data in this fashion, it is possible to retrieve it if a site-wide failure occurs. However, the off-site data may not be the most up to date, because the replication is often delayed by a few minutes; instantaneous replication of data to an off-site location incurs a significant performance hit. Therefore, the more important the data, the smaller the time interval will be, but the greater the performance impact. Time and performance must be traded against each other when implementing a disaster recovery solution. Such solutions do not provide all of the benefits of the shared disk solution and are beyond the scope of this document.

The following sections describe two possible configurations for HA clustering, termed active-standby and active-active configurations.

3.2.4. HA clustering active-standby configuration

In a generic HA clustering solution, when two machines are used in an active-standby configuration, one machine runs the applications in a resource group and the other is idle. In addition to network connections to the LAN, the machines also have a private connection to each other, either a serial link or a private Ethernet link. The private link provides a redundant connection between the machines for the purpose of detecting a complete failure. As previously mentioned, if a link between the machines fails, then both machines may try to become active; the redundant link reduces the risk of communication failure between the two.

The machines may also have two external links to the LAN. Again, this reduces the risk of external connectivity failure, but also allows the machines to have their own network address. One of the adapters is used for the service network address, that is, the network address that clients use to connect to the service, and the other adapter has a network address associated with the physical machine. The service address is moved between the machines upon failure to provide HA transparency to any clients.

The standby machine monitors the master machine via heartbeats. These are periodic checks by the standby machine to ensure that the master machine is still responding to requests. The master machine also monitors its disks and the processes running on it to ensure that no hardware failure has occurred. For each service running on the machine, a custom utility is required to inform the HA clustering software that it is still running.
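A minimal illustration of such a check, written with the WebSphere MQ classes for Java, is sketched below. This is not the SupportPac utility itself, and the default queue manager name is a placeholder; the idea is simply that the process exit code tells the HA clustering software whether the queue manager accepted a connection.

    import com.ibm.mq.MQException;
    import com.ibm.mq.MQQueueManager;

    public class QmgrHealthCheck {
        public static void main(String[] args) {
            String qmName = args.length > 0 ? args[0] : "QM1"; // placeholder default
            try {
                // Bindings-mode connect to the local queue manager; fails if it is not running.
                MQQueueManager qmgr = new MQQueueManager(qmName);
                qmgr.disconnect();
                System.exit(0);   // healthy: the HA clustering software takes no action
            } catch (MQException e) {
                System.err.println("Queue manager " + qmName + " not responding: reason " + e.reasonCode);
                System.exit(1);   // unhealthy: the HA clustering software can restart or fail over
            }
        }
    }

An HA clustering package would typically run such a check at regular intervals and treat a non-zero exit code as a resource failure.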
In the case of WebSphere MQ, the SupportPacs describing HA configurations provide utilities to check the operation of queue managers, which can easily be adapted for other HA systems. Details of these SupportPacs are listed in Appendix A.

A small amount of configuration is required for each resource group to describe what should happen at start-up and shutdown, although in most cases this is simple. In the case of WebSphere MQ, this could be a start-up script containing commands to start the queue manager (for example, strmqm), the listener (for example, runmqlsr), or any other queue manager programs. A corresponding shutdown script is also needed, and depending on the HA clustering package in use, a number of other scripts may be required. Samples for WebSphere MQ are provided with the SupportPacs described in Appendix A.

As the heartbeat mechanism is the primary method of failure detection, if a heartbeat does not receive a response, the standby machine assumes that the master server has failed. However, a heartbeat may go unanswered for a number of reasons, such as an overloaded server or a communication failure, and there is a possibility that the master server will resume processing at a later stage, or is in fact still running. This can lead to duplicate messages in the system, which is not desired. Managing this problem is also the role of the HA clustering package. For example, Red Hat Cluster services and IBM's HACMP work around this problem by having a watchdog timer with a lower timeout than the cluster's, which ensures that the machine reboots itself before another machine in the cluster takes over its role. Programmable power supplies are also supported, so other machines in the cluster can power-cycle the affected machine to ensure that it is no longer operational before starting the resource group. Essentially, the machines in the cluster have the capability to turn the other machines off.

Some HA clustering software suites also provide the capability to detect other types of failure, such as system resource exhaustion or process failure, and try to recover from these failures locally. For WebSphere MQ on AIX, you can use the appropriate SupportPac (see Appendix A) to locally restart a queue manager that is not responding. This can avoid the more time-consuming operation of completely moving the resource group to another server.

You should design the machines used in HA clustering to have identical configurations to each other. This includes installed software levels, security configurations, and performance capabilities, to minimize the possibility of resource group start-up failure. This ensures that the machines in the network all have the capability to take on another machine's role. Note that for active-standby configurations, only one instance of an application is running at any one moment and therefore software vendors may charge for only one instance of the application, as is the case for WebSphere MQ.

3.2.5. HA clustering active-active configuration

It is also possible to run services on the redundant machine in what is termed an active-active configuration. In this mode, the servers are both actively running programs and acting as backups for each other. If one server fails, the other continues to run its own services as well as the failed server's. This enables the backup server to be used more effectively, although when a failure does occur, the performance of the system is reduced because it has taken on extra processing. In Figure 3, the second active machine runs both queue managers if a failure occurs.

Figure 3. An active-active configuration

In larger installations, where several resource groups exist and more than one server needs to be made highly available, it is possible to use one backup machine to cover several active servers. This setup is known as an n+1 configuration, and has the benefit of reduced redundant hardware costs, because the servers do not each have a dedicated backup machine. However, if several servers fail at the same time, the backup machine may become overloaded. These extra costs must be weighed against the potential cost of more than one server failing and more than one backup machine being required.

3.2.6. HA clustering benefits

HA clustering software provides the capability to perform a controlled failover of resource groups. This allows administrators to test the functionality of a configured system, and also allows machines to be gracefully removed from an active cluster. This can be for maintenance purposes, such as hardware and software upgrades or data backup. It also allows failed servers, once repaired, to be placed back in the cluster and to resume their services. This is known as fail-back [4]. A controlled failover operation also results in less downtime because the cluster does not need to detect the failure; there is no need to wait for the cluster timeout. Also, as the applications, such as WebSphere MQ, are stopped in a controlled manner, the start-up time is reduced because there is no need for log replay.

Using abstract resource groups makes it possible for a service to remain highly available even when the machine that normally runs the service has been removed from the cluster. This is only true as long as the other machines have comparable software installed and access to the same data, meaning any machine can run the resource group. The modular nature of resource groups also helps the gradual uptake of HA clustering in an existing system and easily allows services to be added at a later date. This also means that in a large queue manager installation, you can convert mission-critical queue managers to be highly available first, and convert the less critical queue managers later, or not at all.

Many of the requirements for implementing HA clustering are also desirable in more bespoke, or product-centric, HA solutions. For example, RAID disk arrays [8], extra network connections, and redundant power supplies all protect against hardware failure. Therefore, improving the availability of a server results in additional cost whether a bespoke or an HA clustering technique is used. HA clustering may require additional hardware over and above some application-specific HA solutions, but this enables an HA clustering approach to provide a more complete HA solution.

You can easily extend the configuration of HA clustering to cover other applications running on the machine. The availability of all services is provided via a standard methodology and presented through a consistent interface, rather than being implemented separately by each service on the machine. This in turn reduces complexity and staff training time, and reduces the errors introduced during administration activities.

By using one product to provide an availability solution, you can take a common approach to decision making. For instance, if a number of the servers in a cluster are separated from the others by a network failure, a unanimous decision is needed about which servers should remain active in the cluster. If there were several HA solutions in place (that is, each product using its own availability solution), each with a separate quorum algorithm (a quorum being the minimum number of members of a deliberative body necessary to conduct the business of that group), then it is possible that each algorithm would reach a different outcome. This could result in an invalid selection of active servers in the cluster that may not be able to communicate. By having a separate entity, in the form of the HA clustering software, decide which part of the cluster has the quorum, only one outcome is possible, and the cluster of servers continues to be available.

Summary

The shared disk solution described above is a robust approach to the problem of trapped messages, and allows access to stored messages in the event of a failure. However, there will be a short period of time during which there is no access to the queue manager while the failure is being detected and the service is being transferred to the standby server. It is possible during this time to use WebSphere MQ clustering to provide access for new messages, because its load balancing capabilities will route messages around the failed queue manager to another queue manager in the cluster. How to use HA clustering with WebSphere MQ clustering is described in When to use HA WebSphere MQ queue manager clusters.

3.3. z/OS high availability options

z/OS provides a facility for operating system restart of failed queue managers called Automatic Restart Manager (ARM). It provides a mechanism, via ARM policies, for a failed queue manager to be restarted in place on the failing logical partition (LPAR) or, in the case of an LPAR failure, started on a different LPAR along with the other subsystems and applications grouped with it, so that the subsystem components that provide the overall business solution can be restarted together. In addition, with a Parallel Sysplex, Geographically Dispersed Parallel Sysplex (GDPS) provides the ability to automatically restart subsystems, via remote DASD copying techniques, in the event of a site failure.

The above techniques are restart techniques similar to those discussed earlier for distributed platforms. We will now look at a capability that maximizes the availability of message queues in the event of queue manager failures and does not require a queue manager restart.

3.3.1. Shared queues (z/OS only)

WebSphere MQ shared queues exploit the z/OS-unique Coupling Facility (CF) technology, which provides high-speed access to data across a sysplex via a rich set of facilities to store and retrieve data. WebSphere MQ stores shared message queues in the Coupling Facility, which in turn means that, unlike private message queues, they are not owned by any single queue manager. Queue managers are grouped into Queue Sharing Groups (QSGs), analogous to the Data Sharing Groups used with data-sharing DB2. All queue managers within a QSG can access shared message queues for putting and getting messages via the WebSphere MQ API. This enables multiple putters and getters on the same shared queue from within the QSG. WebSphere MQ also provides peer recovery, such that in-flight shared queue messages are automatically rolled back by another member of the QSG in the event of a queue manager failure.

WebSphere MQ still uses its logs to capture persistent message updates, so that in the extremely unlikely event of a CF failure, you can use the normal restart procedures to restore messages. In addition, z/OS provides system facilities to automatically duplex the CF structures used by WebSphere MQ. The combination of these facilities gives WebSphere MQ shared message queues extremely high availability characteristics.

Figure 4 shows three queue managers, QM1, QM2, and QM3, in the QSG GRP1 sharing access to queue A in the Coupling Facility. This setup allows all three queue managers to process messages arriving on queue A.

Figure 4. Three queue managers (QM1, QM2, and QM3) in the QSG GRP1 share queue A on a Coupling Facility

A further benefit of using shared queues is the ability to use shared channels. You can use shared channels in two different scenarios to further extend the high availability of WebSphere MQ. First, an external queue manager can connect to a specific queue manager in the QSG using channels and put messages to the shared queue via that queue manager. This allows queue managers in a distributed environment to use the HA functionality provided by shared queues; the target application for messages put by the external queue manager can be any of those running on a queue manager in the QSG. Second, you can use a generic port so that a channel connecting to the QSG can be connected to any queue manager in the QSG. If the channel loses its connection (because of a queue manager failure), it can connect to another queue manager in the QSG by simply reconnecting to the same generic port.

3.3.1.1. Benefits of shared message queues

The main benefit of a shared queue is its high availability. There are numerous customer-selectable configuration options for CF storage, ranging from standalone processors with their own power supplies to the Internal Coupling Facility (ICF), which runs on spare processors within a general zSeries server. Another key factor is that the Coupling Facility Control Code (CFCC) runs in its own LPAR, where it is isolated from any application or subsystem code.

In addition, a shared queue naturally balances the workload between the queue managers in the QSG. That is, a queue manager only requests a message from the shared queue when the application that is processing messages is free to do so. Therefore, the availability of the messaging service is improved because queue managers are not flooded with messages directly. Instead, they consume messages from the shared queue when they are ready to do so. Also, should greater message processing performance be required, you can add extra queue managers to the QSG to process more incoming messages. With persistent messages, both private and shared, the message processing limit is constrained by the speed of the log. With shared message queues, each queue manager uses its own log for updates. Therefore, deploying additional queue managers to process a shared queue spreads the total logging cost over a number of queue managers. This provides a highly scalable solution. Conversely, if a queue manager requires maintenance, you can remove it from the QSG, leaving the remaining queue managers to continue processing the messages. Both the addition and removal of queue managers in a QSG can be performed without disrupting the existing members.

Lastly, should a queue manager fail during the processing of a unit of work, the other members of the QSG will detect this and Peer Recovery is initiated. That is, if the unit of work was not completed by the failed queue manager, another queue manager in the QSG completes the processing. This arbitration of queue manager data is achieved via hardware and microcode on z/OS. The availability of the system is therefore increased, because the failure of any one queue manager does not result in trapped messages or inconsistent transactions: Peer Recovery either completes the transaction or rolls it back. For more information on Peer Recovery and how to configure it, see the z/OS System Administration Guide [6].

The benefits of shared queues are not limited solely to z/OS queue managers. Although you cannot set up shared queues in a distributed environment, it is possible for distributed queue managers to place messages onto them through a member of the QSG. This allows the QSG to process a distributed application's messages in a z/OS HA environment.

3.3.1.2. Limitations of shared message queues

With WebSphere MQ V5.3, physical shared messages are limited to less than 63 KB in size. Any application that attempts to put a message greater than this limit receives an error on the MQPUT call. However, you can use the message grouping API to construct a logical message greater than 63 KB, which consists of a number of physical segments.

The Coupling Facility is a resilient and durable piece of hardware, but it is a single point of failure in this high availability configuration. However, z/OS provides duplexing facilities, where updates to one CF structure are automatically propagated to a second CF. In the unlikely event of a failure of the primary CF, z/OS automatically switches access to the secondary while the primary is being rebuilt. This system-managed duplexing is supported by WebSphere MQ. While the rebuild is taking place, there is no noticeable application effect. However, this duplexing will clearly have an effect on overall performance.

Finally, a queue manager can only belong to one QSG, and all queue managers in a QSG must be in the same sysplex. This is a small limitation on the flexibility of QSGs. Also, a QSG can only contain a maximum of 32 queue managers. For more information on shared queues, see the WebSphere MQ for z/OS Concepts and Planning Guide [1].
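To illustrate the generic port idea from an application's point of view, the following client-mode sketch (WebSphere MQ classes for Java) addresses the connection to the queue sharing group rather than to an individual queue manager, so any available member of GRP1 can service it. The host name, port, and channel are placeholders for whatever generic network address and shared channel the sysplex exposes, and connecting by QSG name assumes the group has been configured to accept such connections.

    import com.ibm.mq.MQC;
    import com.ibm.mq.MQEnvironment;
    import com.ibm.mq.MQGetMessageOptions;
    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;

    public class SharedQueueGet {
        public static void main(String[] args) throws Exception {
            // Placeholder generic address/port for the queue sharing group and a shared channel.
            MQEnvironment.hostname = "qsg-grp1.example.com";
            MQEnvironment.port = 1414;
            MQEnvironment.channel = "GRP1.SVRCONN";

            // Connecting with the QSG name lets any member of GRP1 accept the connection.
            MQQueueManager qmgr = new MQQueueManager("GRP1");
            MQQueue queueA = qmgr.accessQueue("QUEUE.A",
                    MQC.MQOO_INPUT_SHARED | MQC.MQOO_FAIL_IF_QUIESCING);

            MQGetMessageOptions gmo = new MQGetMessageOptions();
            gmo.options = MQC.MQGMO_WAIT;
            gmo.waitInterval = 5000;          // wait up to 5 seconds for a message

            MQMessage msg = new MQMessage();
            queueA.get(msg, gmo);             // served by whichever queue manager accepted the connection
            System.out.println(msg.readString(msg.getMessageLength()));

            queueA.close();
            qmgr.disconnect();
        }
    }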

3.4. WebSphere MQ queue manager clusters

A WebSphere MQ queue manager cluster is a cross-platform workload balancing solution that allows WebSphere MQ messages to be routed around a failed queue manager. It allows a queue to be hosted across multiple queue managers, thus allowing an application to be duplicated across multiple machines. It provides a highly available messaging service by allowing incoming messages to be forwarded to any queue manager in the cluster for application processing. Therefore, if any queue manager in the cluster fails, new incoming messages continue to be processed by the remaining queue managers.

In Figure 5, an application puts a message to a cluster queue on QM2. This cluster queue is defined locally on QM1, QM4, and QM5, so one of these queue managers will receive the message and process it.

Figure 5. Queue managers 1, 4, and 5 in the cluster receive messages in order

By balancing the workload between QM1, QM4, and QM5, an application is distributed across multiple queue managers, making it highly available. If a queue manager fails, the incoming messages are balanced among the remaining queue managers.

While WebSphere MQ clustering provides continuous messaging for new messages, it is not a complete HA solution because it is unable to handle messages that have already been delivered to a queue manager for processing. As we have seen above, if a queue manager fails, these trapped private messages are only processed when the queue manager is restarted. However, by combining WebSphere MQ clustering with the recovery techniques covered above, you can create an HA solution for both new and existing messages. The following section shows this in action in a distributed shared disk environment.

3.4.1. Extending the standby machine - shared disk approach

By hosting cluster queue managers on active-standby or active-active setups, trapped messages on private or cluster queues are made available when the queue manager is failed over to a standby machine and restarted. The queue manager is failed over and begins processing messages within minutes, instead of the longer time it would take to manually recover and repair the failed machine or failed queue manager in the cluster.

The added benefit of combining queue manager clusters with HA clustering is that the high availability nature of the system becomes transparent to any clients using it, because they are putting messages to a single cluster queue. If a queue manager in the cluster fails, the client's outstanding requests are processed when the queue manager is failed over to a backup machine. In the meantime, the client needs to take no action because its new requests will be routed around the failure and processed by another queue manager in the cluster. The client must only tolerate its requests taking slightly longer than normal to be returned in the event of a failover.

Figure 6 shows each queue manager in the cluster in an active-active, standby machine - shared disk configuration. The machines are configured with separate shared disks for queue manager data and logs to decrease the time required to restart the queue manager. See Considerations for WebSphere MQ restart performance for more information.

Figure 6. Queue managers 1, 4, and 5 have active standby machines

In this example, if queue manager 4 fails, it fails over to the same machine as queue manager 3, where both queue managers will run until the failed machine is repaired.

3.4.2. When to use HA WebSphere MQ queue manager clusters

Because this solution is implemented by combining external HA clustering technology with WebSphere MQ queue manager clusters, it provides the ultimate high availability configuration for distributed WebSphere MQ. It makes both incoming and queued messages available, and it fails over not only a queue manager but also any other resources running on the machine. For instance, server applications, databases, or user data can fail over to a standby machine along with the queue manager.

When using HA WebSphere MQ clustering in an active-standby configuration, it is a simpler task to apply maintenance or software updates to machines, queue managers, or applications. You can first update a standby machine, then fail a queue manager over to it to ensure that the update works correctly. If it is successful, you can update the primary machine and then fail the queue manager back onto it.

HA WebSphere MQ queue manager clusters also greatly reduce the administration of the queue managers within them, which in turn reduces the risk of administration errors. Queue managers that are defined in a cluster do not require channel or queue definitions to be set up for every other member of the cluster. Instead, the cluster handles these communications and propagates relevant information to each member of the cluster through a repository.

HA WebSphere MQ queue manager clusters are able to scale applications linearly because you can add new queue managers to the cluster to aid in the processing of incoming messages. Conversely, you can remove queue managers from the cluster for maintenance and the cluster can still continue to process incoming requests. If the queue manager's presence in the cluster is required but the hardware must be maintained, you can use this technique in conjunction with failing the queue manager over to a standby machine. This frees the machine but keeps the queue manager running.

It is also possible for administrators to write their own cluster workload exits. This allows finer control over how messages are delivered to queue managers in the cluster. For example, you can target messages at machines in different ratios based on the performance capabilities of each machine (rather than in a simple round-robin fashion).

3.4.3. When not to use HA WebSphere MQ queue manager clusters

HA WebSphere MQ queue manager clusters require additional proprietary HA hardware (shared disks) and external HA clustering software (such as HACMP). This increases the administration costs of the environment because you also need to administer the HA components. This approach also increases the initial implementation costs because extra hardware and software are required. Therefore, balance these initial costs against the potential costs incurred if a queue manager fails and messages become trapped.

Note that non-persistent messages do not survive a queue manager failover. This is because the queue manager restarts once it has been failed over to the standby machine, causing it to process its logs and return to its most recent known state. At this point, non-persistent messages are discarded. Therefore, if your application relies on non-persistent messages, take this factor into account.

If trapped messages are not a problem for the applications (for example, the response time of the application is irrelevant or the data is updated frequently), then HA WebSphere MQ queue manager clusters are probably not required. That is, if the amount of time required to repair a machine and restart its queue manager is acceptable, then having a standby machine to take over the queue manager is not necessary. In this case, it is possible to implement WebSphere MQ queue manager clusters without any additional HA hardware or software.

3.4.4. Considerations for implementation of HA WebSphere MQ queue manager clusters

When configuring an active-active or active-standby setup in a cluster, administrators should test to ensure that the failover of a given node works correctly. Nodes should be failed over, when and where possible, to backup machines to ensure that the failover processes work as designed and that no problems are encountered when a failover is actually required. Perform this procedure at the discretion of the administrators; if failover does not happen smoothly, it may cause problems or outages in a future production environment.

As with queue manager clusters in general, do not code WebSphere MQ applications to be machine or queue manager specific, for example by relying on resources only available on a single machine. When applications are failed over to a standby machine along with the queue manager they are running on, they may not have access to these resources. To avoid these administrative problems, machines should be as equal as possible with respect to software levels, operating system environments, and security settings, so that any failed-over applications have no problems running.

Avoid message affinities when programming applications, because there is no guarantee that messages put to the cluster queue will be processed by the same queue manager every time. It is possible to use the MQ open option BIND_ON_OPEN to ensure that an application's messages are always delivered to the same queue manager in the cluster. However, an application performing this operation incurs reduced availability because that queue manager may fail during message processing. In this case, the application must wait until the queue manager is failed over to a backup machine before its requests can be processed. If affinities had not been used, no delay in message processing would be experienced; another queue manager in the cluster would continue processing any new requests. (A short code sketch of this binding choice follows at the end of this section.)

Application programmers should avoid long running transactions in their applications, because these greatly increase the restart time of the queue manager when it is failed over to a standby machine. See Considerations for WebSphere MQ restart performance for more information.

When implementing a WebSphere MQ cluster solution, whether for an HA configuration or for normal workload balancing, be careful to have at least two full cluster repositories defined. These repositories should be on machines that are highly available; for example, they have redundant power supplies, network access, and hard disks, and are not heavily loaded with work. Repositories are vital to the cluster because they contain cluster-wide information that is distributed to each cluster member. If both of these repositories are lost, it is impossible for the cluster to propagate any cluster changes, such as new queues or queue managers. However, the cluster continues to function with each member's partial repositories until the full repositories are restored.
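The sketch referred to above uses the WebSphere MQ classes for Java; the queue manager and queue names are placeholders. Opening the cluster queue with MQOO_BIND_NOT_FIXED lets the cluster workload algorithm pick a target queue manager for each message, whereas MQOO_BIND_ON_OPEN would pin every message to the instance chosen at open time and reintroduce the affinity trade-off described above.

    import com.ibm.mq.MQC;
    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQPutMessageOptions;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;

    public class ClusterPut {
        public static void main(String[] args) throws Exception {
            MQQueueManager qmgr = new MQQueueManager("QM2");   // placeholder gateway queue manager

            // MQOO_BIND_NOT_FIXED lets the cluster workload algorithm choose a target
            // queue manager for each message, so a failed instance is simply skipped.
            MQQueue clusterQueue = qmgr.accessQueue("CLUSTER.ORDERS",
                    MQC.MQOO_OUTPUT | MQC.MQOO_BIND_NOT_FIXED);

            for (int i = 0; i < 3; i++) {
                MQMessage msg = new MQMessage();
                msg.writeString("request " + i);
                clusterQueue.put(msg, new MQPutMessageOptions());
            }

            clusterQueue.close();
            qmgr.disconnect();
        }
    }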

3.5. HA capable client applications

You can achieve high availability on the client side rather than using the HA clustering, HA WebSphere MQ queue manager cluster, or shared queue server-side techniques previously described. HA capable clients are an inexpensive way to implement high availability, but they usually result in a large client with complex logic. This is not ideal, and a server-side approach is recommended. However, HA capable clients are discussed here for completeness.

Most queue manager failures result in a connection failure at the client. Even if the queue manager is returned to normal operation, the client is disconnected and remains so until the code used to connect the client to the queue manager is executed again. One possible solution to the problem of a server failure is to design the client applications to reconnect, or to connect to a different but functionally identical server. The client's application logic has to detect a failed connection and reconnect to another specified server.

The method of detecting and handling a failed connection depends on the MQ API in use. MQ JMS, for instance, provides an exception listener mechanism that allows the programmer to specify code to be run upon a failure event. The programmer can also use Java try/catch blocks to allow failures to be handled during code execution. The MQI API reports a failure upon the next function call that requires communication with the queue manager. In this scenario, it is the programmer's responsibility to resolve the failure. The management of the failure depends on the type of application and on whether there are any other high availability solutions in place. A simple reconnect to the same queue manager may be attempted, and if it is successful, the application can resume processing. You can also configure the application with a list of queue managers that it may connect to; upon failure, it can reconnect to the next queue manager in the list.

In an HA clustering solution, clients still experience a failed connection if a server is failed over to a different physical machine, because it is not possible to move open network connections between servers. The client may also need to be configured to perform several reconnect attempts to the server, and/or to wait a period of time to allow the server to restart. If the application is transactional and the connection fails mid-transaction, the entire transaction needs to be re-executed when a new connection is established, because WebSphere MQ queue managers roll back any uncommitted work at start-up time.

You can supplement many server-side HA solutions with client-side application code designed to cope with the temporary loss of, or need to reconnect to, a queue manager. A client that contains no extra code may need user intervention, or even need to be completely restarted, to resume full functionality. There is obviously extra effort required to code the client application to be HA aware, but the end result is a more autonomous client.
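As a sketch of this client-side approach (not code from the SupportPacs or product samples), the fragment below uses the MQ JMS exception listener mechanism to detect a broken connection and then retries against a configured list of functionally identical queue managers. The host names, ports, channel, and queue manager names are placeholders, and a production client would also limit and pace its retries and re-drive any in-flight transaction after reconnecting, as described above.

    import javax.jms.ExceptionListener;
    import javax.jms.JMSException;
    import javax.jms.QueueConnection;

    import com.ibm.mq.jms.JMSC;
    import com.ibm.mq.jms.MQQueueConnectionFactory;

    public class ReconnectingClient {
        // Placeholder list of functionally identical queue managers: host, port, queue manager name.
        private static final String[][] ENDPOINTS = {
            { "hostA.example.com", "1414", "QM1" },
            { "hostB.example.com", "1414", "QM2" },
        };

        // Most recently established connection, used by the rest of the application.
        private volatile QueueConnection connection;

        public void connect() {
            for (String[] ep : ENDPOINTS) {
                try {
                    MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
                    cf.setTransportType(JMSC.MQJMS_TP_CLIENT_MQ_TCPIP);
                    cf.setHostName(ep[0]);
                    cf.setPort(Integer.parseInt(ep[1]));
                    cf.setQueueManager(ep[2]);
                    cf.setChannel("SYSTEM.DEF.SVRCONN");   // placeholder channel

                    QueueConnection conn = cf.createQueueConnection();
                    // The exception listener fires when the connection to the queue manager is lost.
                    conn.setExceptionListener(new ExceptionListener() {
                        public void onException(JMSException e) {
                            System.err.println("Connection lost: " + e.getMessage());
                            connect();   // retry the configured endpoints; a real client would pace this
                        }
                    });
                    conn.start();
                    connection = conn;
                    System.out.println("Connected to " + ep[2] + " on " + ep[0]);
                    return;
                } catch (JMSException e) {
                    System.err.println("Could not connect to " + ep[2] + ": " + e.getMessage());
                }
            }
            throw new IllegalStateException("No queue manager in the list is available");
        }

        public static void main(String[] args) {
            new ReconnectingClient().connect();
        }
    }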