Load Balancer Considerations for Cisco Information Server

Tech Note
Load Balancer Considerations for Cisco Information Server
Data Virtualization Business Unit Advanced Services
June 2015

TABLE OF CONTENTS

INTRODUCTION
  Purpose
  Audience
LOAD BALANCER SELECTION CONSIDERATIONS
  Load Balancer Requirements
  Protocols and Ports
  Load Balancer Routing Assignments
  Health Probe Implementation Options
    Basic Complexity
    Intermediate Complexity
    High Complexity
  Health Probe Retries and Status Latency
  Health Probe Retries and Recovery
  Probing Additional Ports
  Scheduled Maintenance for CIS
FREQUENTLY ASKED QUESTIONS
  Should CIS Studio Users Connect through the Load Balancer?
  Should CIS Manager Users Connect through the Load Balancer?
  Are there any Concerns with Setting up a Site Connection in Business Directory?
  Are there benefits to using a load balancer for a single CIS server?
  Are there Extra Steps for Configuring SOAP Web Services
  Are there Extra Steps for Configuring REST Web Services

DOCUMENT CONTROL

Version History

Version  Date        Author                  Description
1.0      April 2015  Mike Shen               Initial version
2.0      June 2015   Mike Shen, Matthew Lee  Updated for new documentation standard and added additional content

Related Documents

Document                                 Date  Author
CIS Administration Guide                       DVBU Engineering
CIS Active Cluster Administration Guide        DVBU Engineering

Data Virtualization Business Unit (DVBU) Products Referenced

DVBU Product Name               Version
Cisco Information Server (CIS)  All
CIS Active Cluster              All
CIS Business Directory (BD)     7.0

INTRODUCTION

Purpose

Cisco Information Server (CIS) installations that are joined together using CIS Active Cluster require an external load balancer to distribute the workload across the clustered CIS instances. This document provides guidance on how to configure load balancers to work with CIS clusters.

Audience

This document is intended to provide guidance for the following audiences:
- System Architects
- Load Balancer Administrators
- CIS Administrators

LOAD BALANCER SELECTION CONSIDERATIONS

Load Balancer Requirements

Any load balancer that redirects Layer 4-7 traffic should work with CIS. Examples of load balancers that customers have used include F5 Big-IP, Cisco ACE, and Barracuda.

Protocols and Ports

The ports used by CIS and Business Directory are listed in the table below. The table indicates whether each port needs to be configured (forwarded) in the load balancer. Note that only the client-facing ports need to be configured.

Table 1. Ports used by CIS and Business Directory.

Port  Protocol / Purpose                          Forwarded by load balancer?  Session Affinity (Stickiness)
9400  HTTP                                        Yes                          No 1*
9401  JDBC, ODBC, ADO.NET                         Yes                          No
9402  HTTPS                                       Yes                          No 1*
9403  Encrypted traffic over JDBC, ODBC, ADO.NET  Yes                          No
9404  [Reserved]                                  No
9405  [Reserved]                                  No
9406  Monitor Daemon                              No
9407  Active Cluster internal heartbeat traffic   No
9408  Connect to metadata repository              No
9409  Monitor Server data collection              No

Business Directory ports:

Port  Protocol / Purpose                          Forwarded by load balancer?  Session Affinity (Stickiness)
9500  BD HTTP                                     Yes                          No 2*
9502  BD HTTPS                                    Yes                          No 2*

1*: Please see the FAQ section of this document for details about concerns related to using CIS Studio and Manager in conjunction with a load balancer.
2*: Please see the FAQ section of this document for details about using a load balancer with CIS Business Directory.

Load Balancer Routing Assignments

The load balancer must actively maintain a pool of available CIS nodes (servers). This pool of CIS nodes is sometimes referred to as a server farm. The load balancer must route each incoming request for CIS to one of the nodes in the server farm using a load-balancing algorithm. The most common algorithms are Round Robin and Least Traffic. More sophisticated routing algorithms can be implemented if desired.

It is highly recommended that the load balancer use a health probe mechanism to support the routing decision. The health probe allows the load balancer to determine periodically whether a node is healthy or sick. The load balancer should be configured to route requests only to healthy nodes. The following diagram provides visual context.

Figure 1. Load balancer and health probe working together. Note that CIS Node 3 is not in service due to failed health probe(s).

Health Probe Implementation Options

When implementing a health probe mechanism, load balancer and CIS administrators should define the criteria that determine whether a CIS node is considered healthy in their implementation. The following section outlines general implementation options and their respective advantages.
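The interaction between routing and health status described under Load Balancer Routing Assignments can be sketched in a few lines of Python. This is only an illustration of the idea, not the behavior of any particular load balancer product: the Node and RoundRobinPool names and the node names are hypothetical, and the healthy flag is assumed to be maintained by a separate health probe.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # assumed to be updated by the health probe, not by the router


class RoundRobinPool:
    """Round-robin routing that skips nodes currently marked sick."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # index of the node to try first on the next request

    def route(self):
        # Try each node at most once, starting from the rotating cursor.
        for offset in range(len(self.nodes)):
            index = (self._next + offset) % len(self.nodes)
            node = self.nodes[index]
            if node.healthy:
                self._next = (index + 1) % len(self.nodes)
                return node
        raise RuntimeError("no healthy CIS node available")


# Hypothetical three-node server farm in which the third node has failed its probe.
pool = RoundRobinPool([
    Node("cis-node-1"),
    Node("cis-node-2"),
    Node("cis-node-3", healthy=False),
])
picks = [pool.route().name for _ in range(4)]
print(picks)
```

With the third node marked sick, requests alternate between the first two nodes. Once the probe marks the third node healthy again, it rejoins the rotation without any change to the routing code.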

It is common to implement a simple web service operation to support health probing of the CIS server. This web service should not execute complicated or expensive operations, both to avoid putting excess load on the CIS server and to minimize the response time of the operation. The REST web service interface is the most common mechanism for supporting health probes. Whether the web service returns XML or JSON messages is a matter of the load balancer team's preference.

If the health probe URL needs to be accessible anonymously, a CIS administrator will need to enable the built-in Anonymous user and enable anonymous access for the health probe web service. Please refer to the CIS Administration Guide for instructions on enabling and configuring the anonymous user.

Warning: If anonymous access to this health probe is required, significant care should be taken to ensure that the web service does not accidentally return sensitive data.

When executing the health probe service on each CIS node, the load balancer will evaluate the response code. Evaluating the response message and the response time is optional. If the response time is considered, a threshold value needs to be defined. A threshold of 15 seconds is a good choice.

The list below outlines options for implementing a health probe service, from the simplest to the most complex.

Basic Complexity

A simple health probe can be implemented by calling the built-in CIS web manager login URL http://<hostname>:9400/manager/login. If the CIS node is available, it will serve the login page with an HTTP return code of 200; otherwise an error will be returned. No custom development is needed to support this configuration, and it does not require the configuration of the anonymous user.

This option is not suitable for determining whether the CIS server is under heavy load or partially hung, because the manager login page may still render relatively quickly in those situations. Therefore, this option is not a good choice for customers with mature CIS deployments.

Intermediate Complexity

This option requires a purpose-built web service in CIS. The service executes a simple query against the CIS repository database to retrieve a small result set. The result set can optionally be included in the web service's response message so that the caller can evaluate it. It is possible to use a CIS system table for this purpose, but note that not all system tables actually rely on the repository database; some are implemented in CIS memory. The ALL_DOMAINS system table is a good choice here, since it does rely on the repository database.

Warning: The CIS system database should not be made directly accessible to the anonymous user. Instead, developers should create a data source via the 'Composite' adapter, connected to the local CIS system database using a suitable service account. This data source can be used to introspect only the ALL_DOMAINS table, which can then be published for access by the anonymous user.

High Complexity

In some cases, a CIS implementation may be tightly associated with one or more external databases. If any of these critical external databases is down or unreachable, the CIS node should not be considered healthy for service, even though the CIS node itself is functional. This is a business-driven configuration. In this scenario, the health probe service needs to query the critical database(s) to determine whether they are reachable by CIS. Typically, this use case involves at least one of the following scenarios:

Critical Cache Database

The CIS implementation relies heavily on cached data. If the cached data is unavailable (for example, down or stale), the CIS node is not considered healthy. In this scenario, a CIS node's cluster membership status is very important when evaluating the node's health. If a CIS node is disconnected from the cluster, it will lose access to the cache database. A disconnected node will show all of its cached resources in a Down state.

When implementing a health probe service for this scenario, developers should take the cluster membership status of the CIS node into consideration. This can be achieved by evaluating the value of the STATUS column in the CIS system table SYS_CLUSTER. The server's connection to the active cluster is considered healthy if it has a status of either OPERATIONAL or CONNECTED_READY.

This document ships with two CAR files, delivered in CIS 7.0 format, that demonstrate how to build such a health probe service. The objects they contain are shown in the accompanying screenshot.
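The checks described so far (response code, optional response time threshold, and cluster membership status) can be combined into a single decision on the probing side. The Python sketch below shows one way to express that decision. It is not taken from the CAR files mentioned above: the ProbeResult structure and the idea that the probe response carries the SYS_CLUSTER status as a field are assumptions made for illustration, while the 15 second threshold and the two healthy status values come from this document.

```python
from dataclasses import dataclass
from typing import Optional

# Status values this document treats as a healthy cluster connection (SYS_CLUSTER.STATUS).
HEALTHY_CLUSTER_STATUSES = {"OPERATIONAL", "CONNECTED_READY"}
# Response time threshold suggested in this document.
RESPONSE_TIME_THRESHOLD_SECONDS = 15.0


@dataclass
class ProbeResult:
    """Hypothetical summary of one health probe call against a CIS node."""
    http_status: int
    elapsed_seconds: float
    cluster_status: Optional[str] = None  # None when the probe does not report it


def is_healthy(result: ProbeResult, check_cluster: bool = True) -> bool:
    """Return True only if every configured check passes."""
    if result.http_status != 200:
        return False
    if result.elapsed_seconds > RESPONSE_TIME_THRESHOLD_SECONDS:
        return False
    if check_cluster and result.cluster_status not in HEALTHY_CLUSTER_STATUSES:
        return False
    return True


print(is_healthy(ProbeResult(200, 0.4, "OPERATIONAL")))
print(is_healthy(ProbeResult(200, 20.0, "CONNECTED_READY")))
print(is_healthy(ProbeResult(200, 0.4, None)))
```

Setting check_cluster to False reduces the function to the response code and response time checks, which corresponds to the basic and intermediate options.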

Critical Primary Data Source

The CIS implementation relies heavily upon one or more primary data sources. If any of the data sources cannot be accessed, or does not return results in an acceptable amount of time, the CIS server should not be considered to be in a healthy state. In some scenarios, only some of the CIS nodes in a cluster lose connectivity to the data source. Such scenarios include:
- Loss of connectivity at the network level.
- A CIS node has a maxed-out connection pool to this data source.

To handle this scenario, developers must implement a health probe operation that retrieves a limited volume of data from the primary data sources. Care should be taken to ensure that the implemented query does not place undue load on the CIS server or the data sources.

Health Probe Retries and Status Latency

Executing a health probe test in the load balancer before routing each incoming CIS request would impose too much overhead. Instead, Cisco recommends that the load balancer periodically execute the health probe operation on each CIS node. An interval of 60 seconds between health probes is generally a good choice.

When a server fails a health probe, the load balancer can remove the server from the service pool immediately, or wait and remove the server only if it fails additional health probes. For example, the load balancer can be configured to require three consecutive failed probes before taking a node out of service. Both behaviors are viable; however, requiring additional failures lengthens the period of time that an offline or slow node remains in service.

If a CIS node goes offline, there will be a latency period between when the outage occurs and when it is detected by the load balancer. In the worst-case scenario, it can take up to 60 seconds (or longer if retry is enabled) for the load balancer to determine that the node is not available and should be removed. To address this scenario, it is recommended that CIS client applications use a connection validation query when connecting to CIS. This forces the client applications to execute the connection validation query whenever they check out a connection to CIS from the connection pool. If the connection validation query fails for any reason, the client application will try to connect again. If the load balancer routes the new request to a different CIS node that is in a healthy state, the connection will succeed.

Health Probe Retries and Recovery

When a sick node returns to a healthy state, the load balancer will detect its updated status the next time a health probe is executed. The load balancer can be configured either to re-add the node to the service pool immediately or to wait until it passes multiple health probe operations. This waiting time is called a recovery period. It is useful in scenarios where a CIS node fluctuates intermittently between sick and healthy states, which is known as flapping. It is recommended to implement such a recovery period. Three consecutive passes is a good configuration.

Probing Additional Ports

The health probe implementation described above is based on a single service and a single port. While this is generally sufficient to monitor the health of a CIS node, in some cases the load balancer administrator may determine that it is necessary to probe additional ports to ensure that they are also accessible. Cisco recommends that additional health probes be limited to client-facing ports (those to be forwarded by the load balancer, as listed in Table 1). There is no need to probe the ports used for CIS internal traffic (the shaded rows in Table 1, that is, those not forwarded by the load balancer).

Scheduled Maintenance for CIS

During scheduled maintenance periods, when one or more CIS nodes will be taken offline, administrators should proactively remove the CIS node(s) from the load balancer's service pool.
This is necessary to ensure that no incoming requests are sent to the server while it is offline or otherwise in an unsuitable state, such as being in the middle of a code deployment. When bringing down a server for maintenance, the following process is recommended:

1. Remove it from the load balancer's service pool.
2. Wait a short interval of time, say 15 minutes, to allow any running requests on the CIS instance to complete before shutting down the node.
3. Remove the CIS server from the active cluster if performing a rolling outage.
4. Shut down the CIS server and perform the maintenance work.
5. Start the CIS server and run regression tests to confirm that it is functioning correctly.
6. Add the server back to the active cluster if necessary. Please note that all CIS servers in an active cluster MUST be on the same release and patch level.
7. Re-add the server to the load balancer's service pool.

Please Note: Deleting an active cluster will invalidate the data contained in all shared CIS caches! If this occurs, CIS administrators must perform one of the following actions to restore the caches:
- Refresh all cached resources. This can take a very long time. Afterwards, delete the orphaned records associated with the obsolete 'clusterid'. They cannot be deleted in Studio; you will need to issue SQL DELETE statements from a suitable database client.
- Directly modify the contents of the cache database metadata tables to use the 'clusterid' value for the new cluster instance. It is strongly recommended that you contact Cisco support prior to attempting this!
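Returning to the retry and recovery settings discussed under Health Probe Retries and Status Latency and Health Probe Retries and Recovery, the Python sketch below shows how consecutive-failure and consecutive-pass thresholds keep a flapping node from moving in and out of the service pool on every probe. The NodeHealthTracker class and the helper function are hypothetical names used only for illustration; the defaults of three failures, three passes, and a 60 second interval are the values suggested in this document. The worst-case figure is an approximation that ignores the time a probe spends waiting for its own timeout.

```python
class NodeHealthTracker:
    """Tracks whether one node is in service, with retry and recovery thresholds."""

    def __init__(self, failures_to_remove=3, passes_to_restore=3):
        self.failures_to_remove = failures_to_remove
        self.passes_to_restore = passes_to_restore
        self.in_service = True
        self._streak = 0  # consecutive probe results that contradict the current state

    def record(self, probe_passed):
        """Record one probe result and return whether the node is in service."""
        if probe_passed == self.in_service:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.in_service
        self._streak += 1
        needed = self.passes_to_restore if probe_passed else self.failures_to_remove
        if self._streak >= needed:
            self.in_service = probe_passed
            self._streak = 0
        return self.in_service


def worst_case_detection_seconds(interval_seconds=60, failures_to_remove=3):
    """Approximate upper bound on how long a dead node stays in the pool."""
    return interval_seconds * failures_to_remove


tracker = NodeHealthTracker()
going_down = [tracker.record(False) for _ in range(3)]
coming_back = [tracker.record(result) for result in (True, False, True, True, True)]
print(going_down, coming_back)
print(worst_case_detection_seconds(), worst_case_detection_seconds(failures_to_remove=1))
```

In the second sequence, the single failed probe after the first pass resets the recovery count, so the node returns to service only after three uninterrupted passes.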

FREQUENTLY ASKED QUESTIONS

Should CIS Studio Users Connect through the Load Balancer?

Unlike client applications, CIS Studio needs to be connected to the same CIS node for the duration of a session. This is necessary to ensure that metadata or configuration changes are not lost if the load balancer redirects the Studio session to another CIS server. According to the Active Cluster Administration Guide, connecting CIS Studio through a load balancer is not officially supported. If it is absolutely necessary to use CIS Studio in conjunction with a load balancer, you need to enable session affinity (sticky sessions).

Should CIS Web Manager Users Connect through the Load Balancer?

CIS Web Manager is an administrative tool intended for direct administration of a single CIS instance at a time. While it is technically possible to open CIS Web Manager through a load balancer, you may see confusing information if session affinity (sticky sessions) is not enabled. For example, if a user is viewing the JVM memory consumption graph and reloads the memory page, the user's request may be forwarded to a different node in the cluster, which will likely return a different memory graph. As with CIS Studio above, it is possible to avoid this issue by configuring sticky sessions. However, we do not recommend doing so, since administrators will typically need to connect to a specific CIS node to perform specific tasks.

Are there any benefits to using a load balancer for a single CIS server?

There are several reasons to use a load balancer in conjunction with a single CIS node. For example:
- Using a load balancer provides a friendly and permanent virtual IP/host name for CIS. This allows administrators to conceal underlying changes to the physical CIS nodes, such as migrating the CIS server to a different physical machine, cutting over to an alternate CIS server during a disaster recovery scenario, or adding an active cluster in the future.
- It is critical to ensure that no end users connect to a CIS cluster during promotions or maintenance windows, to reduce the likelihood of a failure. A load balancer can simplify the management of outage periods by allowing administrators to simply disable the load balancer, which prevents end users from connecting to CIS.

Are there Benefits to Using a Load Balancer for a Business Directory Server?

As of CIS 7.0, Business Directory is not a cluster-enabled component. It must run on a single node. The use of a load balancer may still make sense for an organization, for the same reasons given above for a single CIS server.

How would Business Directory connect to a load balanced CIS cluster?

Business Directory (BD) does not require real-time access to CIS in order to function. However, it does need to re-sync its metadata with CIS periodically via JDBC. This typically occurs on a scheduled basis. Like all other JDBC clients, BD should connect to the virtual IP instead of an individual node.

Are there Extra Steps for Configuring SOAP Web Services?

Do not use the default server names inside the WSDL. They are based on physical host names. To address this, update the configuration property Server > Configuration > Network > WSDL Hostname on all nodes to use the virtual host name of the load balancer. See the screenshot below.

Please note: Omitting this step will result in a WSDL definition that incorrectly includes the physical host name. Client applications that use such a WSDL will bypass the load balancer and attempt to connect directly to a specific CIS node.

Are there Extra Steps for Configuring REST Web Services?

When providing the REST URL endpoints to your target audience, please modify the values shown below. Replace the host name with a virtual IP. Note that this modification is performed outside of CIS.
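Because the host name substitution for REST endpoints happens outside of CIS, it can be scripted when a list of endpoint URLs has to be handed to consumers. The Python sketch below is one possible way to do this with the standard library. The use_virtual_host function name, the example host names, and the example path are all hypothetical, and the sketch assumes that the load balancer listens on the same port as the CIS node.

```python
from urllib.parse import urlsplit, urlunsplit


def use_virtual_host(endpoint_url, virtual_host):
    """Return endpoint_url with its host replaced by the load balancer's virtual host.

    The scheme, port, path, query, and fragment are preserved. Any user name or
    password embedded in the URL is dropped, which is assumed to be acceptable here.
    """
    parts = urlsplit(endpoint_url)
    netloc = virtual_host if parts.port is None else f"{virtual_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical physical node URL and virtual host name.
node_url = "http://cis-node-1.example.com:9400/json/health/probe"
print(use_virtual_host(node_url, "cis-vip.example.com"))
```

If the virtual IP exposes the service on a different port than the CIS node, the port would need to be replaced as well rather than carried over.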
