ExpressCluster for Linux Version 3 Cluster Resource Reference


ExpressCluster for Linux Version 3 Cluster Resource Reference, Revision 6us

EXPRESSCLUSTER is a registered trademark of NEC Corporation. Linux is a trademark or registered trademark of Linus Torvalds in the United States and/or other countries. RPM is a trademark of Red Hat, Inc. Intel, Pentium, and Xeon are registered trademarks or trademarks of Intel Corporation. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and other countries. VERITAS, its logo, and other VERITAS product names and slogans are trademarks or registered trademarks of VERITAS Software Corporation.

1.1 GROUP RESOURCES
1.2 Group
    Operation style
    Failover policy
    Application
    Failover factors
    Detecting activation and deactivation abnormality
    Limit of the number of reboots
    Resetting the reboot count
1.3 Exec Resource
    ExpressCluster versions
    Dependencies
    Notes on exec resources
    Scripts
1.4 Disk Resource
    ExpressCluster versions
    Dependencies
    Switching partition
    fsck execution timing
    Precautions about shared disks
1.5 Floating IP Resource
    ExpressCluster versions
    Dependencies
    Floating IP
    Floating IP address of each server
    Notes on floating IP resource
1.6 Mirror disk Resource
    ExpressCluster versions
    Dependencies
    Mirror disk
    Mirror parameters
    Notes on mirror disk resource
1.7 RAW Resource
    ExpressCluster versions
    Dependencies
    Switching partition
    Notes on RAW resource
1.8 VxVM Related Resource
    Behavior test information
    Dependencies
    Resources controlled by ExpressCluster
    VxVM disk group resource
    VxVM volume resource
    Notes on control by ExpressCluster
    Clustering by VERITAS Volume Manager
1.9 NAS Resource
    ExpressCluster versions
    Dependencies
    NAS
    Notes on NAS resource
MONITOR RESOURCES

1.11 Monitor Resource
    Monitor timing
    Monitor interval
    Abnormality detection
    Returning from monitor error (normal)
    Activation/Deactivation abnormality of recovery object while performing recovery actions
    Delay warning
    Waiting for the start of monitoring
    Limit of the number of reboots
    Monitor priority
Disk Monitor Resource
    ExpressCluster versions
    Method
    I/O size
RAW Monitor Resource
    ExpressCluster versions
    Notes on RAW Monitor Resource
    RAW Monitor Resource setting samples -- For Linux Kernel
    RAW Monitor Resource setting samples -- For Linux Kernel
IP Monitor Resource
    ExpressCluster versions
    Method
NIC Link Up/Down Monitor Resource
    Supported environments
    Note
    Configuration and range of monitoring
Mirror Disk Connect Monitor Resource
    ExpressCluster versions
    Notes
Mirror Disk Monitor Resource
    ExpressCluster versions
    Notes
PID Monitor Resource
    ExpressCluster versions
    Notes
User Space Monitor Resource
    Supported environments
    Depended Driver
    Depended rpm
    Method
    Advanced settings of monitoring
    Monitor logic
    Checking Availability of ipmi
    ipmi command
    Note
VxVM daemon Monitor Resource
    ExpressCluster versions
    Note
VxVM volume monitor resource
    ExpressCluster versions
    Note
Multi-target Monitoring Resource
    ExpressCluster versions

    Multi-target monitor resource status
    Setting examples
HEARTBEAT RESOURCES
LAN Heartbeat Resource
    Supported environments
    Notes
Kernel Mode LAN Heartbeat Resource
    Supported environments
    Kernel mode LAN heartbeat resource
    Notes
Disk Heartbeat Resource
    Supported environments
    Disk heartbeat resource
    Notes
COM Heartbeat Resource
    Supported environments
    Notes
SHUTDOWN STALL MONITORING
    ExpressCluster Versions
    Shutdown Monitoring
    Method
    Setting
    SIGTERM
APPENDIX: bonding
    FIP resource
    Mirror connect
APPENDIX: Selecting Applications
    Applications in ExpressCluster Environment
    Server application
    Notes on server applications
    Countermeasures to notes and restrictions
    How To Determine Application's Style

1.1 GROUP RESOURCES
The number of group resources that can be registered with a group depends on the edition and version, as shown below.

  Edition      Version     Number of group resources (per group)
  SAN/SE       to
  SAN/SE       or later    128
  WAN/LAN/LE   or later    16

Currently supported group resources are:

  Group resource name        Abbreviation   Functional overview
  Exec resource              exec           See 1.3 Exec Resource
  Disk resource              disk           See 1.4 Disk Resource
  Floating IP resource       fip            See 1.5 Floating IP Resource
  Mirror disk resource       md             See 1.6 Mirror disk Resource
  RAW resource               raw            See 1.7 RAW Resource
  VxVM disk group resource   vxdg           See 1.8 VxVM Related Resource
  VxVM volume resource       vxvol          See 1.8 VxVM Related Resource
  NAS resource               nas            See 1.9 NAS Resource

1.2 Group
A group is a set of resources required to perform an independent business service in a cluster system. It is also the unit of failover. A group has its own group name, group resources, and attributes. The resources in a group are handled as a set. For example, if a failover occurs in Group 1, which has Disk resource 1 and Floating IP address 1, Disk resource 1 and Floating IP address 1 fail over together (it never happens that only Disk resource 1 fails over). Likewise, Disk resource 1 is never contained in another group (for example, Group 2).

1.2.1 Operation style
ExpressCluster supports the following operation styles:
* Uni-directional standby cluster: in this operation style, only one business application runs on the cluster system.
* Same-application multi-directional standby cluster: in this operation style, the same business application runs simultaneously on multiple servers in the cluster system.
* Different-application multi-directional standby cluster: in this operation style, different business applications run on different servers, and the servers stand by for each other.

(1) Uni-directional standby cluster
On a uni-directional standby cluster system, the number of groups for a business application is limited to one.

(2) Multi-directional standby cluster
On a multi-directional standby cluster system, a business application can run simultaneously on two or more servers.

1.2.2 Failover policy
A failover policy is a list of the servers a group can fail over to, together with their failover priorities. The following pages describe how servers behave differently, depending on the failover policies, when a failover occurs.

<Legends and meanings>
  Server status   Description
  o               Normal status (properly working as a part of the cluster)
  x               Stopped (the cluster is off)

In the 3-node configuration, the first-priority server of group A is Server 1 and that of group B is Server 2. In the 2-node configuration, the first-priority server of group A is likewise Server 1 and that of group B is Server 2.

It is assumed that the group startup attribute is set to auto startup and the failback attribute is set to manual failback for groups A and B.

* If groups with different failover exclusive attributes coexist in a cluster, they do not interfere with each other. For example, a group with the full exclusive attribute may start on a server where another group with the off exclusive attribute is active, and vice versa.
* For groups whose failover exclusive attribute is Normal or Full Exclusive, the server on which they start up or fail over to is determined by the failover priority of the server. If a group has two or more servers of the same failover priority, the server is determined in alphanumerical order of the group name.
* The failover priorities of the Web Manager group are given based on the priorities of the servers themselves. You specify server priorities in the Master Server tab of Cluster properties.

For SAN/SE
(1) If the failover exclusive attribute is set to Off for groups A and B
1. Cluster startup
2. Cluster shutdown
3. Server 1 down: failover to the next-priority server.
4. Server 1 power on
5. Cluster shutdown
6. Migration of group A
7. Server 2 down: failover to the next-priority server.
8. Server 2 down: failover to the next-priority server.
9. Server 3 down: failover to the next-priority server.
10. Server 2 down: failover to the next-priority server.
11. Server 2 down: failover to the next-priority server.

(2) If the failover exclusive attribute is set to Normal for groups A and B
(Status transition diagram omitted: it shows, for steps (1) to (11), which of Server 1 to Server 3 groups A and B are active on.)
1. Cluster startup
2. Cluster shutdown
3. Server 1 down: failover to a server where no Normal Exclusive group is active.
4. Server 1 power on
5. Cluster shutdown
6. Migration of group A
7. Server 2 down: failover to a server where no Normal Exclusive group is active.
8. Server 2 down: failover takes place even though a Normal Exclusive group is active on every server, because the group can still start.
9. Server 3 down: failover takes place even though a Normal Exclusive group is active on every server, because the group can still start.
10. Server 2 down: failover to a server where no Normal Exclusive group is active.
11. Server 3 down: failover to a server where no Normal Exclusive group is active.

(3) If the failover exclusive attribute is set to Full for groups A and B
(Status transition diagram omitted: it shows, for steps (1) to (11), which of Server 1 to Server 3 groups A and B are active on.)
1. Cluster startup
2. Cluster shutdown
3. Server 1 down: failover to a server where no Full Exclusive group is active.
4. Server 1 power on
5. Cluster shutdown
6. Migration of group A
7. Server 2 down: failover to a server where no Full Exclusive group is active.
8. Server 2 down: no failover (group B becomes inactive).
9. Server 3 down: no failover (group A becomes inactive).
10. Server 2 down: failover to a server where no Full Exclusive group is active.
11. Server 3 down: failover to a server where no Full Exclusive group is active.

For WAN/LAN/LE (including SAN/SE in a 2-server configuration)
(1) If the failover exclusive attribute is set to Off for groups A and B
1. Cluster startup
2. Cluster shutdown
3. Server 1 down: failover to group A's standby server.
4. Server 1 power on
5. Cluster shutdown
6. Migration of group A
7. Server 2 down: failover to group B's standby server.
8. Server 2 down
9. Server 2 down: failover to a standby server.

1.2.3 Application
If an application supports clustering, you can restart it with a script on the other server when a failover or group migration occurs. However, it is a prerequisite that the same revision of the application is installed on all servers in the failover policy. Also, the application data that needs to be carried over must be stored on shared disks or mirror disks. To run an application in an ExpressCluster environment, it must satisfy other conditions as well. For details, see Section 1.35 Applications in ExpressCluster Environment.

1.2.4 Failover factors
The main causes of failover are listed below.
* Server shutdown
* Power down
* OS panic
* OS stall
* ExpressCluster daemon error
* Activation or deactivation failure of a group resource
* Abnormality detected by a monitor resource

1.2.5 Detecting activation and deactivation abnormality
When an activation or deactivation abnormality is detected, it is handled as follows.
* When an abnormality is detected while activating a group resource:
  + When an abnormality or error is detected while activating a group resource, activation is retried.
  + When the activation retry fails the number of times set to [Activity Retry Threshold], a failover takes place.
  + If the failover fails the number of times set to [Failover Threshold], the final action is performed.
* When an abnormality is detected while deactivating a group resource:
  + When an abnormality or error is detected while deactivating a group resource, deactivation is retried.
  + If the deactivation retry fails the number of times set to [Deactivity Retry Threshold], the final action is performed.
Activation retries and failovers are counted on a server basis. [Activity Retry Threshold] and [Failover Threshold] are the maximum activation retry count and failover count per server. The activation retry count and failover count are reset on a server where group activation succeeds. Note that an unsuccessful recovery action is also counted into the activation retry count or failover count.
The following pages describe what happens when an activation abnormality is detected with the settings shown below.
  Setting example
  Activity Retry Threshold: 3
  Failover Threshold: 1
  Final Action: Stop Group
Examples of the actions taken with the settings shown above are illustrated on the following pages.

1.2.6 Limit of the number of reboots
If [Stop Cluster Daemon And OS Shutdown] or [Stop Cluster Daemon and OS Reboot] is selected as the final action to be taken when an activation or deactivation abnormality is detected, you can limit the number of shutdowns or reboots caused by the detection of activation or deactivation abnormalities. Because the number of reboots is recorded on a server basis, the maximum reboot count is the upper limit of the reboot count per server. The number of reboots taken as a final action on detection of a group activation or deactivation abnormality and the number of reboots taken as a final action on detection of an abnormality by a monitor resource are recorded separately.
If the time to reset the maximum reboot count is set to zero (0), the number of reboots is not reset. To reset this number, run the clpregctrl command. See a separate guide, Command, for details of the clpregctrl command.
The following pages describe what happens when the number of reboots is limited with the settings shown below. As the final action, [Stop Cluster Daemon and OS Reboot] is performed once, because the maximum reboot count is set to one (1). If group activation succeeds at a reboot following the cluster shutdown, the number of reboots is reset when 10 minutes elapse, because the time to reset the maximum reboot count is set to 10 minutes.
  Setting example
  Activity Retry Threshold: 0
  Failover Threshold: 0
  Final Action: Stop Cluster Daemon and OS Reboot
  Maximum Reboot Count: 1
  Time to reset maximum reboot count: 10 minutes
Examples of the actions taken with the settings shown above are shown on the following pages.

(The figures on this page show Server 1 and Server 2 connected to a shared disk, with Failover Group A and disk resource 1 on Server 1. On both servers the maximum reboot count is 1 and the current reboot count is 0.)
* Activation processing of disk resource 1 (a resource under Failover Group A) starts; mount processing of the file system and so on is performed.
* Activation processing of disk resource 1 fails (fsck error, mount error, etc.).
* Because [Activity Retry Threshold] and [Failover Threshold] are set to 0, the final action is executed on Server 1: the cluster daemon is stopped and the OS is rebooted, and 1 is recorded as Server 1's reboot count.

* Failover processing of Failover Group A starts while Server 1 stops the cluster daemon and reboots.
* Activation processing of disk resource 1 (a resource under Failover Group A) starts on Server 2; mount processing of the file system and so on is performed. Resource activation succeeds on Server 2, and Server 1 completes its reboot. (Server 1: maximum reboot count 1, current reboot count 1. Server 2: maximum reboot count 1, current reboot count 0.)
* Using the clpgrp command or the Web Manager, Failover Group A is moved back to Server 1.

* Activation processing of disk resource 1 (a resource under Failover Group A) starts on Server 1; mount processing of the file system and so on is performed. (Server 1: maximum reboot count 1, current reboot count 1.)
* Activation processing of disk resource 1 fails again (fsck error, mount error, etc.).
* Because the current reboot count has reached the maximum reboot count, the final action is not performed. The reboot count is not reset even after 10 minutes pass, and Failover Group A remains in the activation failure state.
* The cause of the disk abnormality is then resolved.

* Using the clpstdn command or the Web Manager, a cluster shutdown followed by a reboot is performed.
* Failover Group A starts successfully and runs normally. The current reboot count is reset after 10 minutes elapse. The final action will be performed the next time a disk activation abnormality occurs while Failover Group A is starting.

1.2.7 Resetting the reboot count
To reset the reboot count, run the clpregctrl command. See a separate guide, Command, for details of the clpregctrl command.

1.3 Exec Resource
You can register applications and shell scripts in ExpressCluster that are managed by ExpressCluster and run at group startup, stop, failover, or migration. You can also register your own programs and shell scripts as exec resources. Scripts are formatted in the same manner as sh shell scripts, so you can write whatever code your application requires.

1.3.1 ExpressCluster versions
The following ExpressCluster versions support this function.
  Server: SE3.0-1 or later, LE3.0-1 or later, XE3.0-1 or later, SX3.1-2 or later
  Configuration Tool: for supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

1.3.2 Dependencies
By default, this function depends on the following group resource types.
  Group resource type        Edition
  Floating IP resource       SAN/SE, WAN/LAN/LE, XE, SX
  Disk resource              SAN/SE, XE, SX
  Mirror disk resource       WAN/LAN/LE
  RAW resource               SAN/SE, XE, SX
  VxVM disk group resource   SAN/SE
  VxVM volume resource       SAN/SE
  NAS resource               SAN/SE, WAN/LAN/LE, XE, SX

1.3.3 Notes on exec resources
* It is a prerequisite that the same revision of the application run from an exec resource is installed on all servers in the failover policy.
The following explains the scripts provided by exec resources by default. The top-priority server specified in the group's startup server tab is called the primary server.

1.3.4 Scripts
Script types
A Start Script and a Stop Script are provided in each exec resource. ExpressCluster runs a script for each exec resource whenever the cluster needs to change its status. In these scripts you describe how you want your applications to be started, stopped, and restored in your cluster environment.
(Figure omitted: Start and Stop Scripts being run for groups A to D on Servers 1 to 3.)

Script environment variables
When ExpressCluster runs a script, it sets information about the conditions under which the script was run (such as the script starting factor) in environment variables. You can use the environment variables in the table below as branching conditions when writing code for your system operation. A Stop Script receives, in its environment variables, the values set for the immediately preceding Start Script. The Start Script does not set the environment variables CLP_FACTOR and CLP_PID. The environment variable CLP_LASTACTION is set only when the environment variable CLP_FACTOR is CLUSTERSHUTDOWN or SERVERSHUTDOWN.

CLP_EVENT (script starting factor)
  START      Set when the script was run by cluster startup, by group startup, by group migration (on the server the group moved to), by a group restart on the same server due to an abnormality detected by a monitor resource, or by a group resource restart on the same server due to an abnormality detected by a monitor resource.
  FAILOVER   Set when the script was run on the server the group failed over to because of a server down, because of an abnormality detected by a monitor resource, or because of a group resource activation failure.

CLP_FACTOR (group stopping factor)
  CLUSTERSHUTDOWN   Set when the group was stopped by a cluster shutdown.
  SERVERSHUTDOWN    Set when the group was stopped by a server shutdown.
  GROUPSTOP         Set when the group was stopped by a group stop.
  GROUPMOVE         Set when the group was moved by a group move.
  GROUPFAILOVER     Set when the group failed over because of an abnormality detected by a monitor resource or because of a group resource activation failure.
  GROUPRESTART      Set when the group was restarted because of an abnormality detected by a monitor resource.
  RESOURCERESTART   Set when the group resource was restarted because of an abnormality detected by a monitor resource.

CLP_LASTACTION (process after cluster shutdown)
  REBOOT   The OS is rebooted.
  HALT     The OS is halted.
  NONE     No action is taken.

CLP_SERVER (server where the script was run)
  HOME     The script was run on the group's primary server.

  OTHER    The script was run on a server other than the group's primary server.

CLP_DISK (partition connection information on shared or mirror disks)
  SUCCESS   There is no partition where the connection has failed.
  FAILURE   There is one or more partitions where the connection has failed.

CLP_PRIORITY (the order, in the failover policy, of the server where the script was run)
  1 to the number of servers in the cluster   Represents the priority of the server where the script is run. The number starts from 1; the smaller the number, the higher the server's priority. If CLP_PRIORITY is 1, the script was run on the primary server.

CLP_GROUPNAME (group name)
  Group name      Represents the name of the group the script belongs to.

CLP_RESOURCENAME (resource name)
  Resource name   Represents the name of the resource the script belongs to.

CLP_PID (process ID)
  Process ID      Represents the process ID of the Start Script if the Start Script's property is set to asynchronous. This environment variable is null when the Start Script is synchronous.
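As an illustration of how these variables are typically referenced, the following is a minimal sh sketch that logs the main variables and branches on them. It is not part of ExpressCluster; appstart and approllback are placeholder commands for your own application.

#! /bin/sh
# Example only: log the environment variables ExpressCluster sets, then
# branch on them. "appstart" and "approllback" are placeholder commands.
echo "event=$CLP_EVENT server=$CLP_SERVER disk=$CLP_DISK"
echo "group=$CLP_GROUPNAME resource=$CLP_RESOURCENAME priority=$CLP_PRIORITY"
if [ "$CLP_DISK" != "SUCCESS" ]
then
    # a partition connection failed; do not start the application
    echo "NG: partition connection failed"
    exit 1
fi
if [ "$CLP_EVENT" = "FAILOVER" ]
then
    approllback    # recovery processing needed only after a failover
fi
appstart           # normal application startup
exit 0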

Script execution timing
This section describes the relationships between the execution timing of Start and Stop Scripts and the environment variables, following cluster status transition diagrams.
* To simplify the explanations, a 2-server cluster configuration is used. See the supplements for the relationships between possible execution timings and environment variables in configurations with 3 or more servers.
* O and X in the diagrams represent the server statuses.
  Server status   Meaning
  O               Normal status (properly working as a part of the cluster)
  X               Stopped (the cluster is off)
  (Example) OA: Group A is working on a normally running server.
* Each group is started on the top-priority server among the active servers.
* Three groups are defined in the cluster, with the following failover policies:
  Group   Priority 1 server   Priority 2 server
  A       Server 1            Server 2
  B       Server 2            Server 1
  C       Server 1            Server 2
* In the diagrams, the upper server is Server 1 and the lower one is Server 2.

Cluster status transition diagram (figure omitted): the diagram illustrates a typical cluster status transition. The numbers (1) to (11) in the diagram correspond to the descriptions on the following pages.

(1) Normal startup
Normal startup here means that the Start Script has been run properly on the primary server. Each group is started on the top-priority server among the active servers.
Environment variables for Start
  Group   Environment variable   Value
  A       CLP_EVENT              START
          CLP_SERVER             HOME
  B       CLP_EVENT              START
          CLP_SERVER             HOME
  C       CLP_EVENT              START
          CLP_SERVER             HOME

(2) Normal shutdown
Normal shutdown here means a cluster shutdown run immediately after the Start Script corresponding to the Stop Script was run by a normal startup or by a group migration (online failback).
Environment variables for Stop
  Group   Environment variable   Value
  A       CLP_EVENT              START
          CLP_SERVER             HOME
  B       CLP_EVENT              START
          CLP_SERVER             HOME
  C       CLP_EVENT              START
          CLP_SERVER             HOME

(3) Failover when Server 1 goes down
When a group has Server 1 as its primary server and Server 1 goes down, the group's Start Script is run on a lower-priority server (Server 2). The Start Script should therefore use CLP_EVENT(=FAILOVER) as a branching condition for triggering the application's startup and recovery processes (such as a database rollback). If there is a process you want to perform only on a server other than the primary server, specify CLP_SERVER(=OTHER) as a branching condition and describe the process in your script.
Environment variables for Start
  Group   Environment variable   Value
  A       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER

(4) Cluster shutdown after the failover of Server 1
The Stop Scripts of groups A and C are run on Server 2, which the groups failed over to (Group B's Stop Script is run as in a normal shutdown).
Environment variables for Stop
  Group   Environment variable   Value
  A       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
  B       CLP_EVENT              START
          CLP_SERVER             HOME
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER

(5) Migration of groups A and C
The Stop Scripts of groups A and C are run on Server 2, which the groups failed over to, and then their Start Scripts are run on Server 1.
Environment variables for Stop
  Group   Environment variable   Value
  A       CLP_EVENT              FAILOVER (*1)
          CLP_SERVER             OTHER
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
Environment variables for Start
  Group   Environment variable   Value
  A       CLP_EVENT              START
          CLP_SERVER             HOME
  C       CLP_EVENT              START
          CLP_SERVER             HOME
*1 The environment variables of a Stop Script take the values of the preceding Start Script. Here, in (5) Migration of groups A and C, the migration is not preceded by a cluster shutdown, so the environment variable is FAILOVER. If a cluster shutdown had been performed before this, the environment variable would be START.

(6) Failure in Group C and failover
If a failure occurs in Group C, its Stop Script is run on Server 1, and then its Start Script is run on Server 2.
Stop (on Server 1)
  Group   Environment variable   Value
  C       CLP_EVENT              START
          CLP_SERVER             HOME
Start (on Server 2)
  Group   Environment variable   Value
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER

(7) Migration of Group C
You move Group C, which failed over to Server 2 in step (6), from Server 2 back to Server 1. The Stop Script is run on Server 2, and then the Start Script is run on Server 1.
Stop (because the group failed over in step (6))
  Group   Environment variable   Value
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
Start
  Group   Environment variable   Value
  C       CLP_EVENT              START
          CLP_SERVER             HOME

(8) Stopping Group B
Group B's Stop Script is run on Server 2.
Stop
  Group   Environment variable   Value
  B       CLP_EVENT              START
          CLP_SERVER             HOME

(9) Starting Group B
Group B's Start Script is run on Server 2.
Start
  Group   Environment variable   Value
  B       CLP_EVENT              START
          CLP_SERVER             HOME

(10) Stopping Group C
Group C's Stop Script is run on Server 2.
Stop
  Group   Environment variable   Value
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER

(11) Starting Group C
Group C's Start Script is run on Server 2.
Start
  Group   Environment variable   Value
  C       CLP_EVENT              START
          CLP_SERVER             OTHER

Supplement 1:
If you want a group that has three or more servers specified in its failover policy to behave differently on servers other than the primary server, use CLP_PRIORITY instead of CLP_SERVER(HOME/OTHER).

Sample 1: On (3) Failover when Server 1 goes down in the cluster status transition diagram
A group has Server 1 as its primary server. If a failure occurs on Server 1, the group's Start Script is run on the next-priority server in the failover policy, Server 2. The Start Script should use CLP_EVENT(=FAILOVER) as the branching condition for triggering the application's startup and recovery processes (such as a database rollback). If you want a process to run only on the second-priority server in the failover policy, the script should use CLP_PRIORITY(=2) as the branching condition.
(Figure omitted: Server 1 goes down and the Start Scripts of groups A and C are run on Servers 2 and 3.)
Environment variables for Start
  Group   Environment variable   Value
  A       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
          CLP_PRIORITY           2
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
          CLP_PRIORITY           2
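As a sketch of the branching described in Sample 1, the fragment below uses CLP_PRIORITY; it is illustrative only, and approllback, appstart, and the processing specific to the second-priority server are placeholders for your own application.

#! /bin/sh
# Example only: behave differently on the second-priority failover server.
if [ "$CLP_EVENT" = "FAILOVER" ]
then
    approllback                    # recovery processing after a failover
    if [ "$CLP_PRIORITY" = "2" ]
    then
        # processing you want to run only on the second-priority server
        echo "started on the second-priority failover server"
    fi
fi
appstart
exit 0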

Sample 2: On (7) Migration of Group C in the cluster status transition diagram
Group C's Stop Script is run on Server 2 (the server it is moved from), and then its Start Script is run on Server 3.
(Figure omitted: group C moves from Server 2 to Server 3.)
Environment variables for Stop
  Group   Environment variable   Value
  C       CLP_EVENT              FAILOVER
          CLP_SERVER             OTHER
          CLP_PRIORITY           2
Environment variables for Start
  Group   Environment variable   Value
  C       CLP_EVENT              START
          CLP_SERVER             OTHER
          CLP_PRIORITY           3

Supplement 2: When Resource Monitor (re)starts a script
When Resource Monitor detects an abnormality in an application and the Start Script is restarted, the environment variables are set as follows.
Sample 1: Resource Monitor detects an abnormal termination of an application running on Server 1, and Group A is restarted on Server 1.
Environment variable for Stop
  Group   Environment variable   Value
  A       (1) CLP_EVENT          The same value as when Start was run
Environment variable for Start
  Group   Environment variable   Value
  A       (2) CLP_EVENT          START

Sample 2: Resource Monitor detects an abnormal termination of an application that was active on Server 1; Group A then fails over to Server 2 and is started there.
Environment variable for Stop
  Group   Environment variable   Value
  A       (1) CLP_EVENT          The same value as when Start was run
Environment variable for Start
  Group   Environment variable   Value
  A       (2) CLP_EVENT          FAILOVER

Script codes
This section explains the script execution timing described in the preceding section, using sample script code. The numbers in parentheses (#) in the comments correspond to the actions described in the section Script execution timing.

A. Group A Start Script: a sample start.sh

#! /bin/sh
# ***************************************
# *              start.sh               *
# ***************************************

# Branch on the script starting factor (CLP_EVENT).
if [ "$CLP_EVENT" = "START" ]
then
    # Branch on the disk connection information (CLP_DISK).
    if [ "$CLP_DISK" = "SUCCESS" ]
    then
        # Overview of processing: the application's normal startup process.
        # When this runs: (1) Normal startup, (5) Migration of groups A and C.

        # Branch on the execution server (CLP_SERVER).
        if [ "$CLP_SERVER" = "HOME" ]
        then
            # Overview of processing: what you want to do when the application
            # is started normally on the primary server.
            # When this runs: (1) Normal startup, (5) Migration of groups A and C.
            :
        else
            # Overview of processing: what you want to do when the application
            # is started normally on a server other than the primary server.
            # When this runs: (no corresponding step in this example)
            :
        fi
    else
        # Disk-related error processing.
        :
    fi
elif [ "$CLP_EVENT" = "FAILOVER" ]
then

45 if [ "$CLP_DISK" = "SUCCESS" ] then Overview of processing: Application s normal startup process When to start this process: (3) Failover because Server 1 went down. According to the DISK connection information environment variable, the error process to go is determined. if [ "$CLP_SERVER" = "HOME" ] then According to this execution server environment variable, the process to go is determined. Overview of processing: A process you want to do if the application is terminated on the primary server after failover. When to start this process: else Overview of processing: A process you want to do if the application is started on the non-primary server after failover. When to start this process: (3) Failover because Server 1 went down. else fi Disk-related error process else fi #NO_CLP ExpressCluster is not running. fi #EXIT exit 0 45

B. Group A Stop Script: a sample stop.sh

#! /bin/sh
# ***************************************
# *               stop.sh               *
# ***************************************

# Branch on the script starting factor (CLP_EVENT).
if [ "$CLP_EVENT" = "START" ]
then
    # Overview of processing: the application's normal termination process.
    :
elif [ "$CLP_EVENT" = "FAILOVER" ]
then
    # Branch on the disk connection information (CLP_DISK).
    if [ "$CLP_DISK" = "SUCCESS" ]
    then
        # Overview of processing: normal termination process after a failover.
        # When this runs: (4) Cluster shutdown after the failover of Server 1,
        #                 (5) Migration of groups A and C.

        # Branch on the execution server (CLP_SERVER).
        if [ "$CLP_SERVER" = "HOME" ]
        then
            # Overview of processing: what you want to do when the application
            # is terminated on the primary server after a failover.
            :
        else
            # Overview of processing: what you want to do when the application
            # is terminated on a server other than the primary server after a failover.
            # When this runs: (4) Cluster shutdown after the failover of Server 1,
            #                 (5) Migration of groups A and C.
            :
        fi
    else
        # Disk-related error processing.
        :
    fi
else
    # NO_CLP: ExpressCluster is not running.
    :
fi
# EXIT
exit 0


Tips for creating a script
Note the following when creating scripts.
* If your script contains a command that takes some time to complete, it is recommended to always print a completion message for the command to standard output. You can print messages to standard output with the echo command. In addition, set the log output path in the properties of the resource that contains the script. These messages are useful for examining the cause of trouble, if any; however, they are not logged by default. For how to set the log output path, see the section Exec Resource Tuning Properties.
  (Example image in a script)
  echo "appstart.."
  appstart
  echo "OK"
* Watch the available disk space of your file system carefully: if you specify a file as the log output destination, messages are written to that file regardless of how much disk space remains.

1.4 Disk Resource

1.4.1 ExpressCluster versions
The following ExpressCluster versions support this function.
  Server: SE3.0-1 or later, XE3.0-1 or later, SX3.1-2 or later
  Configuration Tool: for supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

1.4.2 Dependencies
By default, this function depends on the following group resource type.
  Group resource type    Edition
  Floating IP resource   SAN/SE, XE, SX

1.4.3 Switching partition
* Partitions on shared disks connected to more than one server in a cluster are referred to as switching partitions.
* Switching is done for each failover group according to the failover policy. By storing the data required for business applications on switching partitions, the data can automatically and seamlessly be used after a failover or migration of the failover group.
* The same area on a switching partition should be accessible with the same device name on all servers.
(Figure omitted: when Server 1 goes down, application A and the switching partition on the shared disk fail over to Server 2.)

1.4.4 fsck execution timing
The timing of fsck execution can be adjusted in the following versions.
  Server: SE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
  Configuration Tool: or later
(1) fsck actions before executing mount (fsck timing)
You can choose the behavior of fsck before mounting the switching partition.
+ Execute every time: fsck is executed each time.
+ Execute at specified count: fsck is executed when the mount count of the switching partition reaches the specified value (fsck interval).
  Example: when the specified count is 5:
  Server 1, 1st disk resource activation: fsck: not executed (mount count: 0); mount: succeeded (mount count: 1)
  ...
  Server 1, 5th disk resource activation: fsck: not executed (mount count: 4); mount: succeeded (mount count: 5)
  ...
  Server 1, 6th disk resource activation: fsck: executed (mount count: 5), succeeded (mount count reset to 0); mount: succeeded (mount count: 1)
+ Not execute: fsck is not executed.
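To make the counting rule above concrete, here is a hedged sh sketch of the "Execute at specified count" decision; it only imitates the behavior described above and is not ExpressCluster code. The device name, mount point, and count file are hypothetical.

#! /bin/sh
# Illustrative sketch of "Execute at specified count" (not ExpressCluster code).
# /dev/sdb2, /mnt/data, and /var/tmp/mount_count are placeholder names.
DEV=/dev/sdb2
MNT=/mnt/data
LIMIT=5
COUNT=$(cat /var/tmp/mount_count 2>/dev/null || echo 0)
if [ "$COUNT" -ge "$LIMIT" ]
then
    fsck -p "$DEV" && COUNT=0     # fsck runs once the mount count reaches the limit
fi
if mount "$DEV" "$MNT"
then
    COUNT=$((COUNT + 1))          # only successful mounts are counted
fi
echo "$COUNT" > /var/tmp/mount_count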

(2) fsck actions when mount fails
You can choose the behavior of fsck when mounting the switching partition fails. If the mount retry count is set to 0, fsck is not executed regardless of this setting.
+ Execute: fsck is executed. However, fsck may not be executed again if it already ran before the mount, that is:
  = when the fsck action before mounting is set to Execute every time, or
  = when the fsck action before mounting is set to Execute at specified count and the mount count has reached the specified value, so fsck was already executed.
+ Not execute: fsck is not executed. This setting is not recommended when the fsck action before mounting is set to Not Execute, because in that combination fsck is never executed on the disk resource, and the disk resource cannot fail over when an abnormality that could be recovered by fsck occurs on the switching partition.

1.4.5 Precautions about shared disks
* Make settings so that the same partition is accessible with the same device name on all servers.
* For shared disks, functions such as stripe sets, volume sets, mirroring, and stripe sets with parity by Linux md are not supported.
* ExpressCluster controls access to the file system (mount/umount). Do not configure mount/umount settings on the OS.
* The partition device name set for a disk resource is set to read-only on all servers in the cluster. The read-only status is released upon activation.
* The timing when fsck is executed depends on the ExpressCluster version:
  = fsck is always executed before mounting: SE3.0-1 to
  = fsck is executed only when mounting fails: SE3.0-4 or later, XE3.0-1 or later, SX3.1-2 or later
  = the execution timing of fsck can be adjusted: SE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
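If you want to confirm the read-only state described above on a server where the disk resource is not active, the blockdev command can be used; this is only a verification sketch, and /dev/sdb1 is a placeholder device name.

#! /bin/sh
# Example only: prints 1 if the device is currently read-only, 0 if writable.
# /dev/sdb1 is a placeholder for the switching partition device.
blockdev --getro /dev/sdb1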

1.5 Floating IP Resource

1.5.1 ExpressCluster versions
The following ExpressCluster versions support this function.
  Server: SE3.0-1 or later, LE3.0-1 or later, XE3.0-1 or later, SX3.1-2 or later
  Configuration Tool: for supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

1.5.2 Dependencies
By default, this function does not depend on any group resource type.

1.5.3 Floating IP
Client applications can use floating IP addresses to access the cluster servers. By using floating IP addresses, clients do not need to be aware that the server they are accessing is switched at failover or group migration. Floating IP addresses work on the same LAN and over remote LANs.
(Figure omitted: when Server 1 goes down, the floating IP fails over to Server 2 and the client keeps accessing the same floating IP.)
(1) Address assignment
An IP address to assign as a floating IP address must:
* be in the same network address range as the LAN the cluster servers belong to, and
* be an available host address.
Allocate as many IP addresses that meet these conditions as required (generally, as many as there are failover groups). These IP addresses are no different from general host addresses, so you can assign global IP addresses such as those used on the Internet.

(2) Switching method
ARP broadcasting from the server updates the MAC addresses registered in the ARP tables of the devices on the LAN. The table below shows the contents of the ARP broadcast packet.
  ff ff ff ff ff ff | MAC address (6 bytes) | MAC address (6 bytes) | FIP address (4 bytes) | MAC address (6 bytes) | FIP address (4 bytes)
(3) Routing
You do not need to make settings for the routing table.
(4) Conditions to use
Floating IP addresses are accessible from the following machines:
* the cluster servers themselves
* other servers in the same cluster, and servers in other clusters
* clients on the same LAN as the cluster servers, and clients on remote LANs
Machines other than the above can also access floating IP addresses if the following conditions are satisfied. However, accessibility is not guaranteed for every model or architecture; test accessibility carefully yourself before using such machines.
* TCP/IP is used as the communication protocol.
* The ARP protocol is supported.
Even on LANs built with switching hubs, the floating IP address mechanism works properly. When a server goes down, the TCP/IP connections it was handling are disconnected.
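To observe or reproduce this ARP-based switching by hand, a gratuitous ARP can be sent with the arping utility from iputils; this is not an ExpressCluster command, and the interface name and address below are placeholders.

#! /bin/sh
# Example only: send a gratuitous ARP so neighbors update their ARP tables
# for the floating IP. eth0 and 10.0.0.12 are placeholders.
FIP=10.0.0.12
IF=eth0
arping -U -c 3 -I "$IF" "$FIP"    # -U: unsolicited (gratuitous) ARP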

1.5.4 Floating IP address of each server
You can set a floating IP address for each server in the following versions.
  Server: SE3.1-6 or later, LE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
  Configuration Tool: or later
By setting a floating IP address for each server, you can build a cluster that spans different network segments.
Example: settings of the floating IP resource fip1:
* setting for Server 1, which is on segment A (/24): floating IP address: /24
* setting for Server 2, which is on segment B (/24): floating IP address for each server: /24

1.5.5 Notes on floating IP resource
(1) Note 1 for IP address overlaps
Refer to this note when you use the following versions.
  Server: SE3.1-6 or later, LE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
  Configuration Tool: or later
A. The failover of the resource may fail if the floating IP resource is configured as follows:
- a value smaller than the default is set for Activity Retry Threshold, or
- Ping Retry Count and Ping Interval are not set.
This problem occurs for the following reasons:
1. After the floating IP address is deactivated on the server the resource fails over from, releasing the IP address may take time, depending on the behavior of the ifconfig command.
2. When the floating IP address is activated on the server the resource fails over to, the ping command is run against the address to be activated in order to prevent redundant activation. Because of the delay above, ping still reaches the IP address, and a resource activation abnormality occurs.
This problem can be prevented by the settings below:
- Set a larger value for the Activity Retry Threshold of the resource (default: 5 times).
- Set larger values for Ping Retry Count and Ping Interval.
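The duplicate-address check involved here can be pictured with the hedged sketch below; it only illustrates why Ping Retry Count and Ping Interval matter and is not the actual ExpressCluster implementation. The address and retry values are placeholders.

#! /bin/sh
# Example only: retry a ping-based duplicate check before activating the FIP.
FIP=10.0.0.12
RETRY=5
INTERVAL=2
i=0
while [ "$i" -lt "$RETRY" ]
do
    if ! ping -c 1 -w 1 "$FIP" > /dev/null 2>&1
    then
        echo "address is free; the floating IP can be activated"
        exit 0
    fi
    sleep "$INTERVAL"      # the old server may still be releasing the address
    i=$((i + 1))
done
echo "address still answers ping; activation abnormality"
exit 1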

B. If the OS stalls while the floating IP address is active, the failover of the resource may fail when the following settings are made:
- a value other than 0 is set for Ping Timeout, and
- FIP Force Activation is off.
This problem occurs for the following reasons: part of the OS stalls while the floating IP address is active, for example:
- the network modules are still running and respond to ping from other nodes, and
- the stall cannot be detected by the user space monitor resource.
When the floating IP address is activated on the server the resource fails over to, the ping command is run against the address to be activated in order to prevent redundant activation. Because of the situation above, ping reaches the IP address and a resource activation abnormality occurs.
In machine environments where this problem occurs often, it can be prevented by the settings below. However, depending on the state of the stall, the group may become active on both servers, and depending on the timing of that double activation, a server shutdown may occur. For details, see a separate guide, Maintenance.
- Specify 0 for Ping Timeout: the overlap check is not performed for the floating IP address.
- Specify On for FIP Force Activation: the floating IP address is activated forcibly even when the address is in use on a different server.


(2) Note 2 for IP address overlaps
Refer to this note when you use the following versions.
  Server: SE3.1-5 or earlier, LE3.1-5 or earlier, XE3.1-5 or earlier, SX3.1-5 or earlier
  Configuration Tool: or earlier
A. If the value of [Activity Retry Threshold] for the floating IP resource is small, the failover of the resource may fail. See (1) A for the cause of this problem. To prevent it, the default value of [Activity Retry Threshold] for the floating IP resource is set to five (5), and this value is recommended. Take the above into account when configuring [Activity Retry Threshold].
B. If the OS stalls while the floating IP address is active, some [Ping Timeout] settings of the floating IP resource may cause the failover of the resource to fail. See (1) B for the cause of this problem. With the default value of [Ping Timeout], ping is run at resource activation; if you set [Ping Timeout] to zero (0), ping is not executed against the floating IP address to be activated. For more information, see a separate guide, Configuration Tool. You can prevent this problem with that setting if it occurs frequently in your machine environment. However, depending on the state of the stall, the group may become active on both servers, and depending on the timing of that double activation, a server shutdown may occur. For details, see a separate guide, Maintenance.

1.6 Mirror disk Resource

1.6.1 ExpressCluster versions
The following ExpressCluster versions support this function.
  Server: LE3.0-1 or later
  Configuration Tool: for supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

1.6.2 Dependencies
By default, this function depends on the following group resource type in the same group.
  Group resource type    Edition
  Floating IP resource   WAN/LAN/LE

1.6.3 Mirror disk
(1) Data mirroring disks
Data mirroring disks are a pair of disks whose data is mirrored between two servers in a cluster. Disks used by the OS (including disks controlled by /etc/fstab) cannot be mirrored; add dedicated disks for mirroring. If disks are used as mirror disks, give the disks on both servers the same disk settings.
* Disk type
Use the same disk type for the mirror disks on both servers. For the disk types whose behavior has been confirmed, see a separate guide, Operational Environment.
Samples:
  Combination   Server 1   Server 2
  OK            SCSI       SCSI
  OK            IDE        IDE
  NG            IDE        SCSI
* Disk geometry
Use the same disk geometry for the mirror disks on both servers. It is recommended to use the same disk model on both servers.
Samples: a combination is OK only when Server 1 and Server 2 have the same head, sector, and cylinder values; a combination with differing values is NG.

* Disk partition
Make settings so that the same partition is accessible with the same device name on both servers.
Sample: adding a SCSI disk to both servers to create a pair of mirroring disks.
  Server 1: /dev/sdb, with cluster partition /dev/sdb1 and data partition /dev/sdb2
  Server 2: /dev/sdb, with the same partition configuration (/dev/sdb1, /dev/sdb2)
  The mirror partition device is the unit of failover for mirror disk resources.
Sample: using free space on the IDE disks that hold the OS of both servers to create a pair of mirroring disks.
  Server 1: /dev/hda, with OS root partition /dev/hda1, OS swap partition /dev/hda2, cluster partition /dev/hda3, and data partition /dev/hda4
  Server 2: the same partition configuration
+ A mirror partition device is the device that the ExpressCluster mirroring driver provides to the upper layer of the OS.
+ Failover is performed per mirror partition device.
+ ExpressCluster is responsible for mounting and unmounting mirror partition devices. Do not enter them in the OS fstab, and do not control the mirror partition device directly.
+ /dev/nmpx (where x is a number from 1 through 8) is used for the special device names of mirror partitions. A special device name must not overlap with any other device name.
+ Mirror partitions use major number 218. Do not use a major number that overlaps with other devices' major numbers.
+ Secure two partitions, a cluster partition and a data partition, as a pair.

+ A mirror partition (cluster partition and data partition) can be allocated on the same disk as the OS (root partition and swap partition).
  = When maintainability at failure is important: it is recommended to use a disk for mirroring separate from the OS disk (root partition and swap partition).
  = When a LUN cannot be added because of hardware RAID specifications: if it is difficult to change the LUN configuration because hardware RAID is preinstalled, a mirror partition (cluster partition and data partition) can be allocated on the same disk as the OS (root partition and swap partition).
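Because both servers must see the same partition layout under the same device name, it can help to compare the partition tables before configuring the resource. The sketch below is only an example; /dev/sdb and the host name server2 are placeholders, and ssh access between the servers is assumed (this is not ExpressCluster functionality).

#! /bin/sh
# Example only: compare the mirror disk partition tables of both servers.
fdisk -l /dev/sdb > /tmp/partitions.local
ssh server2 fdisk -l /dev/sdb > /tmp/partitions.remote
diff /tmp/partitions.local /tmp/partitions.remote && echo "partition layouts match"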

* Disk allocation
You may use more than one disk as mirror disks. You may also allocate multiple mirror partition devices on a single disk.
Sample: adding two SCSI disks to both servers to create two pairs of mirroring disks.
  /dev/sdb and /dev/sdc on Server 1 and Server 2: the same disk type, the same geometry, and the same device name for each pair.
  + Reserve the cluster partition and the data partition of a pair on the same disk.
  + Do not place a data partition on the first disk and its cluster partition on the second disk.
Sample: adding one SCSI disk to both servers to create two mirroring partitions.
  The same partition device names are used on Server 1 and Server 2.
* For mirror disks, functions such as stripe sets, volume sets, mirroring, and stripe sets with parity by Linux md or LVM are not supported.

(2) Data partition
Partitions where ExpressCluster Server stores the data to be mirrored (such as business application data) are referred to as data partitions. Allocate data partitions as follows:
* Data partition size: the partition size should be a multiple of 4096 bytes, which is a multiple of 4 for the number of blocks. (The default block size on Linux is 1024 bytes.)
* Partition ID: 83 (Linux)
* If Perform First mkfs is selected in the cluster configuration information, a file system is created automatically when the cluster is generated.
* Access control of the file system (mount/umount) is performed by the ExpressCluster Server. Do not configure mount/umount settings for the data partition on the OS.
(3) Cluster partition
Partitions dedicated to ExpressCluster Server's control of mirror partitions are referred to as cluster partitions. Allocate cluster partitions as follows:
* Cluster partition size: 10 MB or more. Depending on the geometry, the size may become larger than 10 MB; this is not a problem.
* Partition ID: 83 (Linux)
* A cluster partition and the data partition for data mirroring should be paired.
* You do not need to create a file system on cluster partitions.
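A quick way to confirm the data partition size rule above is to read the partition size in bytes and check it against 4096; the sketch below is illustrative only, and /dev/sdb2 is a placeholder data partition.

#! /bin/sh
# Example only: check that the data partition size is a multiple of 4096 bytes
# (4 blocks of 1024 bytes). /dev/sdb2 is a placeholder device name.
SIZE=$(blockdev --getsize64 /dev/sdb2)
if [ $((SIZE % 4096)) -eq 0 ]
then
    echo "data partition size OK: $SIZE bytes"
else
    echo "NG: data partition size is not a multiple of 4096 bytes"
fi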

(4) Mirror Partition Device
One mirror disk resource provides one mirror partition to the upper layer of the OS. If you register a mirror disk resource with a failover group, the partition can be accessed from only one server (generally the primary server of the resource group). Typically, the mirror partition device (/dev/nmpx) remains transparent to users and applications, since I/O is performed through the file system. The device name is assigned automatically when the configuration information is created with the Configuration Tool.
* Access control of the file system (mount/umount) is performed by the ExpressCluster Server. Do not configure mount/umount settings on the OS.
The accessibility of a mirror partition (mirror disk resource) to business applications is the same as that of a switching partition (disk resource) on a shared disk.
* Mirror partition switching is done for each failover group according to the failover policy.
* By storing the data required for business applications on mirror partitions, the data can automatically and seamlessly be used after a failover or migration of the failover group.
(Figure omitted: when Server 1 goes down, application A fails over to Server 2 and uses the mirror disk of the mirroring pair on Server 2.)

1.6.4 Mirror parameters
(1) The maximum number of request queues
Configures the number of queues for I/O requests from the upper layer of the OS to the mirror disk driver. If you select a larger value, performance improves, but more physical memory is required. If you select a smaller value, less physical memory is used, but performance may be lowered.
Note the following when setting the number of queues:
= A larger value may improve performance when:
  - a large amount of physical memory is installed and there is plenty of free memory, and
  - the disk I/O performance is good.
= A smaller value is recommended when:
  - a small amount of physical memory is installed,
  - disk I/O performance is poor, or
  - "alloc_pages: 0-order allocation failed (gfp=0x20/0)" is logged in the OS system log.
(2) Connection timeout
The timeout for waiting for a successful connection between the servers during mirror recovery or data synchronization.

(3) Send timeout
This timeout is used for:
2) the timeout for the active server to wait, from the beginning of the transmission, until the write data has been completely sent from it to the standby server during mirror recovery or data synchronization;
(Figure omitted: the active server's mirror driver sends write data to the standby server's mirror driver.)
3) the time interval for checking whether the ACK notifying completion of a write needs to be sent from the active server to the standby server.
(Figure omitted: the interval for checking ACK transmission between the mirror drivers.)

(4) Receiving timeout
This timeout is used for the standby server to wait, from the beginning of the transmission, until it has completely received the write data from the active server.
(Figure omitted: the standby server's mirror driver receives write data from the active server.)
(5) Ack timeout (LE3.1-6 or later)
This timeout is used for:
4) the timeout for the active server to wait for the ACK notifying completion of the write after the write data has been completely sent to the standby server. If the ACK is not received within the timeout, the difference bitmap on the active server is accumulated.
(Figure omitted: the active server's mirror driver waits for the ACK from the standby server.)

5) the timeout for the standby server to wait for the ACK from the active server after the standby server has completely sent its ACK notifying completion of the write. If the ACK from the active server is not received within the timeout, the difference bitmap on the standby server is accumulated.
(Figure omitted: ACK exchange between the mirror drivers.)
6) the timeout for the server sending the recovery data to wait, from the beginning of the data transmission, for the ACK notifying that the copy-destination server has received the data during mirror recovery.
(Figure omitted: the copy-source mirror driver waits for the ACK for the recovery data.)
(6) Bitmap update interval (LE3.1-6 or later)
The time interval for checking the queue of data to be written into the difference bitmap on the standby server.

(7) Flush sleep time (LE3.1-1 or later)
Sets the interval at which the thread on the standby system (mirroring destination) periodically writes the write data accumulated in its buffer.
* When you choose a larger value:
  + the OS of the standby system (mirroring destination) is less heavily loaded;
  + write performance improves.
* When you choose a smaller value:
  + the OS of the standby system (mirroring destination) is more heavily loaded;
  + write performance deteriorates.
Note that the information above is a guideline for your configuration. The following conditions and environmental factors can affect the parameters, so you may not get the desired result. It is recommended to use the default value.
  + OS version
  + memory size
  + file system tuning
  + type of disk interface
  + characteristics of the disk or disk interface board (cache size, seek time, etc.)
  + write logic of the application
(8) Flush count (LE3.1-6 or later)
When the number of write data blocks accumulated in the buffer of the standby system (mirroring destination) reaches this value, they are written to the disk.
* When you set a larger value:
  + if the size of the write data is small, write performance deteriorates;
  + if the size of the write data is large, write performance improves.
* When you set a smaller value:
  + if the size of the write data is small, write performance improves. For example, write performance may improve for applications that frequently write small amounts of data without relying on the flush operation of the file system;
  + if the size of the write data is large, write performance deteriorates.
Note that the information above is a guideline for your configuration. The following conditions and environmental factors can affect the parameters, so you may not get the desired result. It is recommended to use the default value.
  + memory size
  + file system tuning
  + write logic of the application

(9) First Mirror Construction (LE3.1-1 or later)
Set whether or not to perform the initial mirroring *1 when the cluster is activated for the first time after it is created.
A. Perform First Mirror Construction
Just as in the versions up to LE3.1-4, the initial mirroring is performed when the cluster is activated for the first time after it is created.
B. Does not Perform First Mirror Construction
Do not perform the initial mirroring after constructing the cluster. Before constructing the cluster, it is necessary to make the contents of the mirror disks identical by a method other than ExpressCluster.

(10) First mkfs (LE3.1-1 or later)
Set whether or not to perform the initial file system creation in the data partition of the mirror disk when the cluster is activated for the first time after it is created.
C. Perform First mkfs
Just as in the versions up to LE3.1-4, the first file system is created when the cluster is activated for the first time immediately after it is created.
D. Does not Perform First mkfs
Do not create a first file system in the data partition of the mirror disk when the cluster is activated for the first time immediately after it is created. Select this option when a file system has already been set up in the data partition of the mirror disk and contains data to be duplicated, so that no file system creation is required (a sketch of preparing the file system by hand follows below). The mirror disk partition *2 configuration must fulfill the mirror disk resource requirements. You should be cautious when you are clustering a single server.
If you select "Does not Perform First Mirror Construction", you cannot choose "Perform First mkfs". (This is because there are differences in the partition images even right after mkfs is performed.)

*1 Irrespective of the FastSync Option, the entire data partition is copied.
*2 There must be a cluster partition in a mirror disk. If you cannot reserve a cluster partition when a disk of the single server is the mirroring target, take a backup and reserve the partition again.
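When "Does not Perform First mkfs" is selected, the file system must already exist in the data partition before the cluster is constructed. The following is a minimal sketch of preparing it by hand; the device name /dev/sdb2 and the ext3 file system are assumptions for illustration only, not values prescribed by this guide.
If the data partition is mounted, unmount it first, then create the file system:
# umount /dev/sdb2
# mkfs -t ext3 /dev/sdb2
Mount it once to verify and to place the data to be duplicated, then unmount it again before constructing the cluster:
# mount /dev/sdb2 /mnt/work
# umount /mnt/work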

(11) Mirror agent receiving timeout (LE3.1-6 or later)
Timeout for the mirror agent to wait for the start of data reception after it has created a communication socket with the other server.
[Figure: the mirror agent on Server 1 waits to start receiving communication data from the mirror agent on Server 2.]

(12) Example of construction
If you are using a disk that has been used as a mirror disk in the past, you must format the disk because the previous data still exists in its cluster partition. For the initialization of a cluster partition, see a separate guide, "Maintenance - Reuse of Mirror Disks".

E. Configure an initial mirroring / Create a first file system
Set up after installing ExpressCluster.
[Figure: the initial mkfs is performed on the mirror disk of each server; then the initial mirroring starts and the entire data partition is copied from Server 1 to Server 2.]

F. Configure an initial mirroring / Do not create a first file system
If the application data to be duplicated can be prepared before the cluster construction, create it in the data partition of the primary mirror disk in advance (for example, the initial database of a DBMS). For partitioning, see 1.5.3 (1).
Set up after installing ExpressCluster.
[Figure: the initial data resides on the mirror disk of Server 1; when the initial mirroring starts, the entire data partition is copied to the mirror disk of Server 2.]

G. Do not configure an initial mirroring / Do not create a first file system
The following is an example of making the mirror disks of both servers identical. (This cannot be done after constructing the cluster. Be sure to perform it before the cluster construction.)

Example 1: Copying partition images of a disk
If the application data to be duplicated can be prepared before the cluster construction, create it in the data partition of the primary mirror disk in advance (for example, the initial database of a DBMS). For partitioning, see 1.5.3 (1).
Remove the mirror disk of the standby system and connect it to the primary server, so that both the primary system's own mirror disk and the removed standby disk are attached to the primary system.
While the mirror disk of the primary system is unmounted, copy the entire partition from the data partition of the primary mirror disk to the data partition of the standby mirror disk with a method such as the dd command, as shown in the sketch below. Note that copying through a file system does not produce an identical partition image.
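A minimal sketch of the partition-image copy on the primary server, assuming the data partition of the primary mirror disk appears as /dev/sdb2 and the data partition of the standby disk connected to the primary server appears as /dev/sdc2 (both device names are assumptions for illustration):
Make sure the data partition of the primary mirror disk is not mounted, then copy the whole partition image:
# umount /dev/sdb2
# dd if=/dev/sdb2 of=/dev/sdc2 bs=65536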

Remove the standby system's mirror disk from the primary system and put it back into the standby system.

Set up after installing ExpressCluster. Construct the cluster with the following settings:
+ Do not create a first file system
+ Do not configure an initial mirroring
No mirroring partition copy is performed (i.e. there is no initial synchronization); both servers already hold the initial data.

Example 2: Copying by a backup device
Prepare the application data to be duplicated (for example, the initial data of a database) in the data partition of the primary mirror disk in advance. For partitioning, see 1.5.3 (1).
Use a backup device to take a backup of the data partition of the primary system's mirror disk, using a backup command that copies the partition image (such as dd).
Move the media to the standby system and restore the backed-up data into the data partition of the standby system's mirror disk, using a restore command that copies the partition image (such as dd). A sketch of the backup and restore commands follows.
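A minimal sketch of the backup and restore by partition image, assuming the data partition is /dev/sdb2 on both servers and the backup device is a tape drive at /dev/st0 (all device names are assumptions for illustration):
On the primary system, back up the unmounted data partition image to the backup device:
# dd if=/dev/sdb2 of=/dev/st0 bs=65536
Move the media to the standby system and restore the image into its data partition:
# dd if=/dev/st0 of=/dev/sdb2 bs=65536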

[Figure: after the cluster is constructed, no mirroring partition copy is performed (i.e. there is no initial synchronization); the mirror disks of both servers already hold the initial data.]

1.6.5 Notes on mirror disk resource
* Set up both servers so that the identical partitions can be accessed under the identical device name.
* The execution timing of fsck varies by the version of ExpressCluster:
= fsck is always executed before mounting: LE3.0-1 and later versions earlier than LE3.1-1
= fsck is executed only when mounting fails: LE3.1-1 or later

1.7 RAW Resource
A RAW resource is a resource on a raw device. A raw device is a Linux device that accesses a partition device directly, without going through a file system. Generally, applications build their own data structures on it instead of using a file system.

1.7.1 ExpressCluster versions
The following ExpressCluster versions support this function.
ExpressCluster Server: SE3.0-4 or later, XE3.1-3 or later, SX3.1-2 or later
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

1.7.2 Dependencies
By default, this function depends on the following group resource type.
Group resource type: Floating IP resource (Edition: SAN/SE, XE, SX)

1.7.3 Switching partition
* A switching partition is a partition on a shared disk connected to more than one server forming a cluster.
* Switching is performed according to the failover policy of each failover group. By storing the data the application needs on switching partitions, the data is automatically taken over at failover or when a failover group is moved.
* Switching partitions should be accessible as the same area with the same device name on all servers.
[Figure: when Server 1 goes down, Application A fails over from Server 1 to Server 2 and the switching partition on the shared disk is taken over.]

1.7.4 Notes on RAW resource
* Make settings so that the same partition is accessible with the same device name on all servers.
* Stripe sets, volume sets, mirroring, and parity striping built with Linux md are not supported for shared disks.
* ExpressCluster performs access control (bind) of the RAW device. Do not configure the OS to bind it. Existing bindings can be checked with the raw command, as illustrated below.
* Partitions are read-only on servers where the group is not active.
* Do not register RAW devices that are already registered in the Disk I/F List on the Server Property, in a RAW monitor resource, or in a VxVM volume resource. See Section "Notes on control by ExpressCluster" for details on RAW devices of the VxVM volume resource.
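A minimal way to check that the OS itself is not binding raw devices that ExpressCluster will control is to query the current bindings (the output shown is illustrative only):
# raw -qa
/dev/raw/raw1: bound to major 8, minor 17
If a listed binding points at a partition that ExpressCluster is going to control, remove that binding from the OS startup settings so that only ExpressCluster performs the bind.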

1.8 VxVM Related Resource

1.8.1 Behavior test information
ExpressCluster versions
The following ExpressCluster versions support this function.
ExpressCluster Server: SE3.0-4 or later
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

Distribution
This function has been tested on the following distributions.
- Red Hat Enterprise Linux AS release 3 (Taroon)
- Red Hat Enterprise Linux ES release 3 (Taroon)
(kernel: EL / ELsmp)

VERITAS Volume Manager versions
This function has been tested on the following rpm packages (rpm / Version / Release).
- VRTSvlic
- VRTSvxvm / 3.2 / update5_rh3
- VRTSvxfs / RHEL

File system in volume
The following file system has been tested.
- vxfs

1.8.2 Dependencies
By default, this function depends on the following group resource types.
* VxVM disk group resource
  Group resource type: Floating IP resource (Edition: SAN/SE)
* VxVM volume resource
  Group resource type: Floating IP resource (Edition: SAN/SE)
  Group resource type: VxVM disk group resource (Edition: SAN/SE)

1.8.3 Resources controlled by ExpressCluster
A VERITAS Volume Manager Disk Group (hereinafter referred to as Disk Group) is a virtual grouping of physical disks. A logical partition allocated from a Disk Group is referred to as a Volume. ExpressCluster can control Disk Groups and Volumes as VxVM disk group resources and VxVM volume resources, respectively.

1.8.4 VxVM disk group resource
(1) About Disk Group
+ A Disk Group is not defined by ExpressCluster.
+ The VxVM disk group resource in ExpressCluster imports/deports the Disk Group.
+ A Disk Group is automatically deported at OS startup if it is contained in the ExpressCluster configuration data.
+ A disk group that is not contained in the ExpressCluster configuration data is not supported.

(2) Commands executed when the VxVM disk group is activated
The following commands are executed when the VxVM disk group is activated.

Command: vxdg
  Option: import - when importing a disk group
  Option: -t     - when importing a disk group
  Option: -C     - when importing a disk group has failed and the clear host ID option is ON
  Option: -f     - when importing a disk group has failed and the force activation option is ON

Command: vxrecover
  Option: -g     - when starting the volumes of the specified disk group
  Option: -sb    - when starting the volumes of the specified disk group

[Figure: sequence when the disk group is activated]

* If the disk group was not deported normally on the server from which the group failed over, the disk group cannot be imported on the failover destination server when the clear host ID option is OFF, because of VxVM specifications.
* In some cases, the import succeeded even though an import timeout occurred. You can prevent this problem because import retries are executed when the clear host ID option or the force import option is set for the import options (SE3.1-6 or later).
A sketch of the commands in activation order is shown below.
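A minimal sketch of the activation command sequence corresponding to the table above, assuming the disk group name dg1 used elsewhere in this chapter; the -C and -f options are added only when the corresponding import options are ON and the plain import has failed:
# vxdg -t import dg1
If the import fails and the clear host ID option is ON, retry with -C (add -f as well when the force activation option is ON):
# vxdg -tC import dg1
Start the volumes of the disk group:
# vxrecover -g dg1 -sb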

(3) Commands executed when the VxVM disk group is not active
The following commands are executed when the VxVM disk group is deactivated.

Command: vxdg
  Option: deport - when deporting a disk group
  Option: flush  - when flushing a disk group

Command: vxvol
  Option: -g      - when stopping the volumes of the specified disk group
  Option: stopall - when stopping the volumes of the specified disk group

[Figure: sequence when the disk group is deactivated]
A sketch of the commands in deactivation order is shown below.
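A corresponding sketch for deactivation, again assuming the disk group name dg1; the exact order shown here is illustrative:
Stop all volumes of the disk group:
# vxvol -g dg1 stopall
Flush and deport the disk group so that the other server can import it:
# vxdg flush dg1
# vxdg deport dg1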

1.8.5 VxVM volume resource
* About Volume
+ A Volume is not defined by ExpressCluster.
+ The VxVM volume resource in ExpressCluster mounts/unmounts the file system on a volume.
+ A VxVM volume resource is not required if you only access the raw devices (/dev/vx/rdsk/[disk group name]/[volume name]) that become accessible once the disk group has been imported and the volumes are active (i.e. you perform raw access without building a file system on the volume).
* About dependencies
+ By default, the VxVM volume resource depends on the VxVM disk group resource.
* The fsck execution timing differs depending on the version of ExpressCluster:
= fsck is executed only when mounting fails: SE3.0-4 or later
= The fsck execution timing can be adjusted (for details, see "fsck execution timing"): SE3.1-6 or later

1.8.6 Notes on control by ExpressCluster
(1) Allocate a dedicated LUN for the disk heartbeat.
If you add disks to a Disk Group, add them in units of a physical disk. A Disk Group is imported on one server only, not on both. Therefore, the partition for the disk heartbeat, which must be accessible from both servers, cannot be on the same LUN as a disk added to a Disk Group.
[Figure: the disk heartbeat dedicated LUN must be separate from the LUNs controlled by VxVM; placing the disk heartbeat partition on a LUN controlled by VxVM is not allowed.]

(2) Check the real RAW device behind the volume RAW device in advance.
Import all disk groups that can be activated on one server and make all volumes active before installing ExpressCluster. Then run the command below.

# raw -qa
/dev/raw/raw2: bound to major 199, minor 2    ... (A) raw device names, (B) major/minor numbers
/dev/raw/raw3: bound to major 199, minor 3

Example: assuming the disk group name and volume names are:
+ Disk group name: dg1
+ Volume names in dg1: vol1, vol2
Run the command below.

# ls -l /dev/vx/dsk/dg1/
brw root root 199, 2 May 15 22:13 vol1    ... (C) major/minor numbers
brw root root 199, 3 May 15 22:13 vol2

Confirm that the major and minor numbers are identical between (B) and (C). Make sure not to use the RAW devices shown in (A) for a disk heartbeat resource, a RAW resource, or a RAW monitor resource in ExpressCluster.

1.8.7 Clustering by VERITAS Volume Manager
VERITAS Volume Manager configuration
The VERITAS Volume Manager configuration whose behavior has been tested with ExpressCluster is as follows.
[Figure: VxVM configuration diagram]

The VxVM configuration sample on the previous page is:

Disk Group 1: dg1
  Physical disk 1: /dev/sdd
  Physical disk 2: /dev/sdg
  Volume vol1 *1
    Volume device name: /dev/vx/dsk/dg1/vol1
    Volume RAW device name: /dev/vx/rdsk/dg1/vol1
    File system: vxfs
  Volume vol2 *1
    Volume device name: /dev/vx/dsk/dg1/vol2
    Volume RAW device name: /dev/vx/rdsk/dg1/vol2
    File system: vxfs

Disk Group 2: dg2
  Physical disk 1: /dev/sde
  Physical disk 2: /dev/sdh
  Volume vol3 *1
    Volume device name: /dev/vx/dsk/dg2/vol3
    Volume RAW device name: /dev/vx/rdsk/dg2/vol3
    File system: vxfs
  Volume vol4 *1
    Volume device name: /dev/vx/dsk/dg2/vol4
    Volume RAW device name: /dev/vx/rdsk/dg2/vol4
    File system: vxfs

Disks for rootdg
  On Server 1: partition on /dev/sdb
  On Server 2: partition on /dev/sdb

LUNs for disk heartbeat resources
  Shared disk 1: partition on /dev/sdc
  Shared disk 2: partition on /dev/sdf

*1 Behaviors were tested in an environment where two or more physical disks were registered in a Disk Group and volumes were mirrored between shared disk cabinets.

ExpressCluster environment sample
Refer to the Configuration Tool for details of the resource parameters. The VxVM parameters specified here are based on the VxVM setting sample in Section "VERITAS Volume Manager configuration".

Cluster configuration
  Cluster name: cluster
  # of servers: 2
  # of failover groups: 3
  # of monitor resources: 8

Heartbeat resources
  # of LAN heartbeats: 2
  # of COM heartbeats: 1
  # of DISK heartbeats: 2

1st server information (master server)
  Server name: server1
  Interconnect IP address (dedicated):
  Interconnect IP address (backup):
  Public IP address:
  COM heartbeat device: /dev/ttyS0
  DISK heartbeat devices: /dev/sdc1 /dev/raw/raw10, /dev/sdf1 /dev/raw/raw11

2nd server information
  Server name: server2
  Interconnect IP address (dedicated):
  Interconnect IP address (backup):
  Public IP address:
  COM heartbeat device: /dev/ttyS0
  DISK heartbeat devices: /dev/sdc1 /dev/raw/raw10, /dev/sdf1 /dev/raw/raw11

1st group (for the Web Manager)
  Type: failover
  Group name: WebManager
  Startup servers: server1 to server2
  # of group resources: 1
  1st group resource *1
    Type: floating IP resource
    Group resource name: WebManagerFIP1
    IP address:

2nd group (for applications)
  Type: failover
  Group name: failover1
  Startup servers: server1 to server2
  # of group resources: 4
  1st group resource
    Type: floating IP resource
    Group resource name: fip1
    IP address:
  2nd group resource
    Type: VxVM disk group resource
    Group resource name: vxdg1
    Disk group name: dg1
    Host ID clear: ON
    Forced import: OFF

  3rd group resource
    Type: VxVM volume resource
    Group resource name: vxvol1
    Volume device name: /dev/vx/dsk/dg1/vol1
    Volume RAW device name: /dev/vx/rdsk/dg1/vol1
    Mount point: /mnt/vol1
    File system: vxfs
  4th group resource
    Type: VxVM volume resource
    Group resource name: vxvol2
    Volume device name: /dev/vx/dsk/dg1/vol2
    Volume RAW device name: /dev/vx/rdsk/dg1/vol2
    Mount point: /mnt/vol2
    File system: vxfs

3rd group (for applications)
  Type: failover
  Group name: failover2
  Startup servers: server2 to server1
  # of group resources: 4
  1st group resource
    Type: floating IP resource
    Group resource name: fip2
    IP address:
  2nd group resource
    Type: VxVM disk group resource
    Group resource name: vxdg2
    Disk group name: dg2
    Host ID clear: ON
    Forced import: OFF
  3rd group resource
    Type: VxVM volume resource
    Group resource name: vxvol3
    Volume device name: /dev/vx/dsk/dg2/vol3
    Volume RAW device name: /dev/vx/rdsk/dg2/vol3
    Mount point: /mnt/vol3
    File system: vxfs
  4th group resource
    Type: VxVM volume resource
    Group resource name: vxvol4
    Volume device name: /dev/vx/dsk/dg2/vol4
    Volume RAW device name: /dev/vx/rdsk/dg2/vol4
    Mount point: /mnt/vol4
    File system: vxfs

1st monitor resource
  Type: user mode monitor (created by default)
  Monitor resource name: userw
2nd monitor resource
  Type: VxVM daemon monitor (created automatically when a VxVM disk group resource is added)
  Monitor resource name: vxdw
3rd monitor resource
  Type: VxVM volume monitor (monitors vxvol1)
  Monitor resource name: vxvolw1
  Device to be monitored: /dev/vx/rdsk/dg1/vol1
  VxVM volume resource: vxvol1
  Action at error detection: Stop Cluster Daemon And OS Shutdown
4th monitor resource
  Type: VxVM volume monitor (monitors vxvol2)
  Monitor resource name: vxvolw2
  Device to be monitored: /dev/vx/rdsk/dg1/vol2
  VxVM volume resource: vxvol2
  Action at error detection: Stop Cluster Daemon And OS Shutdown
5th monitor resource
  Type: VxVM volume monitor (monitors vxvol3)
  Monitor resource name: vxvolw3

  Device to be monitored: /dev/vx/rdsk/dg2/vol3
  VxVM volume resource: vxvol3
  Action at error detection: Stop Cluster Daemon And OS Shutdown
6th monitor resource
  Type: VxVM volume monitor (monitors vxvol4)
  Monitor resource name: vxvolw4
  Device to be monitored: /dev/vx/rdsk/dg2/vol4
  VxVM volume resource: vxvol4
  Action at error detection: Stop Cluster Daemon And OS Shutdown
7th monitor resource
  Type: raw monitor (monitors rootdg)
  Monitor resource name: raww1
  RAW device name to be monitored: /dev/raw/raw20
  Device name: /dev/sdb
  Action at error detection: Stop Cluster Daemon And OS Shutdown
8th monitor resource
  Type: ip monitor
  Monitor resource name: ipw1
  Monitored IP address: (gateway)
  Action at error detection: failover of the WebManager group

*1: Prepare a floating IP address for connecting from the Web Manager and add it to the dedicated group. As long as the Web Manager dedicated group does not stop, you can access it from a Web browser without being aware of the servers' real IP addresses.
* Make correct settings for the VxVM volume resource to be monitored and its volume RAW device in each VxVM volume monitor resource.
* Monitor rootdg with a RAW monitor resource.
* The VxVM daemon monitor resource monitors VxVM's vxconfigd daemon. It is added automatically when the 1st VxVM disk group resource is configured.
* Make sure that the RAW devices set for the following resources do not collide with each other:
+ RAW device of the disk heartbeat resource
+ Real RAW device behind the volume RAW device of the VxVM volume resource
+ RAW device of the RAW resource
+ RAW device to be monitored by the RAW monitor resource

This clustering configuration is illustrated below.
[Figure: overall cluster configuration diagram]

Overview of clustering steps
Perform the following steps for clustering.
7) Set up VERITAS Volume Manager on the servers.
8) Check the volume RAW devices: confirm the real RAW device behind each volume RAW device. For details, see Section "Notes on control by ExpressCluster".
9) Set up the Configuration Tool.
10) Set up the ExpressCluster Server on all servers that will form the cluster.
11) Create the cluster configuration data with the Configuration Tool and save it to a floppy disk. See Section "How to make a cluster configuration data" for details.
12) Hand-carry the floppy disk: insert the floppy disk created with the Configuration Tool into the master server.
13) Run the cluster generation command on the server where the floppy disk is inserted.
14) Reboot the servers to form the cluster.
15) Access the ExpressCluster Web Manager: access the ExpressCluster Server with your browser.

How to make a cluster configuration data
The steps to make the cluster configuration data are as follows:
[Flowchart: start; enter the cluster name; add servers until all server definitions are done; enter server priorities; enter heartbeat priorities; add groups and their group resources until all group and group resource definitions are done (see (2) to (5)); add monitor resources until all monitor resource definitions are done (see (7) to (11)); finish.]

16) Start the Configuration Tool. Add the servers, heartbeat resources, and groups. See a separate guide, "Cluster Installation and Configuration Guide (Shared Disk)", for how to add these.
[Screenshot: tree view]

17) Enter the 1st group resource information.
  Type: floating IP resource
  Group resource name: fip1
  IP address:
See a separate guide, "Cluster Installation and Configuration Guide (Shared Disk)", for details.

18) Select failover1 in the tree view. Select [Edit] - [Add] from the menu bar. Enter the 2nd group resource information.
  Type: VxVM disk group resource
  Group resource name: vxdg1
  Disk group name: dg1

A. On the dialog box below, enter the type and group resource name, and click [Next].
B. On the dialog box below, enter the disk group name, and click [Next].

C. Click [Next] on the dialog box below.
D. Click [Complete] on the dialog box below.

19) Select failover1 in the tree view. Select [Edit] - [Add] from the menu bar. Enter the 3rd group resource information.
  Type: VxVM volume resource
  Group resource name: vxvol1
  Volume device name: /dev/vx/dsk/dg1/vol1
  Volume RAW device name: /dev/vx/rdsk/dg1/vol1
  Mount point: /mnt/vol1
  File system: vxfs
A. On the dialog box below, enter the type and group resource name, and click [Next].

B. On the dialog box below, enter the volume device name, volume RAW device name, mount point, and file system, and click [Next].
C. On the dialog box below, click [Next].

D. On the dialog box below, click [Complete].

20) Select failover1 in the tree view. Select [Edit] - [Add] from the menu bar. Enter the 4th group resource information.
  Type: VxVM volume resource
  Group resource name: vxvol2
  Volume device name: /dev/vx/dsk/dg1/vol2
  Volume RAW device name: /dev/vx/rdsk/dg1/vol2
  Mount point: /mnt/vol2
  File system: vxfs
A. On the dialog box below, enter the type and group resource name, and click [Next].

B. On the dialog box below, enter the volume device name, volume RAW device name, mount point, and file system, and click [Next].
C. On the dialog box below, click [Next].

D. On the dialog box below, click [Complete].
The table view for failover1 is shown below.

21) Add the resources for failover2 in the same way as you did for failover1. The table view for failover2 is shown below.

22) Select Monitors in the tree view. Select [Edit] - [Add] from the menu bar. Enter the 3rd monitor resource information. The 1st monitor resource (user space monitor) was created by default when the cluster name was defined. The 2nd monitor resource (VxVM daemon monitor) was created automatically when the VxVM disk group resource was added.
  Type: VxVM volume monitor
  Monitor resource name: vxvolw1
  Device to be monitored: /dev/vx/rdsk/dg1/vol1
  VxVM volume resource: vxvol1
  Action at error detection: Stop Cluster Daemon And OS Shutdown
A. On the dialog box below, enter the type and monitor resource name, and click [Next].

B. On the dialog box below, enter the device to be monitored, and click [Browse].
On the dialog box below, select vxvol1, and click [OK].

C. Confirm that vxvol1 has been set for the VxVM volume resource, and click [Next].
D. On the dialog box below, specify how the system should act at error detection, and click [Browse].

On the dialog box below, select cluster, and click [OK].
E. Confirm that cluster is now the recovery object, and select Stop Cluster Daemon And OS Shutdown for the final action. Click [Complete].

23) Make settings for the following monitor resources as you did in step 22).
4th monitor resource
  Type: VxVM volume monitor
  Monitor resource name: vxvolw2
  Device to be monitored: /dev/vx/rdsk/dg1/vol2
  VxVM volume resource: vxvol2
  Action at error detection: Stop Cluster Daemon And OS Shutdown
5th monitor resource
  Type: VxVM volume monitor
  Monitor resource name: vxvolw3
  Device to be monitored: /dev/vx/rdsk/dg2/vol3
  VxVM volume resource: vxvol3
  Action at error detection: Stop Cluster Daemon And OS Shutdown
6th monitor resource
  Type: VxVM volume monitor
  Monitor resource name: vxvolw4
  Device to be monitored: /dev/vx/rdsk/dg2/vol4
  VxVM volume resource: vxvol4
  Action at error detection: Stop Cluster Daemon And OS Shutdown

24) Select Monitors in the tree view. Select [Edit] - [Add] from the menu bar. Enter the 7th monitor resource information.
  Type: raw monitor
  Monitor resource name: raww1
  RAW device name to be monitored: /dev/raw/raw20
  Device name: /dev/sdb
  Action at error detection: Stop Cluster Daemon And OS Shutdown

A. On the dialog box below, enter the type and monitor resource name, and click [Next].
B. On the dialog box below, enter the RAW device name to be monitored and the device name, and click [Next].

C. On the dialog box below, specify how the system should act at error detection, and click [Browse].
On the dialog box below, select cluster, and click [OK].

D. Confirm that cluster is now the recovery object, and select Stop Cluster Daemon And OS Shutdown for the final action. Click [Complete].

25) Enter the 8th monitor resource information.
  Type: ip monitor
  Monitor resource name: ipw1
  Monitored IP address: (gateway)
  Action at error detection: failover of the WebManager group
See a separate guide, "Cluster Installation and Configuration Guide (Shared Disk)", for details.

26) The table view for Monitors is shown below.
Now the cluster configuration data has been created. See a separate guide, "Cluster Installation and Configuration Guide (Shared Disk)", for the next steps.

1.9 NAS Resource

1.9.1 ExpressCluster versions
The following ExpressCluster versions support this function.
ExpressCluster Server: SE3.1-1 or later, LE3.1-1 or later, XE3.1-3 or later, SX3.1-2 or later
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

1.9.2 Dependencies
By default, this function depends on the following group resource type.
Group resource type: Floating IP resource (Edition: SAN/SE, WAN/LAN/LE, XE, SX)

1.9.3 NAS
* The NAS resource controls resources on an NFS server.
* By storing the data that the business application needs on the NFS server, it is automatically taken over when the failover group moves at failover.
[Figure: when Server 1 goes down, Application A fails over to Server 2 and the data on the NFS server is taken over.]

1.9.4 Notes on NAS resource
* The ExpressCluster server controls access (mount/umount) to the file system. Do not configure mount/umount on the OS side.
* On the NFS server, make settings that allow the servers in the cluster to access the NFS resources, for example in /etc/exports as sketched below.
* On the ExpressCluster servers, make settings so that the portmap service is started.
* If you specify a host name as the NAS server name, make the settings required for name resolution.
* If you duplicate the GFS server feature on the XE version, see a separate guide, "GFS HOWTO", which you can download from the ExpressCluster website.
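A minimal sketch of the NFS-server side settings referred to above, assuming an exported directory /export/data, an NAS server named nas1 at 192.168.1.100, and cluster servers server1 and server2 (all names and addresses are assumptions for illustration):
/etc/exports on the NFS server:
/export/data  server1(rw,no_root_squash,sync)  server2(rw,no_root_squash,sync)
On each ExpressCluster server, make sure the portmap service starts at boot:
# chkconfig portmap on
If the NAS server is specified by host name, add an entry for name resolution, for example in /etc/hosts:
192.168.1.100   nas1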

1.10 MONITOR RESOURCE
The currently supported monitor resources are:

Monitor resource name / Abbreviation / Functional overview
- Disk monitor resource / diskw / see 1.12 Disk Monitor Resource
- RAW monitor resource / raww / see 1.13 RAW Monitor Resource
- IP monitor resource / ipw / see 1.14 IP Monitor Resource
- NIC Link Up/Down monitor resource / miiw / see 1.15 NIC Link Up/Down Monitor Resource
- Mirror disk connect monitor resource / mdnw / see 1.16 Mirror Disk Connect Monitor Resource
- Mirror disk monitor resource / mdw / see 1.17 Mirror Disk Monitor Resource
- PID monitor resource / pidw / see 1.18 PID Monitor Resource
- User space monitor resource / userw / see 1.19 User Space Monitor Resource
- VxVM daemon monitor resource / vxdg / see 1.20 VxVM daemon Monitor Resource
- VxVM volume monitor resource / vxvol / see 1.21 VxVM volume monitor resource
- Multi-target monitor resource / mtw / see 1.22 Multi-target Monitoring Resource

1.11 Monitor Resource
A monitor resource monitors a specified target. If an abnormality is detected in the target, the monitor resource restarts or fails over the group resource. The monitoring timing falls into two groups:
+ Always monitored (from cluster startup to cluster stop)
= Disk monitor resource
= IP monitor resource
= User space monitor resource
= Mirror disk monitor resource
= Mirror disk connect monitor resource
= RAW monitor resource
= VxVM daemon monitor resource
= NIC Link Up/Down monitor resource
= Multi-target monitor resource
+ Monitored while active (from group activation to group deactivation)
= PID monitor resource
= VxVM volume monitor resource
[Figure: "always" monitoring runs from cluster startup to cluster stop; "while active" monitoring runs from group activation to group deactivation.]

Monitor timing
The range of this function varies by the version of ExpressCluster.

ExpressCluster Server: SE3.0-1 to 3.0-4, LE3.0-1 to 3.0-4, XE3.0-1
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

Monitor resource / Monitor timing / Selection range of target resource
- Disk monitor resource / fixed: always / -
- IP monitor resource / fixed: always / -
- User space monitor resource / fixed: always / -
- Mirror disk monitor resource *1 / fixed: always / -
- Mirror disk connect monitor resource *1 / fixed: always / -
- RAW monitor resource *2 / fixed: always / -
- VxVM daemon monitor resource *3 / fixed: always / -
- PID monitor resource / fixed: at activation / exec
- VxVM volume monitor resource *3 / fixed: at activation / vxvol

With the exception of some monitor resources, the monitor timing can be selected in the following versions or later.

ExpressCluster Server: SE3.1-1 or later, LE3.1-1 or later, XE3.1-3 or later, SX3.1-2 or later (version 3.1-6 or later supports the multi-target monitor resource)
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

Monitor resource / Monitoring timing / Selection range of target resource
- Disk monitor resource / select from always or at activation / all available
- IP monitor resource / select from always or at activation / all available
- User space monitor resource / fixed: always / -
- Mirror disk monitor resource *1 / fixed: always / -
- Mirror disk connect monitor resource *1 / fixed: always / -
- RAW monitor resource *2 / select from always or at activation / all available
- VxVM daemon monitor resource *3 / select from always or at activation / all available
- NIC Link Up/Down monitor resource / select from always or at activation / all available
- PID monitor resource / fixed: at activation / exec
- VxVM volume monitor resource *3 / fixed: at activation / vxvol
- Multi-target monitor resource / select from always or at activation / all available

*1 For WAN/LAN/LE only
*2 For SAN/SE and WAN/LAN/LE only
*3 For SAN/SE only

Monitor interval
All monitor resources except the user space monitor resource are polled at every monitor interval. The following figures describe, in time series, the polling flow of a monitor resource in normal and abnormal cases of the monitor interval setting.
* When the monitoring is normal
[Figure: polling is repeated at every monitor interval while the status stays normal.]

* Detection of a monitoring abnormality (without a monitor retry setting)
After a monitoring abnormality occurs, the abnormality is detected at the next polling and the recovery operation for the recovery target is started.

* Detection of a monitoring abnormality (with a monitor retry setting)
After a monitoring abnormality occurs, if the abnormality is detected at the next polling and is not recovered by the monitor retries, the recovery operation for the recovery target is started.

* Detection of a monitoring timeout (without a monitor retry setting)
Immediately after a monitoring timeout occurs, the recovery operation for the recovery target is started.

* Detection of a monitoring timeout (with a monitor retry setting)
After a monitoring timeout occurs, the monitor retry is performed and then the recovery operation for the recovery target is started.

Abnormality detection
If an abnormality is detected, the following countermeasures are taken. However, the recovery operations below are not performed if the recovery object is not active.
+ Reactivation is tried if an abnormality is detected in a resource being monitored.
+ Failover is tried if reactivation fails as many times as the reactivation threshold.
+ The final action is taken if an abnormality is still detected after failovers have been performed as many times as the failover threshold.

If the recovery object is in the following status, no recovery action is taken.

Recovery object / Status / Reactivation 1 / Failover 2 / Final action 3
- Group resource / failover group: Already stopped / No / No / No
- Group resource / failover group: Being activated or stopped / No / No / No
- Group resource / failover group: Already activated / Yes / Yes / Yes
- Group resource / failover group: Failed to activate or stop / Yes / Yes / Yes
- Cluster: - / - / - / Yes

If a group resource (disk resource, exec resource, and so on) is set as the recovery target for a monitor resource failure and the monitor resource has detected a failure, do not execute the following operations with commands while the recovery operation transition (reactivation -> failover -> final action) is in progress; control the cluster and the group from the Web Manager instead.
+ termination/suspension of the cluster
+ start/termination/migration of a group
If you perform the controls above during the recovery operation transition caused by a monitor resource failure, other group resources of the group may fail to stop. After the final action has been taken, however, the controls above can be performed even if the monitor resource is still in a failure status.
If the monitor resource recovers from the abnormality (becomes normal), the reactivation count, the failover count, and whether or not to execute the final action are reset.

1 Effective only when the value of the reactivation threshold is one or greater.
2 Effective only when the value of the failover threshold is one or greater.
3 Effective only when an option other than No Operation is selected.

The following pages describe the monitor resource work flow at abnormality detection.
They first describe what is performed if an abnormality is detected on only one server while the gateway is specified as the IP address of the IP monitor resource. Because IP monitor resource 1 is running normally on Server 2, the operation can be continued by failing over failover group A.

The following pages describe what is performed if an abnormality is detected on both servers while the gateway is specified as the IP address of the IP monitor resource.


[Figure sequence: both servers run IP monitor resource 1, which monitors the gateway over the public LAN (also used for interconnect); failover group A consists of disk resource 1, exec resource 1, and floating IP resource 1.
- If the abnormality continues, reactivation of failover group A is retried up to 3 times. Server 1: IP monitor resource 1 reactivation count 3 times, failover count once; no recovery action takes place there because failover group A is already stopped. Server 2: reactivation count 3 times, failover count zero.
- When the monitoring retry threshold of IP monitor resource 1 is exceeded again, failover of failover group A starts on Server 2; this is the first failover on Server 2, and failover group A is failed over from Server 2 to Server 1.]

[Figure sequence (continued):
- The abnormality detected in monitoring IP monitor resource 1 continues on Server 1. Server 1: reactivation count 3 times, failover count once. Server 2: reactivation count 3 times, failover count once.
- Monitoring of IP monitor resource 1 is retried up to 3 times on Server 1. When the monitoring retry threshold of IP monitor resource 1 is exceeded again on Server 1, no reactivation is tried because the reactivation count is already 3, and no failover is tried because the failover count is already 1; the final action is started.
- The final action for IP monitor resource 1 starts on Server 1. The final action is taken when failover has failed as many times as the threshold.]
Supplement: when the monitor resource recovers from abnormal to normal on the monitored server, the counters for reactivation and failover are reset to zero (0), and the recovery actions start over the next time an abnormality is detected.

These sample work flows assume that the interconnect LAN is healthy. When all interconnect LANs are disconnected, internal communication between the servers is disabled. Therefore, even if an abnormality is detected on the server being monitored, the group failover process fails. A way to fail over the group when all interconnect LANs are disconnected is to shut down the server where the abnormality is detected; if the server shuts down, the other servers can detect it and start the failover of the group.
The following pages describe, based on the sample settings below, the abnormality detection work flow when all interconnect LANs are disconnected. The reactivation process for the recovery object is the same as when the interconnect LANs are healthy, so the work flow is explained starting from the failover process on Server 1, which requires the interconnect LAN.
[Figure sequence: both servers run disk monitor resource 1; failover group A (disk resource 1, exec resource 1, floating IP resource 1) is on Server 1; the servers are connected by the public LAN (also used as interconnect), the interconnect LAN, a disk heartbeat, and the shared disk.
- Server 1: disk monitor resource 1 reactivation count 3 times, failover count zero. When the reactivation threshold is exceeded, failover of failover group A is started; however, it fails because the servers cannot communicate with each other due to the interconnect LAN disconnection. On each server, failover can be retried up to the failover threshold; this is the 1st failover on Server 1.
- Server 1: disk monitor resource 1 reactivation count 3 times, failover count once. The failover threshold is then exceeded on Server 1.]

[Figure sequence (continued): when the final action is taken, Server 1 goes down and failover group A (disk resource 1, exec resource 1, floating IP resource 1) is started on Server 2.
- Reactivation of group A is tried on Server 2 just as it was on Server 1. On Server 2 as well, failover is tried if reactivation of group A fails; however, because there is no server the group can fail over to, no failover can complete successfully.
- If failover fails as many times as the threshold, the final action is taken on Server 2 just as it was on Server 1, and Server 2 also goes down.]

Returning from monitor error (normal)
When the return (recovery) of the monitor resource is detected during or after the recovery actions that follow the detection of a monitoring abnormality, the counts against the thresholds below are reset:
+ Reactivation threshold
+ Failover threshold
Whether or not to execute the final action is also reset (execution becomes required again).
The following pages describe what is performed from the point when the final action described in "Abnormality detection" is executed, to the point when another monitoring error occurs after the monitoring has returned to normal.

[Figure sequence: both servers run IP monitor resource 1, which monitors the gateway over the public LAN (also used as interconnect); failover group A consists of disk resource 1, exec resource 1, and floating IP resource 1.
- After all recovery actions have been taken, the monitoring abnormality continues; the final action for IP monitor resource 1 has been taken on Server 1. Server 1: IP monitor resource 1 reactivation count 3 times, failover count once.
- When recovery is confirmed within the monitoring retry threshold, normality is detected in monitoring IP monitor resource 1 (whether or not the IP address is active is checked at every interval). Because the return of the monitor resource is detected, the reactivation count and the failover count are reset. Server 1 and Server 2: IP monitor resource 1 reactivation count zero, failover count zero.
- Then an abnormality is detected in the monitoring again.]

[Figure sequence (continued):
- Reactivation of failover group A begins on Server 1 (the reactivation threshold counts reactivations on a per-server basis; this is the first reactivation on Server 1). Server 1: IP monitor resource 1 reactivation count 3 times, failover count zero. Server 2: reactivation count zero, failover count zero; no recovery action takes place there because failover group A is "already stopped".
- When the reactivation count exceeds the threshold on Server 1, a failover of failover group A is performed.]
If the monitor error had persisted, reactivation of failover group A would not have taken place and a failover to Server 2 would have been performed. However, because the return of the monitor resource was detected and the reactivation count has been reset, another reactivation takes place.

Activation/deactivation abnormality of the recovery object while performing recovery actions
If the monitoring target of a monitor resource is the device used by a group resource that is the recovery object, an activation/deactivation abnormality of the recovery target may also be detected during the recovery that follows the detection of a monitoring error.
The following pages describe the recovery actions taken when the same device is specified as the monitoring target of the disk monitor resource and as the disk resource of failover group A.

The reactivation threshold of the monitor resource and the activation retry threshold of the group resource are not mentioned in the following figures because they are set to zero (0).
[Figure sequence: both servers run disk monitor resource 1; failover group A consists of disk resource 1, exec resource 1, and floating IP resource 1, using the shared disk.
- Activation of disk monitor resource 1 and failover group A starts on Server 1 and Server 2; an ioctl (TUR) is executed against the device. Server 1: disk monitor resource 1 failover count zero, disk resource 1 failover count zero. Server 2: disk monitor resource 1 failover count zero, disk resource 1 failover count zero.
- An abnormality is detected in the monitoring of disk monitor resource 1 on Server 1 and Server 2 (the ioctl of TUR failed). The counts are unchanged on both servers.
Abnormalities may also be detected in the deactivation of the disk resource, depending on the location of the disk device failure.]

[Figure sequence (continued):
- Failover of failover group A begins on Server 1 due to the detection of the abnormality in monitoring disk monitor resource 1. The "failover threshold" of the monitor resource is the number of failovers counted on a per-server basis; this is the first failover on Server 1. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count zero. Server 2: disk monitor resource 1 failover count zero, disk resource 1 failover count zero. Abnormalities may be detected in the deactivation of the disk resource, depending on the location of the disk device failure.
- Activating disk resource 1 after the failover fails on Server 2 (for example, failure of fsck or mount). The counts are unchanged.]

No recovery action is performed on Server 2, where the abnormality of disk monitor resource 1 is detected just as on Server 1, because the recovery object, failover group A, is being activated. For more information on the recovery actions a monitor resource performs against the recovery object, see "Abnormality detection".
[Figure sequence (continued):
- Failover of failover group A begins on Server 2 due to the abnormality in the activation of disk resource 1. The "failover threshold" of the group resource is the number of failovers counted on a per-server basis; this is the first failover on Server 2. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count zero. Server 2: disk monitor resource 1 failover count zero, disk resource 1 failover count once.
- Activating disk resource 1 after the failover fails on Server 1 (for example, failure of fsck or mount). Abnormalities may be detected in the deactivation of the disk resource, depending on the location of the disk device failure.]

[Figure sequence (continued):
- Failover of failover group A begins on Server 1 due to the abnormality in the activation of disk resource 1. The "failover threshold" of the group resource is the number of failovers counted on a per-server basis; this is the first failover on Server 1. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count once. Server 2: disk monitor resource 1 failover count zero, disk resource 1 failover count once. Abnormalities may be detected in the deactivation of the disk resource, depending on the location of the disk device failure.
- Activating disk resource 1 after the failover fails again on Server 2 (for example, failure of fsck or mount). The counts are unchanged.]

[Figure sequence (continued):
- Activating disk resource 1 after the failover has failed on Server 2 (for example, failure of fsck or mount). The final action is executed on Server 2 because the number of failovers due to disk resource activation abnormalities has exceeded the threshold. Note, however, that the activation ends abnormally without activating the rest of the group resources in failover group A, because No Operation (Next Resources Are Not Activated) is selected as the final action. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count once. Server 2: disk monitor resource 1 failover count zero, disk resource 1 failover count once.
- The final action for the activation abnormality of disk resource 1 is performed on Server 2; failover group A is left in the activation error status. Abnormalities may be detected in the deactivation of the disk resource, depending on the location of the disk device failure.]

[Figure sequence (continued):
- The final action is executed on Server 1 as well as on Server 2, because the number of failovers due to disk resource activation abnormalities has exceeded the threshold; note, however, that the activation ends abnormally without activating the rest of the group resources in failover group A, because No Operation (Next Resources Are Not Activated) is selected as the final action.
- Failover of failover group A begins on Server 2 due to the detection of the abnormality in monitoring disk monitor resource 1. The "failover threshold" of the monitor resource is the number of failovers counted on a per-server basis; this is the first such failover on Server 2. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count once. Server 2: disk monitor resource 1 failover count once, disk resource 1 failover count once.
- Activating disk resource 1 after the failover fails on Server 1 (for example, failure of fsck or mount). Abnormalities may be detected in the deactivation of the disk resource, depending on the location of the disk device failure.]

[Figure sequence (continued):
- The final action is executed on Server 1 because the number of failovers due to abnormalities detected in monitoring disk monitor resource 1 has exceeded the threshold. The final action for failover group A (stop group) due to the detection of the abnormality in monitoring disk monitor resource 1 begins on Server 1. Server 1: disk monitor resource 1 failover count once, disk resource 1 failover count once. Server 2: disk monitor resource 1 failover count once, disk resource 1 failover count once.
- Nothing happens even if an abnormality is detected in monitoring disk monitor resource 1 after failover group A has been stopped by the final action executed for disk monitor resource 1 on Server 1. Note, however, that the final action for disk monitor resource 1 is executed on Server 2 if failover group A is manually activated there, because the final action for that monitor resource has not been executed on Server 2 yet.]

Delay warning
ExpressCluster Server: SE3.1-1 or later, LE3.1-1 or later, XE3.1-3 or later, SX3.1-2 or later
Configuration Tool: For supported versions, see a separate guide, "Operational Environment - Configuration Tool Operational Environment".

When the server is heavily loaded because of the business application configuration, a monitor resource may detect a monitoring timeout. You can make settings that issue an alert when the polling time (the time actually taken) reaches a certain percentage of the monitoring timeout, before the timeout itself is detected.
The following figure shows the timeline until a delay warning of the monitor resource is issued. In this example, the monitoring timeout is set to 60 seconds and the delay warning rate is set to 80%, which is the default value (so the delay warning range begins at 48 seconds).
[Figure: timeline of monitor resource polling, with the normal range of polling time below 48 seconds, the delay warning range between 48 and 60 seconds, and the timeout at 60 seconds; points A, B, and C are explained below.]
A. The polling time of the monitoring is 10 seconds. The monitor resource is in the normal status. No alert is issued.
B. The polling time of the monitoring is 50 seconds, the delay of monitoring is detected during this time, and the monitor resource is still in the normal status. An alert is issued because the polling time has exceeded the 80% delay warning rate.
C. The polling time of the monitoring exceeds 60 seconds, the delay of monitoring is detected, and the monitor resource is in the abnormal status. No delay warning alert is issued.

150 Setting the delay warning rate to 0 or 100 has the following effects.

When the delay warning rate is set to 0:
An alert for the delay warning is issued on every polling. By using this, you can measure the polling time of the monitor resource while the server is heavily loaded, which helps you determine an appropriate monitoring timeout for the monitor resource.

When the delay warning rate is set to 100:
No delay warning is issued.

Delay warnings for the heartbeat resources are reported as alerts in the same way. Be sure not to set a low value, such as 0%, except for test operation.
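The relationship between the monitoring timeout, the delay warning rate, and the polling time can be written as a simple calculation. The following Python sketch is illustrative only (it is not ExpressCluster code); it reproduces cases A, B, and C above for a 60-second timeout and an 80% delay warning rate, and it also shows why a rate of 0 warns on every polling while a rate of 100 never warns before the timeout.

    def classify_polling(polling_sec, timeout_sec=60, delay_warning_rate=80):
        """Classify one polling result the way the delay warning works.

        Returns "normal", "delay warning" (an alert is issued), or
        "timeout" (monitor error; no delay warning alert).
        """
        threshold = timeout_sec * delay_warning_rate / 100.0  # 48 s for 60 s at 80%
        if polling_sec >= timeout_sec:
            return "timeout"
        if polling_sec >= threshold:
            return "delay warning"
        return "normal"

    # The three cases from the figure above.
    for label, polling in (("A", 10), ("B", 50), ("C", 65)):
        print(label, polling, "sec ->", classify_polling(polling))

With delay_warning_rate=0 the threshold becomes 0 seconds, so every polling is classified as a delay warning; with delay_warning_rate=100 the threshold equals the timeout, so no delay warning is ever issued before a timeout.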

151 Waiting for the start of monitoring

ExpressCluster Server: Version SE3.1-1 or later, LE3.1-1 or later, XE3.1-3 or later, SX3.1-2 or later
Configuration Tool: For supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

Waiting for the start of monitoring is a feature that starts monitoring only after the specified wait time has elapsed. The following shows how monitoring differs when the time to wait for the start of monitoring is set to 0 seconds and to 30 seconds.

(Figure: in both cases the monitoring timeout is 60 seconds. With a wait time of 0 seconds, polling starts immediately after the cluster starts or monitoring is restarted; with a wait time of 30 seconds, polling starts only after the 30-second wait has elapsed.)

Monitoring starts after the specified start monitoring wait time has elapsed even when the monitor resources are suspended and resumed using the monitoring control commands.

152 The start monitoring wait time is used when an application may terminate right after the start of monitoring, for example because of a configuration mistake in an application started by an exec resource that is monitored by the PID monitor resource, and the error cannot be recovered by reactivation. For example, the recovery action may be repeated endlessly if the start monitoring wait time is set to 0, as shown below.

(Figure: with a wait time of 0 seconds, the exec resource starts the application, the application aborts, the PID monitor resource detects the abnormality on its next polling and requests a restart, and the cycle repeats.)

The recovery action is repeated endlessly because the initial polling of the monitor resource finishes successfully. The current count of recovery actions performed by the monitor resource is reset when the status of the monitor resource becomes normal. This is why the reactivation recovery action is performed endlessly: the current count is always reset to zero (0).

153 You can prevent such problems by setting the start monitoring wait time. In this example, 60 seconds is set as the start monitoring wait time, based on the time within which the application may abort after it starts.

(Figure: with a wait time of 60 seconds, the application's abort during the wait is not detected because monitoring has not yet started. Once monitoring starts, the abnormality is detected and exec1 is restarted; when the application aborts again, the next detection causes a failover to the next-priority server.)

If the application also terminates abnormally on the server to which the group has failed over, the group is stopped as the final action.
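The effect of the start monitoring wait time can be pictured as a polling loop that sleeps before its first polling. The following Python sketch is an illustration only; check_pid() stands in for a PID monitor polling, and none of the names are ExpressCluster interfaces. With wait_sec=60, an application that aborts shortly after startup is already down when the first polling runs, so the first polling does not succeed spuriously and the recovery count is not repeatedly reset.

    import os
    import time

    def check_pid(pid):
        """Return True if the process is still alive (stand-in for a PID monitor polling)."""
        try:
            os.kill(pid, 0)           # signal 0 checks for existence only
            return True
        except OSError:
            return False

    def monitor(pid, wait_sec=60, interval_sec=30):
        """Sketch of a monitor that waits before its first polling."""
        time.sleep(wait_sec)          # start monitoring wait time
        while True:
            if not check_pid(pid):    # first polling happens only after the wait
                print("abnormality detected; request the recovery action")
                return
            time.sleep(interval_sec)  # monitor interval between pollings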

154 Limit of the number of reboots

If [Stop Cluster Daemon And OS Shutdown] or [Stop Cluster Daemon And OS Reboot] is selected as the final action to be taken when an abnormality is detected by a monitor resource, you can limit the number of shutdowns or reboots caused by abnormalities detected by monitor resources.

Because the number of reboots is recorded on a server basis, the maximum reboot count is an upper limit per server. The number of reboots caused by a final action on detection of an abnormality in group activation or deactivation and the number of reboots caused by a final action on detection of an abnormality by a monitor resource are recorded separately. If the time to reset the maximum reboot count is set to zero (0), the reboot count is not reset.

The following pages describe what happens when the number of reboots is limited with the settings below.
- As the final action, [Stop Cluster Daemon And OS Reboot] is performed only once, because the maximum reboot count is set to one (1).
- When the monitor resource detects no abnormality for 10 minutes after a reboot following the cluster shutdown, the number of reboots is reset, because the time to reset the maximum reboot count is set to 10 minutes.

155 (Figure: Server 1 and Server 2 each run Disk Monitor Resource 1 against the shared disk. Failover Group A consists of Disk resource 1, exec resource 1, and Floating IP resource 1. The maximum reboot count is one (1) on each server.)

- Activation of Failover Group A and Disk Monitor Resource 1 starts; Disk Monitor Resource 1 performs I/O to the devices and other checks at each interval. (Reboot count: Server 1 = 0, Server 2 = 0)
- An abnormality is detected in monitoring Disk Monitor Resource 1 (for example, an ioctl or read error), and a monitor error occurs on both servers. (Reboot count: Server 1 = 0, Server 2 = 0)
- On Server 1, the operating system is rebooted after the cluster daemon is stopped. The final action is performed because the "Activation retry threshold" and the "Failover threshold" are zero (0). The reboot count on Server 1 becomes one. (Reboot count: Server 1 = 1, Server 2 = 0)

156 - A failover of Failover Group A to Server 2 starts. The "maximum reboot count" is the maximum value of the reboot count for each server; on Server 2 the reboot count is still zero (0). (Reboot count: Server 1 = 1, Server 2 = 0)
- Rebooting of Server 1 is completed. (Reboot count: Server 1 = 1, Server 2 = 0)
- Failover Group A is moved back to Server 1 by using the clpgrp command or the Web Manager. (Reboot count: Server 1 = 1, Server 2 = 0)

157 - An abnormality is detected again in monitoring Disk Monitor Resource 1 (for example, an ioctl or read error), and a monitor error occurs on both servers. (Reboot count: Server 1 = 1, Server 2 = 0)
- No final action is performed on Server 1 because the maximum reboot count has already been reached there. The reboot count is not reset even after 10 minutes have passed. (Reboot count: Server 1 = 1, Server 2 = 0)
- Remove the abnormality in the disk, then perform a cluster shutdown and reboot by using the clpstdn command or the Web Manager. When 10 minutes have passed after the status of Disk Monitor Resource 1 on Server 1 changed to normal, the reboot count is reset, and the final action can be taken the next time an abnormality is detected in Disk Monitor Resource 1. (Reboot count: Server 1 = 0, Server 2 = 0)
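The bookkeeping in this walkthrough can be summarized as a small per-server state machine: the reboot count is compared with the maximum reboot count before the final action is taken, and it is reset only after the monitor resource has stayed normal for the configured reset time. The following Python sketch illustrates that logic only; it is not ExpressCluster code, and all names in it are illustrative assumptions.

    import time

    class RebootLimiter:
        """Per-server reboot count with a time to reset the maximum reboot count."""

        def __init__(self, max_reboot_count=1, reset_after_sec=600):
            self.max_reboot_count = max_reboot_count  # one reboot allowed
            self.reset_after_sec = reset_after_sec    # 10 minutes of normal status
            self.reboot_count = 0
            self.normal_since = None

        def on_monitor_normal(self, now=None):
            """Call on every successful polling; reset the count after enough normal time."""
            now = time.time() if now is None else now
            if self.normal_since is None:
                self.normal_since = now
            elif self.reset_after_sec > 0 and now - self.normal_since >= self.reset_after_sec:
                self.reboot_count = 0             # reset, as in the last step above
                                                  # (0 means the count is never reset)

        def on_monitor_error(self):
            """Call when an abnormality is detected; decide whether the reboot may run."""
            self.normal_since = None
            if self.reboot_count >= self.max_reboot_count:
                return "suppressed"               # limit reached: no further reboot
            self.reboot_count += 1
            return "reboot"                       # stop the cluster daemon and reboot the OS

With max_reboot_count=1 and reset_after_sec=600, the sketch reproduces the behavior shown above: the second abnormality on Server 1 returns "suppressed" until the monitor resource has stayed normal for 10 minutes.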

158 Monitor priority

To give the monitoring performed by monitor resources a higher priority when the operating system is under high load, you can set a nice value for all monitor resources except the user space monitor resource. The nice value can be set between 19 (lowest priority) and -20 (highest priority). Detection of monitor timeouts can be suppressed by setting a higher priority (a smaller nice value).
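The nice value used here follows the standard Linux scheduling semantics: -20 is the highest priority, 19 the lowest, and raising the priority of a running process normally requires root privileges. The following snippet only demonstrates those semantics with Python's generic os interface; it is not how ExpressCluster applies the monitor priority internally.

    import os

    # Read the current nice value of this process (an increment of 0 leaves it unchanged).
    print("current nice value:", os.nice(0))

    # Raise the nice value (lower the priority) by 5 - allowed for normal users.
    print("new nice value:", os.nice(5))

    # Lowering the nice value (raising the priority, e.g. towards -20) requires
    # root privileges; os.nice(-5) raises PermissionError for ordinary users.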

159 1.12 Disk Monitor Resource

This function monitors disk devices. It is recommended to use the RAW monitor resource for monitoring a shared disk on which the disk monitor resource (TUR) cannot be used.

ExpressCluster versions
The following ExpressCluster versions support this function.
- Server, TUR: SE3.0-1 or later, LE3.0-1 or later, XE3.0-1 or later, SX3.1-2 or later
- Server, TUR (legacy): SE3.1-6 or later, LE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
- Server, TUR (generic): SE3.1-6 or later, LE3.1-6 or later, XE3.1-6 or later, SX3.1-6 or later
- Server, Dummy Read: SE3.0-1 or later, LE3.0-1 or later, XE3.0-1 or later, SX3.1-2 or later
- Configuration Tool: For supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

160 Method

There are mainly two monitoring methods for the disk monitor resource: Read and TUR.

* Notes common to TUR
+ The Test Unit Ready command cannot be run on disks or disk interfaces (HBAs) that do not support it. Even if the hardware supports this command, its driver may not; confirm the driver specifications as well.
+ Compared to the Dummy Read method, the load on the OS and disks is lighter.
+ In some cases, Test Unit Ready may not be able to detect actual I/O errors on the media.

The following three methods can be selected for TUR monitoring.

* TUR
+ For earlier ExpressCluster versions: monitoring is performed by using ioctl (Test Unit Ready). Test Unit Ready (TUR), which is defined as a SCSI command, is issued to the specified device, and the status of the device is determined by the result.
+ For later ExpressCluster versions: ioctl is issued by the following steps, and the status of the device is determined by the result. The ioctl (SG_GET_VERSION_NUM) command is issued first, and the status is determined by the return value of ioctl and the version of the SG driver.
  - If the ioctl command succeeds and the version of the SG driver is 3.0 or later, ioctl TUR (SG_IO) is executed by using the SG driver.
  - If the ioctl command fails or the version of the SG driver is earlier than 3.0, ioctl TUR, which is defined as a SCSI command, is executed.

* TUR (legacy)
+ For ExpressCluster Version 3.1-6 or later: monitoring is performed by using ioctl (Test Unit Ready). Test Unit Ready (TUR), which is defined as a SCSI command, is issued to the specified device, and the status of the device is determined by the result.

* TUR (generic)
+ For ExpressCluster Version 3.1-6 or later: monitoring is performed by using ioctl TUR (SG_IO). ioctl TUR (SG_IO), which is defined as a SCSI command, is issued to the specified device, and the status of the device is determined by the result.
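As a quick manual check of whether a disk path accepts Test Unit Ready at all, the sg_turs utility from the sg3_utils package can be issued by hand. The wrapper below is only an illustration: it assumes sg3_utils is installed and that /dev/sdb is the device to be checked, and it is not what ExpressCluster executes internally.

    import subprocess

    def test_unit_ready(device):
        """Issue a SCSI Test Unit Ready to 'device' via sg_turs (sg3_utils).

        Returns True when the device reports ready, False otherwise.
        Requires sg3_utils and usually root privileges.
        """
        result = subprocess.run(["sg_turs", device],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0

    if __name__ == "__main__":
        print("TUR ok" if test_unit_ready("/dev/sdb") else "TUR failed")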

161 The following is the Read monitoring method.

* Dummy Read
+ This method reads data of the specified size from the specified device (disk device or partition device). Based on the result (the size of the data actually read), an abnormality is determined.
+ This method only examines whether the specified size of data can be read; the validity of the data read is not examined.
+ The larger the size to read, the heavier the load on the OS and disks.
+ See the notes on I/O size in the next section for the size of data to be read.
+ It is recommended to use the RAW monitor resource if a raw device is available.
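A Dummy Read style check can be pictured as opening the device read-only, reading the configured number of bytes, and comparing the amount actually read. The sketch below is illustrative only (the device path and size are placeholders, and it is not ExpressCluster code); note that read data may be served from caches, which is why the I/O size discussed in the next section matters.

    import os

    def dummy_read(device, size):
        """Read 'size' bytes from 'device' and report whether the full size could be read.

        Only the amount of data read is checked; the content itself is not
        validated, matching the Dummy Read behavior described above.
        """
        fd = os.open(device, os.O_RDONLY)
        try:
            remaining = size
            while remaining > 0:
                chunk = os.read(fd, min(remaining, 1024 * 1024))
                if not chunk:              # end of device reached before 'size' bytes
                    return False
                remaining -= len(chunk)
            return True
        finally:
            os.close(fd)

    # Example (path and size are placeholders):
    # print(dummy_read("/dev/sdb2", 16 * 1024 * 1024))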

162 I/O size

Enter the size of data to read if Dummy Read is selected as the method.
- Depending on the shared disks and interfaces in your environment, the size of the read cache actually installed varies.
- Therefore, if the specified Dummy Read size is too small, the Dummy Read may hit the cache and fail to detect I/O errors.
- When you specify a Dummy Read size, confirm that Dummy Read can detect I/O errors on the shared disks with that size by intentionally causing I/O errors.

(Figure: a general image of a shared disk. Read data may be cached by the server's interface adapter (such as SCSI or Fibre Channel), by the cache on the RAID subsystem, and by the cache on each disk drive inside the array unit. This is not always applicable to every array unit.)

163 1.13 RAW Monitor Resource

The RAW monitor resource is similar to the disk monitor resource (Dummy Read method), but it reads raw devices. Because there is no buffering by the OS, error detection is assured in a relatively short time. If the disk monitor resource (TUR method) is not available for your shared disks, the RAW monitor resource is recommended.

ExpressCluster versions
The following ExpressCluster versions support this function.
- Server: SE3.0-4 or later, LE3.0-4 or later, XE3.1-3 or later, SX3.1-2 or later
- Configuration Tool: For supported versions, see a separate guide, Operational Environment, Configuration Tool Operational Environment.

Notes on RAW Monitor Resource
* For Linux kernel 2.4
When a RAW monitor resource is set, partitions cannot be monitored if they have been mounted or may possibly be mounted. Set the monitoring target device name to the whole device (the device indicating the entire disk).
* For Linux kernel 2.6
When a RAW monitor resource is set, partitions cannot be monitored if they have been mounted. Partitions also cannot be monitored even if you set the device name to the whole device (the device indicating the entire disk). Prepare a partition for monitoring and set the monitoring target RAW device name to it. Allocate more than 10 MB for the monitoring partition.
* Do not register RAW devices that are already registered in the Disk I/F List on the Server Properties, in another RAW monitor resource, or in a VxVM volume resource. See Section 1.8.6, Notes on control by ExpressCluster (under VxVM volume resource), for details on the RAW devices of VxVM volume resources.
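To avoid registering a raw device that is already in use (for example, one listed in the Disk I/F List), the existing raw device bindings can be listed first with the raw(8) command. The sketch below assumes the raw command is available on the distribution; the output handling is illustrative only.

    import subprocess

    def list_raw_bindings():
        """Return the output lines of 'raw -qa', which lists all bound raw devices."""
        result = subprocess.run(["raw", "-qa"], capture_output=True, text=True)
        return result.stdout.splitlines()

    if __name__ == "__main__":
        for line in list_raw_bindings():
            # Typical line: "/dev/raw/raw1:  bound to major 8, minor 17"
            print(line)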

164 RAW Monitor Resource setting samples -- For Linux kernel 2.4

(1) Setting sample of a disk resource and RAW monitor resources
+ Disk resource
+ RAW monitor (monitors the built-in hard disk drive on each server)
+ RAW monitor (monitors the shared disk)

(Figure: on each server, set the built-in disk /dev/sda as a RAW monitor target. On the shared disk /dev/sdb, set /dev/sdb1 as the disk heartbeat partition and /dev/sdb2 as the disk resource, and set /dev/sdb as a RAW monitor target. Do not specify a partition device that is mounted as a file system; it is recommended to specify a device name that indicates the entire disk.)
