Oracle Data Guard for RAC migrations WLCG Service Reliability Workshop CERN, November 30 th, 2007 Jacek Wojcieszuk, CERN IT LCG
Outline Problem description Possible approaches Oracle Data Guard Migration Procedure Possible Variations Summary RAC MIgration WLCG Service Reliability Workshop, Nov 2007-2
Problem Description More and more data centers run Oracle databases on commodity hardware relying on: Software solutions for high availability (RAC, ASM) Hardware redundancy Using commodity hardware may impose relatively frequent hardware changes due to: Short hardware lifetime Short support period Replacing database hardware without significantly compromising service availability, becomes a challenge as database systems grow larger and larger. RAC MIgration WLCG Service Reliability Workshop, Nov 2007-3
Possible approaches copy Copy over the database with OS tools Procedure Setup the new system (hardware and software) Stop the database Copy over datafiles, redo logs and control files and the spfile Open the database on the new hardware Advantages: Simple concept Does not require knowing any extra tools (scp is enough) Disadvantages Difficult if ASM or RAW devices in use Imposes long database downtime scp datafiles, control files, redo logs spfile RAC MIgration WLCG Service Reliability Workshop, Nov 2007-4
Possible approaches export/import p Export/Import of the whole database Procedure Setup new system Lock the original database for users Copy over the data using exp/imp or expdp/impdp programs Advantages: Simplicity Disadvantages: Long database downtime proportional to the database size Requires a lot of testing (export/import sometimes throw unexpected errors) export import import over DB link RAC MIgration WLCG Service Reliability Workshop, Nov 2007-5
Possible approaches - RMAN Backup and Recovery with RMAN Procedure: Setup the new system Stop and backup the old database Duplicate/recover the database to the new hardware Open the database with resetlogs Advantages: Faster then export/import approach Old RMAN backups stay valid Simple and reliable Disadvantages: Long downtime proportional to the database size backup restore/ duplication RAC MIgration WLCG Service Reliability Workshop, Nov 2007-6
Possible approaches incremental replacement Iterative hardware replacement Procedure: Remove a piece of old hardware from the cluster Add a piece of new hardware to replace removed one Repeat till all the hardware gets replaced Advantages: In theory no downtime No extra space needed in the computing center Disadvantages Complicated and error-prone Labor intensive Requires a lot of communication between DBAs, Sysadmins and technicians RAC MIgration WLCG Service Reliability Workshop, Nov 2007-7
Possible approaches Data Guard Data Guard Procedure Install new system Configure it as a standby database with Oracle Data Guard Perform switchover Redirect all users to the new system Advantages Very short downtime which lenght does not depend on the database size Disadvantages At first glance the procedure seems to be more complicated than at least some of procedures described before 1. Setup dataguard 2. Perform switchover RAC MIgration WLCG Service Reliability Workshop, Nov 2007-8
Oracle Data Guard Widely used and mature feature of Oracle database software Available since version 8i Previously known as Standby Server Helps to create and keep synchronized 1 or more standby databases Well integrated with other HA features of Oracle software Supports 2 types of standby database Physical Logical RAC MIgration WLCG Service Reliability Workshop, Nov 2007-9
Physical Standby Database Transa actions Redo Transport Redo Stream Redo Apply Primary Database Physical Standby Database RAC MIgration WLCG Service Reliability Workshop, Nov 2007-10
Migration Procedure step 1 New system: Hardware assembly (architecture must be the same as in case of the old system e.g. X86_64) OS installation Clusterware installation RAC software installation: Version must match the version on the old system Use of clonning procedure highly recommended Listener configuration Preferebly using netca tool Shared storage configuration Using the same configuration as on the source system, although not mandatory, simplifies the migration If you plan to use ASM configure and start ASM instances RAC MIgration WLCG Service Reliability Workshop, Nov 2007-11
Migration Procedure - step 2 New system: Configuration of naming methods on all nodes of the new cluster There should be an entry pointing to the old system OLD_ DB = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = oldnode1-vip)(port = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = oldnode2-vip)(port = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.cern.ch) ) ) Creation of password files Configuration of backup At least one node should have access to the backups of the database being migrated RAC MIgration WLCG Service Reliability Workshop, Nov 2007-12
Migration Procedure step 3 On the old system: Enabling forced logging To ensure that all data changes will go to redo logs SQL> alter database force logging; Performing a full backup (or at least level 1 backup) RMAN> backup database; Performing control file backup for standby RMAN> backup current controlfile for standby; Defining a TNS entry pointing to the new system NEW_DB = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = newnode1-vip)(port = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = newnode2-vip)(port = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.cern.ch) ) ) RAC MIgration WLCG Service Reliability Workshop, Nov 2007-13
Migration Procedure step 4 New system: Preparation of parameter file: The easiest is to reuse parameter file from the old system Parameters to be added: log_archive_dest_2, standby_file_management, fal_server, fal_client, service_names Parameters to be modified: db_ recovery_ file_ dest, db_recovery_file_dest_size, db_create_file_dest, dump destinations Parameters to be removed: control_files, autotuned memory allocation parameters Creation of the spfile *.log_archive_dest_2='service=old_db valid_for=(online_logfiles,primar y_ role)' *.standby_file_management=auto *.fal_server='old_db' *.fal_client='new_db' *.service_ names='orcl.cern.ch' # *.db_file_name_convert=... # *.log_file_name_convert=... # Modify other parameters if needed: # *.db_recovery_file_dest=... # *.db_recovery_file_dest_size=... # *.db_create_file_dest=... # *.background_core_dump=... core dump= #... # Delete control_files parameter # Delete shared memory allocation parameters (parameters with names starting with double underscore) RAC MIgration WLCG Service Reliability Workshop, Nov 2007-14
Migration Procedure step 5 New System Database duplication using RMAN rman target SYS@OLD_DB auxiliary / nocatalog RMAN> startup nomount RMAN> DUPLICATE TARGET DATABASE FOR STANDBY; Enabling redo apply SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION; Update of the cluster registry: Defining database and DB instance targets Defining dependencies between ASM and DB instances Defining custom services srvctl add database -d orcl -o $ORACLE_HOME srvctl add instance -d orcl -i orcl1 -n newnode1 srvctl modify instance -d orcl -i orcl1 -s +ASM1 srvctl add instance -d orcl -i orcl2 -n newnode2 srvctl modify instance -d orcl -i orcl2 -s +ASM2 srvctl add service d orcl s orcl_loadbalanced r orcl1,orcl2 P BASIC srvctl add service d orcl s orcl_noloadbalanced r orcl1 a orcl2 P BASIC RAC MIgration WLCG Service Reliability Workshop, Nov 2007-15
Migration Procedure step 6 Old System: Starting the synchronization: SQL> alter system set log_archive_dest_2='service=new_db valid_for=(online_logfiles,primary_role)' scope=both sid='*'; SQL> alter system set standby_file_management=auto scope=both sid='*'; Switch over to the standby role: All services should be stopped All DB instances but one should be stopped SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN; SQL> SHUTDOWN IMMEDIATE SQL> STARTUP MOUNT RAC MIgration WLCG Service Reliability Workshop, Nov 2007-16
Migration Procedure step 7 On the new system: Switch over to the primary role: SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; SQL> ALTER DATABASE OPEN; Startup other database instances and services Redirect user to the new system RAC MIgration WLCG Service Reliability Workshop, Nov 2007-17
Possible Variations Described procedure can be easily customized to allow performing extra actions: OS version change Migration to a bigger/smaller cluster Changer of the storage management layer With an extra intermediate step the procedure can be also used to migrate the database from 32 to 64 bit platform 1. Setup DataGuard 3. Stop database 4. Start database in migrate mode and run migration script 32bit 32bit 2. Perform switchover 32bit 64bit 64bit RAC MIgration WLCG Service Reliability Workshop, Nov 2007-18
Summary The Data Guard based migration procedure has been used this year at CERN: we migrated all production and validation databases ~15 systems in total we moved from RHEL 3 to RHEL 4 at the same time we also enlarged all production clusters downtime associated with the migration did not exceed 1 hour per database RAC MIgration WLCG Service Reliability Workshop, Nov 2007-19