Huawei OceanStor Software Issue 01 Date 2015-01-17 HUAWEI TECHNOLOGIES CO., LTD.
Copyright © Huawei Technologies Co., Ltd. 2015. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://e.huawei.com
Contents

1 Overview
2 OceanStor Architecture
  2.1 Architecture
3 Networking and Environment Requirements
  3.1 Network Overview
  3.2 Disaster Recovery Data Center Solution (Active-Passive Mode)
  3.3 Recovery Data Center Solution (Geo-Redundant Mode)
  3.4 HA Solution
  3.5 Disaster Recovery Data Center Solution (Active-Active Mode)
  3.6 Local Protection Solution
  3.7 System Running Environment
4 Software Functions
  4.1 Application Awareness
    4.1.1 Background
    4.1.2 Technical Principles
    4.1.3 Technical Highlights
    4.1.4 Customer Benefits
  4.2 Application Consistency
    4.2.1 Background
    4.2.2 Technical Principles
    4.2.3 Technical Highlights
    4.2.4 Customer Benefits
  4.3 Automatic Grouping
    4.3.1 Background
    4.3.2 Technical Principles
    4.3.3 Technical Highlights
    4.3.4 Customer Benefits
  4.4 One-Key Switchover and Test
    4.4.1 Background
    4.4.2 Technical Principles
    4.4.3 Technical Highlights
    4.4.4 Customer Benefits
  4.5 Intuitive DR
    4.5.1 Background
    4.5.2 Technical Principles
    4.5.3 Technical Highlights
    4.5.4 Customer Benefits
  4.6 Northbound Integration
    4.6.1 Background
    4.6.2 Technical Principles
    4.6.3 Technical Highlights
    4.6.4 Customer Benefits
5 Acronyms and Abbreviations
1 Overview

HUAWEI OceanStor ReplicationDirector is a management software suite designed to protect host applications and help recover data when disasters occur. ReplicationDirector is compatible with Windows, Linux, and AIX. On these platforms, it works with the value-added functions of Huawei disk arrays to protect data consistency and applications and to implement disaster recovery (DR) for applications.

Current DR solutions in the industry are built on various synchronous or asynchronous replication technologies. They involve many kinds of network elements (NEs), such as service applications, host servers, virtual gateways, production devices, DR devices, and network links. The protection and recovery mechanisms are complex, which lengthens recovery time. In addition, the global topology of a DR solution is not displayed intuitively and the status of the NEs involved cannot be monitored in time, which complicates DR management and increases DR costs.

ReplicationDirector is a DR management software suite that incorporates data consistency protection, snapshot, and remote replication technologies. Designed specifically for Huawei's typical DR solutions, ReplicationDirector provides the following capabilities to reduce management complexity and cost:

- Intuitive, wizard-based platform: makes operation and monitoring easy and convenient.
- Application awareness: includes automatic application identification, application data consistency protection, and automatic application switchover.
- Simple management: includes intuitive topology, flexible policy-driven protection, one-click data recovery, and all-around monitoring of the DR solution.
- DR testing: includes recoverability verification and one-click testing.
2 OceanStor Architecture

2.1 Architecture

Figure 2-1 ReplicationDirector architecture

ReplicationDirector employs a browser/server (B/S) architecture, so DR management can be performed from a browser. It contains two subsystems: RD Agent and RD Server.
- RD Agent: installed on a service host and used to discover hosts and applications, protect the consistency of applications and data, and recover applications.
- RD Server: installed on an independent server and used to configure and schedule the entire RD management system.

The communication protocols between these subsystems are as follows:

- Between RD Servers, and between an RD Server and a disk array: REST over HTTPS
- Between an RD Server and an RD Agent: SOAP over HTTPS
- Between an RD Agent and a disk array: iSCSI or Fibre Channel
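The protocol list above can be captured as a small lookup table. The following is a minimal sketch only: the protocol strings come from the list, while the component identifiers (`rd_server`, `rd_agent`, `disk_array`) and the `protocol_between` helper are hypothetical names introduced for illustration and are not part of the product.

```python
from typing import Dict, FrozenSet

# Protocol strings are taken from the list above. The component
# identifiers and this table layout are illustrative assumptions.
PROTOCOLS: Dict[FrozenSet[str], str] = {
    frozenset({"rd_server"}): "REST over HTTPS",  # between two RD Servers
    frozenset({"rd_server", "disk_array"}): "REST over HTTPS",
    frozenset({"rd_server", "rd_agent"}): "SOAP over HTTPS",
    frozenset({"rd_agent", "disk_array"}): "iSCSI or Fibre Channel",
}


def protocol_between(a: str, b: str) -> str:
    """Return the protocol used between two components, in either order."""
    key = frozenset({a, b})
    if key not in PROTOCOLS:
        raise ValueError(f"no direct link defined between {a!r} and {b!r}")
    return PROTOCOLS[key]
```

Using unordered pairs as keys means a caller does not need to know which side initiates the connection; a pair that the list does not mention (for example, two agents) raises an error rather than returning a guess.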
3 Networking and Environment Requirements

3.1 Network Overview

This section details the DR solution networks supported by ReplicationDirector.

3.2 Disaster Recovery Data Center Solution (Active-Passive Mode)

Based on the synchronous and asynchronous replication technologies provided by Huawei's disk arrays or virtual intelligent storage (VIS) products, ReplicationDirector implements:

- End-to-end management of DR resources, including service host applications, VIS, service devices, and DR devices
- Management of the production center and DR center
- Management of replication between service devices and DR devices (or VIS)
- DR tests (switchover of services from the production center to the DR center) and DR management
Figure 3-1 Network diagram for Huawei point-to-point DR solution

As shown in the following figure:

- RD Server is installed on the host deployed at the DR center.
- RD Agent is installed on the service hosts at the production center and on the DR host at the DR center.
- RD Server and RD Agent communicate with each other using SOAP.
- RD Server and devices communicate with each other using TLV or SMI-S.

Figure 3-2 Deployment of Huawei point-to-point DR solution
3.3 Recovery Data Center Solution (Geo-Redundant Mode)

Employing Huawei's synchronous and asynchronous replication technologies, the geo-redundant DR solution covers more remote DR centers and offers a higher DR capability and wider DR scope than the point-to-point DR solution. In the geo-redundant DR solution, ReplicationDirector implements:

- End-to-end management of DR resources, including service host applications, VIS, service devices, and DR devices at the intra-city and remote DR centers
- Management of the production center, intra-city DR center, and remote DR center
- Management of replication between service devices and DR devices at the intra-city and remote DR centers
- DR tests (switchover of services from the production center to the intra-city or remote DR center) and DR management

The geo-redundant DR solution uses either cascading or parallel networking mode. In cascading mode, DR has little impact on services at the production center, but service switchback from the remote DR center is complex. In parallel mode, DR has a noticeable impact on services at the production center, but service switchback from the remote DR center is as simple as in the point-to-point DR solution.

Figure 3-3 Network diagram for Huawei geo-redundant DR solution

As shown in the following figures, in both the cascading and parallel networking modes:

- RD Server is installed on the hosts at DR centers 1 and 2.
- RD Agent is installed on the service hosts at the production center and on the DR hosts at the DR centers.
- RD Server and RD Agent communicate with each other using SOAP.
- RD Server and devices communicate with each other using TLV or SMI-S.

Figure 3-4 Deployment of Huawei geo-redundant DR solution (cascading mode)
(Figure labels: production center with Agent on the service host; DR center 1 and DR center 2, each with a ReplicationDirector Server host and Agent on the DR host.)

Figure 3-5 Deployment of Huawei geo-redundant DR solution (parallel mode)
(Figure labels: production center with Agent on the service host; DR center 1 and DR center 2, each with a ReplicationDirector Server host and Agent on the DR host.)

3.4 HA Solution

Based on Huawei's self-developed VIS mirroring technology, the high availability (HA) solution ensures the high availability of devices at the local or intra-city data center. In the HA solution, ReplicationDirector:

- Implements end-to-end management and monitoring of DR resources, including service host applications, VIS, service devices, and DR devices.
- Manages DR sites.
- Displays topologies of the solution.

Figure 3-6 Network diagram for Huawei HA solution
(Figure labels: a single data center with a VIS cluster and local mirroring; a data center and DR center with a VIS cluster and remote mirroring.)

As shown in the following figure:

- RD Server is installed on the hosts deployed at the production center or DR center.
- RD Agent is installed on the service host at the production center.
- RD Server and RD Agent communicate with each other using SOAP.
- RD Server and devices communicate with each other using TLV or SMI-S.

Figure 3-7 Deployment of Huawei HA solution
(Figure labels: production center with Agent on the service host, a ReplicationDirector Server host, and VIS; the same layout extended across a production center and a DR center.)

3.5 Disaster Recovery Data Center Solution (Active-Active Mode)

Based on cluster technologies and Huawei's self-developed VIS mirroring technology, the active-active DR solution achieves active-active DR for service applications, VIS engines, and devices at two production centers. Services are not interrupted by the failure of a single component or of one production center. Compared with replication-based active/standby DR solutions, the active-active DR solution has the following advantages:

- Minimum RPO and RTO: The active-active DR solution delivers zero RPO, that is, zero data loss. The RTO ranges from seconds to zero, depending on the cluster technology used.
- Improved device utilization: In the active-active DR solution, both production centers run services, whereas in an active/standby DR solution the devices at the DR center remain idle most of the time.

In the active-active DR solution, ReplicationDirector:

- Implements end-to-end management and monitoring of service host applications, VIS, service devices, and DR devices.
- Manages DR sites.
- Displays topologies of the solution.

Figure 3-8 Network diagram for Huawei active-active DR solution
(Figure labels: production center 1 and production center 2, linked by synchronous data mirroring.)

As shown in the following figure:

- RD Server is installed on the host deployed at production center 1 or 2.
- RD Agent is installed on the service hosts at the two production centers.
- RD Server and RD Agent communicate with each other using SOAP.
- RD Server and devices communicate with each other using TLV or SMI-S.

Figure 3-9 Deployment of Huawei active-active solution
(Figure labels: production center 1 and production center 2, each with Agent on the service host and VIS; one ReplicationDirector Server host.)
3.6 Local Protection Solution

Based on Huawei's snapshot technologies, the local protection solution provides local protection and fast recovery for applications such as Oracle, SQL Server, DB2, Exchange, VMware, and FusionSphere. Application data that is lost at a certain point in time because of misoperations or viruses can be restored quickly by ReplicationDirector.

Figure 3-10 Network diagram for Huawei local protection solution
(Figure labels: data center, application LUN, snapshot.)

As shown in the following figure:

- RD Server is installed on the hosts deployed at the production center.
- RD Agent is installed on the service host.
- RD Server and RD Agent communicate with each other using SOAP.
- RD Server and devices communicate with each other using TLV or SMI-S.

Figure 3-11 Deployment of Huawei local protection solution
(Figure labels: production center with RD Agent on the service host and RD Server on the RD host.)
3.7 System Running Environment

Component | Item                         | Requirement                                                                                   | Remarks
----------|------------------------------|-----------------------------------------------------------------------------------------------|--------
Server    | Operating system             | SUSE Linux Enterprise Server 11 SP1 64-bit; Windows Server 2008 R2 Enterprise Edition 64-bit  |
Server    | CPU                          | Minimum: 2 x Xeon dual-core 1.6 GHz CPU; Standard: 2 x Xeon dual-core 4.0 GHz CPU             |
Server    | Memory                       | Minimum: DDR 4 GB; Standard: DDR 8 GB                                                         | If ReplicationDirector is installed on virtual machines, do not use dynamic memory allocation.
Server    | Free disk space              | Minimum: 10 GB or more; Standard: 100 GB or more                                              |
Server    | Database                     | GaussDB V100R003C10                                                                           |
Server    | Management network bandwidth | Minimum: 2 Mbit/s; Standard: 10 Mbit/s                                                        | The management network is the network between the production site and the DR site.
Agent     | Memory                       | Standard: 1 GB DDR                                                                            | This is the minimum requirement on service hosts.
Agent     | Free disk space              | Standard: 50 MB                                                                               | This is the minimum requirement on service hosts.
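A pre-installation check against the minimum Server configuration in the table above could look like the following sketch. The numeric thresholds are taken from the table; the `HostSpec` type, its field names, and the `shortfalls` function are hypothetical and exist only for this illustration. The sketch compares numbers only and does not check the operating system or database version.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of a candidate Server host."""
    cpu_sockets: int
    cores_per_socket: int
    cpu_ghz: float
    memory_gb: float
    free_disk_gb: float
    mgmt_bandwidth_mbit: float


# Minimum Server configuration from the table above:
# 2 x dual-core 1.6 GHz, 4 GB memory, 10 GB free disk, 2 Mbit/s.
SERVER_MINIMUM = HostSpec(2, 2, 1.6, 4, 10, 2)


def shortfalls(host: HostSpec, minimum: HostSpec = SERVER_MINIMUM) -> List[str]:
    """List every field in which the host falls below the minimum."""
    problems = []
    for f in fields(HostSpec):
        have, need = getattr(host, f.name), getattr(minimum, f.name)
        if have < need:
            problems.append(f"{f.name}: have {have}, need at least {need}")
    return problems
```

An empty list means the host meets every numeric minimum; each entry otherwise names one resource that must be increased before installation.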
4 Software Functions

4.1 Application Awareness

4.1.1 Background

A service system usually comprises multiple application systems, each of which has its own database. Identifying applications and databases quickly and accurately is critical to improving efficiency and reducing complexity in DR.

4.1.2 Technical Principles

RD Agent, an important element of ReplicationDirector, provides an application detection module for each type of application, such as Oracle, SQL Server, DB2, and Exchange. These modules recognize the distinguishing characteristics of each application and automatically determine which applications are running on the service host. The process is as follows:

1. RD Agent is installed on a service host and RD Server on an independent server. You access RD Server from a browser and enter the IP address of the service host together with your RD Agent user name and password to submit a discovery request.
2. After RD Server receives the discovery request, it establishes secure SOAP communication with RD Agent and sends an application awareness request to RD Agent.
3. After RD Agent receives the request, it starts the application detection program and sends the application awareness result to RD Server.

4.1.3 Technical Highlights

- Secure communication: RD Server and RD Agent communicate over HTTPS-based SOAP, which prevents information leakage caused by illegitimate data interception.
- Triggered detection: RD Agent does not start application detection as soon as it is installed on a service host. Detection is triggered by RD Server, which minimizes the consumption of service host resources.
- On-demand detection: You can customize the applications to be detected. Of the application adaptation modules provided by RD Server, you can enable only those you need for application detection.

4.1.4 Customer Benefits

The application awareness function automatically detects the applications running in the customer's DR environment, enabling users to adopt optimal DR protection measures.

4.2 Application Consistency

4.2.1 Background

Whether data protection software is based on replication or snapshots, the most important concerns are constant data availability and fast data recovery when a disaster occurs or backup data is needed. Huawei provides application consistency solutions suited to different applications.

4.2.2 Technical Principles

Huawei provides policy-driven protection for different applications. On RD Server, a protection group is created for each type of application and a specific policy is set for the protection group. Application data is then automatically snapshotted or replicated according to the policy.

During automatic protection, RD Server generates consistent snapshot copies or consistent replication copies of application data. The working mechanism is as follows:

1. When a snapshot or replication period arrives, RD Server notifies RD Agent, which applies the consistency policy appropriate to each application and places the applications in the consistent state.
2. After the applications are in the consistent state, RD Server asks the disk array to start the snapshot or replication, and the disk array generates application-consistent snapshot or replication copies.
3. After the copies are generated, RD Server asks RD Agent to restore the applications from the consistent state to their original states.

The following sections detail the application consistency solutions that ReplicationDirector provides for different applications.

SQL Server and Exchange
When a snapshot or replication period arrives, RD Server notifies RD Agent, which then calls a Windows VSS service (such as the SQL Server or Exchange VSS Writer). The VSS service flushes dirty data in SQL Server or Exchange memory to disk, suspends the write I/Os of SQL Server or Exchange, and places SQL Server or Exchange in the consistent state. Once SQL Server or Exchange is in the consistent state, RD Server asks the disk array to start the snapshot or replication, and the disk array generates snapshot or replication copies for SQL Server or Exchange. As soon as these copies are generated, RD Server asks RD Agent to restore SQL Server or Exchange from the consistent state to its original state. The whole process takes a maximum of 10 seconds.
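The sequence described above (place the application in the consistent state, have the disk array take the copy, then restore the application) can be sketched as a context manager that always resumes the application, even if the copy fails. This is an illustrative sketch, not the product's implementation: `consistent_state`, `protect`, and the three callables are hypothetical stand-ins for the Agent and disk array operations.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def consistent_state(quiesce: Callable[[], None],
                     resume: Callable[[], None]) -> Iterator[None]:
    """Hold an application in the consistent state for the 'with' body."""
    quiesce()          # e.g. the Agent flushes data and suspends write I/O
    try:
        yield
    finally:
        resume()       # always restore the application, even on failure


def protect(quiesce: Callable[[], None],
            make_copy: Callable[[], T],
            resume: Callable[[], None]) -> T:
    """Run one protection cycle: quiesce, copy on the array, resume."""
    with consistent_state(quiesce, resume):
        return make_copy()   # e.g. the disk array snapshot or replication
```

The `finally` clause reflects the main design concern in this flow: the time the application spends with writes suspended should be bounded, so the resume step must not be skipped when the copy step raises an error. If the quiesce step itself fails, no resume is attempted in this sketch.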
Oracle

Oracle application consistency is implemented based on the Oracle hot backup mechanism. When a snapshot or replication period arrives, RD Server first asks RD Agent to set the Oracle database to hot backup mode, and then asks the disk array to start the snapshot or replication. After the snapshot or replication is completed, RD Server asks RD Agent to end hot backup mode on the Oracle database.

DB2

When a snapshot or replication period arrives, RD Server first asks RD Agent to suspend the write I/Os of the DB2 database, and then asks the disk array to start the snapshot or replication. After the snapshot or replication is completed, RD Server asks RD Agent to resume write I/Os to DB2.

VMware

When a replication period arrives, RD Server first asks vCenter to generate quiesced ("silent") snapshots of the virtual machines. While a quiesced snapshot is generated, the virtual machine is suspended for a short time and data in its memory is flushed to disk. RD Server then asks the disk array to start replication. After the replication is completed, RD Server asks vCenter to delete the quiesced snapshots of the virtual machines.

4.2.3 Technical Highlights

- Deep integration with applications: The applications' own native mechanisms are used to achieve consistency.
- Automatic timeout: The adverse impact on services is minimized. If RD Agent does not receive RD Server's request to restore the applications' state, for example because of a network fault or request timeout, it waits 60 seconds and then automatically restores the applications from the consistent state to their original states.

4.2.4 Customer Benefits

The consistent copies generated by snapshots or replication are always available and can be used for fast data recovery.

4.3 Automatic Grouping

4.3.1 Background

In a virtualized environment, one virtual machine usually spans multiple disk arrays, and one disk array is also shared by multiple virtual machines.
The minimum unit of a replication-based data protection technology is the LUN. A technology is therefore needed to quickly and accurately protect the virtual machines running on one LUN or a group of LUNs.

4.3.2 Technical Principles

When ReplicationDirector discovers vCenter or FusionSphere, it groups data as follows:
- If a virtual machine spans multiple disk arrays, these arrays are automatically combined into a group.
- If the LUNs of a consistency group belong to multiple disk arrays, these arrays are automatically combined into a group.

4.3.3 Technical Highlights

Disk arrays are automatically combined into groups based on their relationship with virtual machines. When the virtual machines change or the capacities of the disk arrays are expanded, the groups are updated accordingly. The whole grouping and update process is automatic.

4.3.4 Customer Benefits

- Simple management and minimized errors
- The configuration needs to be set only once and remains in effect. New virtual machines added to the disk arrays are protected automatically.

4.4 One-Key Switchover and Test

4.4.1 Background

DR switchover and testing involve complicated procedures and operations. Each step must be performed strictly, and a single mistake may result in a switchover failure and service interruption. To reduce the complexity of DR switchover and testing and improve the switchover success rate, Huawei provides mature, pre-configured switchover and test procedures for different applications based on their characteristics, environments, and protection technologies. The whole switchover or test process can be started with one click.

4.4.2 Technical Principles

ReplicationDirector provides different switchover and test procedures for different applications. The procedures are defined step by step and are automatically loaded, interpreted, and configured by the built-in recovery procedure engine module. The whole switchover or test process is started with one click, without complicated manual configuration or operation.

4.4.3 Technical Highlights

The pre-configured switchover and test procedures can be edited to meet users' customized switchover and test requirements.

4.4.4 Customer Benefits

- Low DR switchover and test complexity
- High DR switchover success rate
- Low RTO
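The automatic grouping rule in section 4.3.2 (arrays spanned by the same virtual machine or consistency group end up in one group) can be illustrated with a union-find structure. This is a sketch under assumptions: the function name `group_arrays` and the input shape are hypothetical, and the transitive merging shown here (if VM1 spans arrays A and B and VM2 spans B and C, then A, B, and C form one group) is one plausible reading of the rule rather than documented product behavior.

```python
from typing import Dict, Iterable, List, Set


def group_arrays(spans: Dict[str, Iterable[str]]) -> List[Set[str]]:
    """Group disk arrays that are linked by a shared VM or consistency group.

    `spans` maps an owner name (a VM or a consistency group) to the
    disk arrays that hold its LUNs.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for arrays in spans.values():
        arrays = list(arrays)
        for array in arrays:
            find(array)                    # register arrays used by one owner only
        for other in arrays[1:]:
            union(arrays[0], other)        # all arrays of one owner share a group

    groups: Dict[str, Set[str]] = {}
    for array in list(parent):
        groups.setdefault(find(array), set()).add(array)
    return sorted(groups.values(), key=sorted)
```

Re-running the function on the current VM-to-array mapping after virtual machines change or arrays are added yields the updated groups, which mirrors the automatic update described in section 4.3.3.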
4.5 Intuitive DR

4.5.1 Background

A DR environment usually involves different applications, hosts, and disk arrays. Each has different data protection requirements, so different data protection policies are defined for them. The DR relationships among production hosts, database applications, and disk arrays must be displayed intuitively for easy management.

4.5.2 Technical Principles

Huawei provides policy-driven protection for different applications. Under this policy-driven protection, RD Server creates protection groups for applications, and these protection groups automatically generate intuitive DR topologies according to the host environments, stacks, and DR relationships of the corresponding applications.

4.5.3 Technical Highlights

DR topologies are generated based on the environments and data protection measures of applications.

4.5.4 Customer Benefits

ReplicationDirector achieves intuitive DR management, enabling users to know DR paths and status at any time.

4.6 Northbound Integration

4.6.1 Background

A well-designed system must be open and easy to integrate with third-party systems, and its external development interfaces must comply with industry standards and development trends.

4.6.2 Technical Principles

Modules such as device resource management, protection configuration, DR restoration, and service monitoring function as SOA service components and are exposed through a RESTful interface. This interface can be used both for developing the web GUI and for third-party integration.
4.6.3 Technical Highlights

ReplicationDirector provides lightweight RESTful web services based on the SOA mechanism and industry standards.

4.6.4 Customer Benefits

This facilitates integration with customers' existing systems.
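A third-party system integrating over the RESTful interface would typically build HTTPS requests that carry JSON. The sketch below only constructs such a request and does not send it. Everything specific in it is an assumption made for illustration: this document does not define the resource paths, the payload format, or the authentication scheme, so the host name `rd-server.example`, the path `rest/protection-groups`, and the JSON body are hypothetical.

```python
import json
from typing import Any, Optional
from urllib.request import Request


def build_request(base_url: str, resource: str,
                  payload: Optional[Any] = None) -> Request:
    """Build (but do not send) a JSON request for a REST resource.

    The resource path and payload shape are hypothetical; consult the
    product's interface reference for the real ones.
    """
    url = f"{base_url.rstrip('/')}/{resource.lstrip('/')}"
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return Request(url, data=data, headers=headers, method=method)
```

For example, `build_request("https://rd-server.example", "rest/protection-groups")` yields a GET request for a hypothetical collection of protection groups, which a monitoring system could then send with `urllib.request.urlopen` once the real endpoint and credentials are known.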
5 Acronyms and Abbreviations

VSS     Volume Shadow Copy Service
SOA     Service-Oriented Architecture
REST    Representational State Transfer
OSGi    Open Service Gateway Initiative