IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide


Document Version 1.0

SONAS / V7000 Unified Asynchronous Replication Development Team: Hiroyuki Miyoshi, Satoshi Takai, Hiroshi Araki
SONAS / V7000 Unified Replication Architect: Thomas Bish
Development Manager: Norie Iwasaki

Table of Contents

1 Document Information
  1.1 Summary of Changes
  1.2 Related Documents
2 Introduction
3 Async Replication Operational Overview
  3.1 Async Replication Phases
  3.2 Async Replication Mode
4 Replication Frequency and RPO Considerations
5 Async Replication Performance Benchmark
  5.1 Test Environment
  5.2 Performance Characteristics for Various Factors
    5.2.1 Async Replication Configuration Options
    5.2.2 Fullsync
    5.2.3 Network
    5.2.4 Host Workload
    5.2.5 Number of Files
    5.2.6 File Size (Fixed Size)
    5.2.7 Filesystem Configuration
  5.3 Typical Cases
    5.3.1 Case Definition
    5.3.2 Results

1 Document Information

1.1 Summary of Changes

Version | Date       | Short Description
0.1     | 2013/7/3   | initial draft version
0.2     | 2013/9/2   | added test machines, modified test cases
0.3     | 2013/10/11 | performance results added, revised chapters
0.4     | 2013/10/16 | updated chapters 3 and 4 per review comments, corrected a typo
1.0     | 2013/10/18 | corrected wording per review comments

1.2 Related Documents

IBM V7000 Unified Information Center

SONAS Copy Services Asynchronous Replication - Best Practices: the document can be found by searching from ibm.com/support/entry/portal/overview/hardware/system_storage/network_attached_storage_%28nas%29/sonas/scale_out_network_attached_storage. Version R1.4.1 is available as of this writing (2013/10/18) and gives enough information for this performance reference guide. The direct link to Version R1.4.1 is 1.ibm.com/support/docview.wss?uid=ssg1S74448. Version R1.4.2 is planned to be uploaded.

IBM Storwize V7000 Unified Performance Best Practice Version 1.2 is planned to be uploaded. Once uploaded, it should be available in the same portal: ibm.com/support/entry/portal/overview/hardware/system_storage/network_attached_storage_%28nas%29/sonas/scale_out_network_attached_storage

2 Introduction

V7000 Unified asynchronous replication (async replication) provides a function to copy data from a primary V7000 Unified (source V7000 Unified) to a remote V7000 Unified (target V7000 Unified). As the name implies, the copy is done asynchronously to the host I/O. The disaster recovery solution is built around this function.

Every time async replication is invoked, the incremental data is scanned and transferred to the target. Async replication can be scheduled to enable periodic execution. In order to keep the target up to date and meet the RPO (Recovery Point Objective) requirement, the throughput of the async replication is important. Also, since replication runs as a storage background task, it is necessary to understand its resource consumption (such as CPU, memory, and disk) so that the frequency and timing can be chosen so as not to impact the host I/O response time or other V7000 Unified storage applications such as NDMP.

A separate best practice guide (SONAS Copy Services Asynchronous Replication - Best Practices) already describes the overview, basic usage, considerations, and problem determination of async replication. The location of the Best Practices guide is given in 1.2 Related Documents. This document focuses on the performance investigation and the resource consumption of async replication under various configurations. The purpose is to give a reference to help configure a V7000 Unified that uses async replication.

3 Async Replication Operational Overview

3.1 Async Replication Phases

Async replication is performed on a per-filesystem basis, and each replication goes through the following phases:

- Snapshot of the source file system, to create a write-consistent point-in-time image of the file space to replicate
- Scan of the source file system snapshot for files and directories created, modified, or deleted since the last successful completion of async replication
- Replication of the changed contents to the target system
- Snapshot of the target file system
- Removal of the source file system snapshot

The detail of each phase is described in the best practice guide and the information center. Here are some notes from the performance perspective.

Scan phase

This is executed on the source V7000 Unified. If both nodes are configured as replication participating nodes, both will be used. The scan reads the metadata for all the files in the filesystem, so it is recommended to have metadata spread across multiple NSDs. By default, when creating a filesystem from the GUI or with the CLI mkfs command, multiple NSDs (vdisks) of type dataandmetadata are automatically created, which is appropriate. Use of SSDs for the metadata is also recommended to improve scan performance.

The scan involves text sorting. Depending on the total number of files and the length and complexity of the directory structure and file names, this may consume noticeable CPU on the nodes. The memory usage is limited to 5% of the physical memory on the nodes per async operation. The time the scan phase takes depends on the total number of files in the source filesystem and the length and complexity of the directory structure, not on file sizes.

Replication phase

Multiple rsync processes are invoked in parallel (if configured) from both nodes (if configured) to replicate the changed files to the target. The async operation segments the changed files into groups based on their size and location within the source file system. Large files (over 1MB) are processed first; as the replication processes complete the large files, the smaller files are processed. A sketch of this ordering is shown below.

Rsync determines the transmission mode (whole or delta) for each file depending on whether the file exists on the target side and on the size of the file on the source and the target. In whole-transfer mode, rsync sends the whole data of the file to the target. In delta-transfer mode, rsync works out the delta blocks between the source and target and sends only the delta over the network.
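The size-based ordering described above can be pictured with a short sketch. This is an illustration only: the function name, the 1 MB threshold constant, and the two-group policy are assumptions made for the example, not the actual V7000 Unified implementation.

```python
# Illustrative sketch of the "large files first" grouping described above.
# Names and the grouping policy are assumptions, not product code.
from typing import List, Tuple

LARGE_FILE_THRESHOLD = 1 * 1024 * 1024   # files over 1 MB are handled first

def order_for_replication(changed: List[Tuple[str, int]]) -> List[List[Tuple[str, int]]]:
    """changed: (path, size_in_bytes) pairs produced by the scan phase."""
    large = [f for f in changed if f[1] > LARGE_FILE_THRESHOLD]
    small = [f for f in changed if f[1] <= LARGE_FILE_THRESHOLD]
    # The large group is fed to the parallel rsync processes first; as those
    # complete, the remaining small files are processed.
    return [sorted(large, key=lambda f: f[1], reverse=True), small]

batches = order_for_replication([("/ibm/gpfs/a.iso", 3_000_000_000),
                                 ("/ibm/gpfs/b.txt", 4_096)])
```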

3.2 Async Replication Mode

There are 2 modes in async replication:

- incremental
- fullsync

Incremental replication

When the startrepl CLI command is invoked without any option, incremental replication is executed by default. This scans for the files that have been modified since the last successful replication and replicates only those modified files to the target side.

Fullsync replication

This can be executed by adding the --fullsync option to startrepl. It lists all the files/directories in the filesystem and lets rsync process the full list. Rsync compares all the files identified in the source file system and only updates the modified files.

Typically, a customer should not need to specify --fullsync. If startrepl is invoked for the first time for a filesystem, or if the code detects a severe error condition such that it would be safer to audit all the files, it automatically switches to fullsync mode. The --fullsync option is primarily intended to audit the target against the source if an event made it possible that the target file system has drifted from the source during the normal incremental replications. The --fullsync option should also be used in the initial failback from the DR site to the original primary, to ensure the primary source system is recovered correctly from its state at the time of the disaster against the current state of the DR site.
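The per-file audit that a fullsync pass relies on can be sketched as below. This mirrors rsync's default quick check (size and modification time); it is a hedged illustration of the idea, not the internal logic of startrepl.

```python
# Rough sketch of a fullsync-style audit decision: only files whose attributes
# differ between source and target are actually transferred. Assumption: a
# size+mtime comparison, as in rsync's default quick check.
import os

def needs_transfer(src_path: str, tgt_path: str) -> bool:
    if not os.path.exists(tgt_path):
        return True                      # missing on the target: send the file
    src, tgt = os.stat(src_path), os.stat(tgt_path)
    same = (src.st_size == tgt.st_size and int(src.st_mtime) == int(tgt.st_mtime))
    return not same                      # unchanged attributes: no data is sent
```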

4 Replication Frequency and RPO Considerations

The recovery point objective (RPO) describes the acceptable amount of data loss, in time, should a disaster occur. The RPO for async replication is based on the frequency of the startrepl command and is limited by the duration of the async replication operation. The first step in the async process is to create a point-in-time snapshot of the source file system, which creates a data-consistent point. This snapshot is used as the base to perform the changed-file (delta) scan and as the source of the file replication. Only one async increment may be running at any time per replication relationship. Therefore the minimum RPO is defined by the time from the completion of one increment to the completion of the next increment.

The duration of the replication depends on a number of factors, including:

- Amount of files within the source file system - The number of files contained in the source file system affects the duration of the scan. Async replication uses a high-speed file system scan to quickly identify the files that have been changed since the last successful async replication. While the scan is optimized, the number of files contained in the file system adds to the time it takes to perform the scan. The scan process is distributed across the number of nodes configured to participate in the async process.

- Number of disks and how the source/target filesystems are created from them - The disk bandwidth plays a major role in the performance of async replication and tends to be the bottleneck in many cases. Generally, if more mdisks (V7000 RAID volumes) with a sufficient number of disks are used for the source/target filesystems, faster replication is likely to be obtained. When multiple filesystems are created from the same mdisk group, async replication on one filesystem and host I/O to the other filesystems share the same disks and impact each other. In particular, when replications of those filesystems are executed at the same time, they will all be limited by the capability of the shared disks. It is strongly recommended to stagger the replications in such a case.

- Amount of changed data requiring replication - The time it takes to transfer the contents from the source V7000 Unified to the target is a direct result of the amount of data which has been modified since the last async operation. Basically, the async operation only moves the changed contents of files between the source and target to minimize the amount of data needing to be sent over the network. However, depending on the file size, it can be faster to send the whole file instead of finding the delta and sending only the delta. In R1.4.2, if the file size is larger than 1GB or less than 1MB, the transmission mode is changed to whole-file transfer.

- File size profile of the changed files - When talking about the throughput of a replication in terms of MB/sec: if there are many large files (over a GB), the data transfer is dominant and the MB/sec value will be high. However, if there are many small files (a few KB each), the file attribute transfer and inode delete/create operations are dominant. In that case the MB/sec value will be very low and we recommend evaluating file/sec as the throughput measure.

- Bandwidth and capabilities of the network between V7000 Unified systems - The network capabilities play a factor in the time it takes the replication process to complete. Enough network bandwidth must be available to handle all of the updates that have occurred since the start of the last increment before the start of the next scheduled increment. Otherwise, the next startrepl will fail, which effectively doubles the RPO for that cycle. (A rough sizing sketch is given after this list.)

- Number of source and target nodes participating in async replication - It is strongly recommended to use both nodes for replication. It is possible to configure only one node pair between the source and the target V7000 Unified, but the scan/rsync throughput will be roughly halved unless the network or disk is extremely slow. Also, redundancy is lost in case something happens to a node during replication.

- Configuration parameters for replication - The number of replication processes configured per node, the strong vs. fast encryption cipher, and software compression help to tune the replication performance. See below for more information.

- Other workloads running concurrently with replication:
  - HSM managed file systems at the source or target - HSM managed file systems can greatly affect the time it takes for async replication to complete if the changed files in the source file system have been moved to the secondary media before async replication has replicated them to the target.
  - NDMP - NDMP uses the same scan that async replication uses. The scan is an I/O-intensive operation and may consume significant CPU time. When scheduling tasks, it is recommended to stagger the times for NDMP and async replication.

- Number of replication processes per node - This configuration parameter allows more internal rsync processes to be used on each node to replicate data to the other side. This provides more parallelism and increases the potential replication bandwidth and rate.

- Encryption cipher - The data transferred between V7000 Unified systems is encrypted via SSH as part of the replication. The default cipher is strong (AES), but it limits the maximum per-process network transfer rate to approximately 35-40 MB/s. The fast cipher (arcfour) is not as strong, but increases the per-process network transfer rate to approximately 95 MB/s. On trusted networks, it is advised to use the fast cipher to increase bandwidth utilization.

- Software compression - This compresses the data to be transferred over the network before it is encrypted. The compression is performed using the CPU of the node transferring the data. For data which is compressible, this can reduce the data to be sent over the network and increase the effective bandwidth. If the data is not compressible, the calculation overhead is added and the overall throughput may degrade.
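The factors above can be combined into a back-of-the-envelope feasibility check for a planned replication interval. This is a rough sketch under stated assumptions: all inputs are user estimates, and the per-process ceilings quoted (roughly 35-40 MB/s for the strong cipher, about 95 MB/s for arcfour) are the approximate figures from the text, not guarantees.

```python
# Rough RPO feasibility estimate. Every parameter is an estimate supplied by
# the administrator; the defaults below are illustrative assumptions.
def estimate_cycle_seconds(changed_gb: float, scan_seconds: float,
                           nodes: int = 2, procs_per_node: int = 10,
                           per_proc_cap_mb: float = 95.0,   # fast cipher, approx.
                           disk_cap_mb: float = 200.0,      # assumed disk limit
                           network_cap_mb: float = 600.0) -> float:
    """Replication cycle time is bounded by the smallest of the aggregate
    rsync-process ceiling, the disk bandwidth, and the network bandwidth."""
    aggregate_mb = min(nodes * procs_per_node * per_proc_cap_mb,
                       disk_cap_mb, network_cap_mb)
    return scan_seconds + (changed_gb * 1024.0) / aggregate_mb

# Example: 200 GB changed per cycle and a 10-minute scan.
cycle = estimate_cycle_seconds(changed_gb=200, scan_seconds=600)
print(f"~{cycle / 3600:.1f} h per increment; the RPO cannot be shorter than this.")
```

If the estimated cycle time exceeds the planned startrepl interval, either the interval, the amount of changed data, or one of the bottleneck resources has to change.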

5 Async Replication Performance Benchmark

First, the performance characteristics are investigated for each factor. The obtained data and the trends are shown in section 5.2 Performance Characteristics for Various Factors. Afterwards, a few typical use cases are defined and the performance benchmark is shown for each in section 5.3 Typical Cases.

5.1 Test Environment

Replication Source V7000 Unified: 2 Filer Nodes and 1 V7000
- Filer Node (2073-700): x3650 M3 server, Intel(R) Xeon(R) CPU x 8, 72GB memory (GPFS cache 36GB)
- V7000: 300GB 10K RPM 6Gbps SAS HDD x 21 (approx. 6TB), 8Gbps FC
- V7000 Unified version: R1.4.2

Replication Target V7000 Unified: 2 Filer Nodes and 1 V7000
- Filer Node (2073-720): x3650 M4 server, Intel(R) Xeon(R) CPU x 4, 72GB memory (GPFS cache 36GB)
- V7000: 900GB 10K RPM 6Gbps SAS HDD x 23 (approx. 20TB), 8Gbps FC
- V7000 Unified version: R1.4.2

Network
- 1 Gbps, bonding mode: balance-alb (6)
- 10 Gbps, bonding mode: active-backup (1) (1 node pair only)

5.2 Performance Characteristics for Various Factors

The following sections illustrate how changes in the various parameters impact the replication performance and time.

5.2.1 Async Replication Configuration Options

Objective: To illustrate the effects on replication performance, duration, and resource consumption when various async replication configuration options are varied.

Test variations:
- Number of nodes (1, 2)
- Number of processes (1, 5, 10)
- Encryption cipher (strong, fast)
- Compression (enabled, disabled)

For the compression test, 2 sets of test data are used:
1. Completely random data, for which the compression ratio is very low (almost 1:1).
2. Zero-filled data, for which the compression ratio is very high.

Test configurations:
- Test files: 5k files, about 63 GB total, file size varying from a few KB to a few GB (same as #7 in "Typical Cases" in section 5.3); test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: source & target snapshot
- Host workload: none
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Results:

Number of Nodes

[Figure: Rsync phase throughput (MB/sec) for 1-node vs. 2-node configurations - node number impact on rsync throughput]

As for the rsync throughput, the 2-node configuration is almost 2 times faster. The scan time in these cases didn't show much difference, since only 5k test files were used and both configurations completed the scan in about 10 seconds. However, the scan throughput is also expected to increase with the number of nodes.

Number of Processes Per Node

[Figure: Rsync phase throughput (MB/sec) for 1, 5, and 10 processes per node - process number impact on rsync throughput]

The rsync throughput also increases as the number of processes is increased from 1 to 10. The resource consumption is shown below.

Source V7000 Unified active management node CPU usage:

[Figure: CPU usage (%) of the source V7000 Unified active management node during replication, for 1, 5, and 10 replication processes]

As shown, the 10-process case consumes about 15% more CPU than the 1-process case.

Source V7000 Unified active management node NSD (vdisk) usage:

[Figure: Disk utilization (%) of the source V7000 Unified active management node during replication, for 1, 5, and 10 replication processes]

In this environment, when 5 or 10 processes are used, the source disk access is almost 100%, indicating that it is the bottleneck.

Target V7000 Unified node CPU usage:

[Figure: CPU usage (%) of the target V7000 Unified node during replication, for 1, 5, and 10 replication processes]

Target V7000 Unified node NSD (vdisk) usage:

[Figure: Disk utilization (%) of the target V7000 Unified node vs. replication elapsed time (sec), for 1, 5, and 10 replication processes]

Async replication copies large files (>1MB) first. Looking at the log of the 10-process replication run, the small files (a few KB) started to get replicated after 25 minutes (1500 sec) and continued to the end (around 3000 sec). This matches the disk utilization peaks shown.

As for the memory usage, the free memory didn't change much throughout the replication (4-5 MB usage maximum out of 15-20GB free memory). GPFS pre-allocates memory as its page pool, so the memory consumption will not increase with the I/O. The amount of memory used by the scan is limited to 5%, and each rsync handles files one at a time, so it would not use an excessive amount of memory.

Encryption Type

[Figure: Rsync phase throughput (MB/sec) for the strong (aes128-cbc) and fast (arcfour) encryption ciphers]

No difference was observed in this test case. 20 rsync processes were invoked in total and the average throughput was about 11 MB/sec for each. This is due to the disk access limitation shown above. The strong encryption cipher (aes128-cbc) is known to max out at 35-40 MB/sec per process, so if the disk and the network allowed over 40 MB/sec for each process, the encryption type would start to show a difference.

Compression

When this option is enabled, rsync compresses the data before sending it to the target V7000 Unified. The target decompresses the data upon receiving it.

[Figure: Rsync phase throughput (MB/sec) with compression on/off, for random test data and data filled with zeros - compression option impact on rsync throughput]

The compression option severely degraded the throughput for the random-data test files but improved it when the zero-filled test files were used. For the random test data, the compression is not able to compress the data and the performance dropped due to the overhead. The next graph shows how the compression keeps consuming the CPU.
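Whether the compression option is likely to help for a particular data set can be judged off-box by compressing a sample of the data and looking at the ratio. The check below uses generic zlib and is not a V7000 Unified command; the sample size is an arbitrary assumption.

```python
# Generic compressibility probe: compress a sample of a representative file and
# report original_size / compressed_size. Not a product tool.
import zlib

def sample_compression_ratio(path: str, sample_bytes: int = 4 * 1024 * 1024) -> float:
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, 6))

# Ratios near 1.0 (random or already-compressed data) suggest leaving the
# compression option off: the CPU cost is paid but little traffic is saved.
```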

[Figure: CPU usage (%) of the source node during replication of the random test files, compression on vs. compression off]

For the zero-filled data, the CPU is still consumed, but the data is compressed and the overall replication time is shorter.

[Figure: CPU usage (%) of the source node during replication of the zero-filled data, compression on vs. compression off]

Sparse option effect

One other interesting aspect is the sparse option. In R1.4.x, the sparse option is always enabled and it will always create sparse files on the target. Due to this option, the amount of write I/O and the consumed space on the target side are significantly decreased for the zero-filled test data used in the compression test above. The sparse option takes effect regardless of the compression option.

The following table shows the df output for the filesystem on the target V7000 Unified after the replication of the 2 data patterns (compression disabled). The source V7000 Unified holds about 631GB, but due to the sparse option very little data is written to the target side for the zero-filled case.

[Table: df output on the target V7000 Unified (/dev/gpfs mounted on /ibm/gpfs) for the random-data test files and the zero-filled test files - 1K-blocks / Used / Available / Use% values not captured]

The disk utilization on the target side also shows the significant decrease of write I/O for the zero-filled test case.

[Figure: Target node disk utilization (%) during replication for the random data and zero-filled test data]
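The effect on the target can also be confirmed per file by comparing the logical size with the blocks actually allocated. The check below is plain POSIX/Python run against a file under the target filesystem mount; it is not a product command, and the path in the example is hypothetical.

```python
# Sparse-file check: the ratio of allocated bytes to logical size is far below
# 1.0 for a zero-filled file that was replicated with the sparse option.
import os

def allocation_ratio(path: str) -> float:
    st = os.stat(path)
    allocated = st.st_blocks * 512       # st_blocks is reported in 512-byte units
    return allocated / st.st_size if st.st_size else 1.0

print(allocation_ratio("/ibm/gpfs/zero_filled_testfile"))   # hypothetical path
```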

5.2.2 Fullsync

Objective: To show the replication performance of fullsync vs. incremental replication.

Test variations:
- Async replication mode

Test configurations:
- Test files: 5k files, about 63 GB total, file size varying from a few KB to a few GB (same as #7 in "Typical Cases" in section 5.3); test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Host workload: none
- Replication: incremental replication (10% delta), fullsync replication (10% delta)

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Results:

[Figure: Total replication time for the initial, incremental, and incremental fullsync runs - difference between fullsync and incremental]

When the --fullsync option is supplied, all the files on the source side are audited, so there is some overhead compared to a pure incremental replication. However, data is not sent over the network as long as the attributes of the files are the same, so the required time is much less than that of the initial full-data-send replication.

5.2.3 Network

Objective: To show the performance impact of the network bandwidth and latency.

Test variations:
- 1 Gbps vs. 10 Gbps
- 1 Gbps with network latency inserted by netem

Test configurations:
- Test files: 5k files, about 63 GB total, file size varying from a few KB to a few GB (same as #7 in "Typical Cases" in section 5.3); test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Async replication configuration: 1 node or 2 nodes, 10 processes each, fast encryption cipher, source & target snapshot
- Host workload: none
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- Network transferred bytes

Results:

1 Gbps vs. 10 Gbps

[Figure: Rsync phase throughput (MB/sec) for 1 Gbps vs. 10 Gbps Ethernet port rate (1 node, 1 process configuration)]

RTT (Round-Trip Time) delay impact

[Figure: Rsync phase throughput (MB/sec) vs. inserted RTT delay (2 node, 1 process configuration)]
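The RTT sensitivity is what single-stream TCP behaviour predicts: one connection cannot exceed roughly window / RTT. The small model below uses an assumed 4 MB window purely for illustration (real kernels auto-tune their windows), and running multiple rsync processes in parallel partially offsets this per-stream ceiling.

```python
# Per-TCP-stream throughput ceiling under added RTT: window / RTT.
# The 4 MB window is an assumed value for illustration only.
def stream_ceiling_mb_per_s(window_bytes: float, rtt_ms: float) -> float:
    return (window_bytes / (rtt_ms / 1000.0)) / 1e6

for rtt in (1, 10, 20, 100):
    cap = stream_ceiling_mb_per_s(4 * 1024 * 1024, rtt)
    print(f"{rtt:>3} ms RTT -> about {cap:.0f} MB/s max per stream")
```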

5.2.4 Host Workload

Objective: To show the replication performance while host I/O is running.

Test variations:
- Host (NFS client) I/O load

Test configurations:
- Test files: 5k files, about 63 GB total, file size varying from a few KB to a few GB (same as #7 in "Typical Cases" in section 5.3); test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication
- Host I/O (MB/sec, IOPS)

Results:

Stress tool

First, a stress tool which executes dd writes is run from an NFS client. The number of dd threads within the tool is varied to control the workload. The following figure shows the filesystem performance impact (measured by fio).

[Figure: Aggregate READ/WRITE bandwidth (KB/s) from the NFS client with no stress and with 4, 8, and 12 stress-tool threads]

Replication under the workload

Async replication was executed while the stress tool was running against the source V7000 Unified.

[Figure: Replication rsync phase throughput (MB/sec) under various stress levels - 0 threads (0% load), 4 threads (22.3% load), 8 threads (25.7% load), 12 threads (59% load)]

The following graph shows the disk utilization of the source.

[Figure: Disk utilization (%) of the source V7000 Unified during replication with host I/O, for various stress-tool thread counts and with no stress]

5.2.5 Number of Files

Objective: To show the replication performance characteristics (mainly scan) when the number of files is varied.

Test variations:
- Number of files (1 million, 5 million, 10 million)

Test configurations:
- Test files: 1KB files, varying the number; test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Host workload: none
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Results:

Scan time

As shown below, the scan time increases as the total number of files in the filesystem increases.

[Figure: Scan phase time vs. total number of files (1, 5, and 10 million) in the source V7000 Unified]

Rsync throughput

The rsync throughput is expected to be consistent since the file size is the same among the test cases (the 1 million case is omitted because that case included files over a GB and its characteristics would differ).

[Figure: Rsync phase throughput (file/sec) vs. total number of files in the source V7000 Unified]

NOTE: In these test cases, the number of files within a subdirectory was limited. However, if there are more files in a directory (especially small files of a few KB) and a replication is invoked, the 2 nodes may try to replicate files under the same subdirectory. On the target side, the 2 nodes may then try to create files under the same directory at the same time and can hit GPFS directory lock contention, which may degrade the performance. The sketch after this note shows the kind of layout that avoids this.
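The layout idea in the note can be sketched as follows: spread the files across subdirectories with a fixed cap per directory so that the two nodes rarely work inside the same directory on the target. Path names, the cap, and the file size below are arbitrary choices for illustration; this is a generic test-data generator, not the tool used in the benchmark.

```python
# Generate small test files spread across capped subdirectories so that
# parallel replication processes rarely collide on one directory.
import os

def make_test_files(root: str, total_files: int,
                    per_dir: int = 1000, size: int = 1024) -> None:
    payload = os.urandom(size)                      # same payload for every file
    for i in range(total_files):
        subdir = os.path.join(root, f"dir{i // per_dir:06d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"file{i:08d}"), "wb") as f:
            f.write(payload)

# make_test_files("/ibm/gpfs/testdata", total_files=100_000)   # hypothetical mount
```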

5.2.6 File Size (Fixed Size)

Objective: To show the replication performance characteristics for a given file size.

Test variations:
- File size (1KB only, 1MB only, 1GB only)

Test configurations:
- Test files: file number varying depending on the size (1KB -> 1mil, 1MB -> 1mil, 1GB -> 1k); test files contain random data
- Filesystem: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Host workload: none
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Results:

For small files (the 1 KB case below), the throughput in terms of MB/sec becomes extremely low. This is because the overhead of processing each file (inode and attribute data) dominates the performance, and that work is not reflected in the MB/sec value.

[Figure: Rsync phase throughput (MB/sec) for the 1KB x 1mil, 1MB x 1mil, and 1GB x 1k test file sets]

If the same data is viewed as file/sec, the small-file throughput is high while the large-file (1 GB) throughput becomes extremely low.

[Figure: Rsync phase throughput (file/sec) for the 1KB x 1mil, 1MB x 1mil, and 1GB x 1k test file sets]

Typically, when various file sizes are mixed, the throughput tends to be measured in MB/sec of the file contents data. However, if the source side has a huge number of small files (a few KB each), the file attribute copy and the inode creation/deletion operations dominate the performance and the MB/sec value will be low. File/sec needs to be evaluated in that case.
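Both metrics can be derived from the same run, as in the small helper below; which of the two numbers is meaningful depends on the file size profile.

```python
# Compute both replication throughput metrics from one run.
def replication_rates(bytes_moved: int, files_moved: int, elapsed_s: float):
    mb_per_s = bytes_moved / elapsed_s / 1e6
    files_per_s = files_moved / elapsed_s
    return mb_per_s, files_per_s

# Example: one million 1 KB files moved in 1000 s is only ~1 MB/s,
# but 1000 files/s, so file/sec is the informative number for that profile.
print(replication_rates(bytes_moved=10**9, files_moved=10**6, elapsed_s=1000.0))
```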

5.2.7 Filesystem Configuration

Objective: To show the replication performance impact of the filesystem storage configuration.

Test variations:
- Number of NSDs in a filesystem on the source/target V7000 Unified, from 1 to 12 (all NSDs dataandmetadata)
- Mdisk configuration from the GUI menu on both source/target V7000 Unified: Basic-RAID5 capacity, Basic-RAID5 performance, RAID0 performance

Test configurations:
- Test files: 5k files, about 63 GB total, file size varying from a few KB to a few GB (same as #7 in "Typical Cases" in section 5.3); test files contain random data
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Host workload: none
- Replication: initial replication

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Results:

V7000 Unified provides several options when configuring storage. Details are described in the performance white paper, but the basic steps are to create mdisks (RAID volumes), vdisks (NSDs), and GPFS filesystems. From an operational point of view, the mdisks and the mdisk groups need to be defined first; creating a GPFS filesystem by CLI/GUI then creates the vdisks and the filesystem.

mdisk creation

The GUI provides several menus:
- Basic-RAID5 capacity --- This is the recommended default template in the GUI. In this test configuration, 3 RAID-5 mdisks are created. The numbers of physical drives are (6, 7, 7) on the source cluster and (7, 7, 7) on the target cluster. All the test cases except for this one are done using this template.
- Basic-RAID5 performance --- In this test configuration, 2 RAID-5 mdisks are created. The number of physical drives is (9, 9).
- RAID0 performance --- In this test configuration, 2 RAID-0 mdisks are created. The number of physical drives is (9, 9).
- RAID6 is also available.

filesystem/vdisk creation

When the filesystem is created from the GUI, it automatically creates 3 vdisks (NSDs) per filesystem. All the test configurations except for this test case are done using 3 vdisks. The number of vdisks (NSDs) per filesystem can be configured with the mkfs CLI command.

The async replication is investigated for these various storage configurations.

[Figure: Rsync phase throughput (MB/sec) for 1, 3, 6, and 12 NSDs under the Basic-RAID5 Capacity, Basic-RAID5 Performance, and RAID0 Performance GUI "Configure Internal Storage" menus - mdisk configuration and number of vdisks impact on rsync throughput]

[Figure: Scan phase time for 1, 3, 6, and 12 NSDs under the Basic-RAID5 Capacity, Basic-RAID5 Performance, and RAID0 Performance GUI "Configure Internal Storage" menus - mdisk configuration and number of vdisks impact on scan]

With only 1 V7000 enclosure, these configurations did not show much difference.

5.3 Typical Cases

5.3.1 Case Definition

Based on our experiences with customers, a few "typical" cases are defined, mainly concerning:
- Number of files to replicate in the filesystem
- File size profile of the file system to replicate
- Number of filesystems to replicate

Case | Approx. num of files | Approx. total capacity | Num of filesystems | File size profile (1KB, 1KB, 1MB, 1MB, 1GB, 1GB)
1 | 1mil | 315 GB | 1 | (1mil, , , , 2, 2)
2 | 1mil | 2.2 TB | 1 | (98mil, , 2mil, , 2, 2)
3 | 1mil | .95 GB | 1 | (1mil, , , , , )
4 | 1mil | 121 GB | 1 | (1mil, , , , 2, 1)
5 | 1mil | 2.2 TB | 1 | (8k, , 18k, 2k, 2, 1)
6 | 5k | 4.1 TB | 1 | (4k, 12k, 3k, 4k, , )
7 | 5k | 631 GB | 1 | (3k, 1k, 95k, 5k, 2, ) + 2 x 1GB
8 | 2k x 3 | 52 GB x 3 | 3 | (1k, 8k, 15k, 5k, 1, ) x 3
9 | 5mil x 3 | GB x 3 | 3 | (5mil, , , , , ) x 3

The rest of the configuration/conditions:
- Filesystem configuration: GUI default (Basic-RAID5 capacity template); GPFS blocksize: 256 KB; 3 NSDs (vdisks) per file system, all dataandmetadata; V7000 mdisk group extent size: 256 KB; 3 RAID5 mdisks with physical disks (7, 7, 6) for the source cluster and (7, 7, 7) on the target cluster; mdisk stripe size: 128 KB
- Network: 1 Gbps, no latency
- Async replication configuration: 2 nodes, 10 processes each, fast encryption cipher, no compression, source & target snapshot
- Host workload: none
- Increment ratio: 10% (5% new, 5% updated)
- Replication: initial replication, incremental replication (10% delta)

Measured values:
- Elapsed time for scan/replication
- Throughput (MB/sec, file/sec) for the replication phase
- CPU/memory/storage utilization for scan/replication

Note: As demonstrated in 5.2.3 Network, the throughput is always affected by the latency of the network connection. Also, the network efficiency (packet loss, packet sequencing, and bit error rates) and network switch capabilities should be configured properly.

5.3.2 Results

The detailed replication results (time and throughput) are shown for each case. The system resource consumption (CPU and disk) is also shown for some cases.

Case 1: 1mil files, approx. 315 GB, 1 filesystem, file size profile (1mil, , , , 2, 2)

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

System resource usage:

[Figure: CPU usage (%) on the source/target V7000 Unified node during replication]

[Figure: Disk utilization (%) on the source/target V7000 Unified node during replication]

As for the memory usage, the free memory didn't change much throughout the replication. On the source side, the scan consumed around 3GB and released the memory after the scan. The rsync phase used a maximum of around 2GB. On the target side, rsync also used a maximum of around 2GB. The disk utilization graph indicates that the disk access on the source side is the bottleneck in this case.

Case 2: 1mil files, approx. 2.2 TB, 1 filesystem, file size profile (98mil, , 2mil, , 2, 2)

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

Case 3: 1mil files, approx. .95 GB, 1 filesystem, file size profile (1mil, , , , , )

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

Case 4: 1mil files, approx. 121 GB, 1 filesystem, file size profile (1mil, , , , 2, 1)

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

Case 5: 1mil files, approx. 2.2 TB, 1 filesystem, file size profile (8k, , 18k, 2k, 2, 1)

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

System resource usage:

[Figure: CPU usage (%) on the source/target V7000 Unified node during replication]

[Figure: Disk utilization (%) on the source/target V7000 Unified node during replication]

Case 6: 5k files, approx. 4.1 TB, 1 filesystem, file size profile (4k, 12k, 3k, 4k, , )

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

Case 7: 5k files, approx. 631 GB, 1 filesystem, file size profile (3k, 1k, 95k, 5k, 2, ) + 2 x 1GB

Elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

System resource usage:

[Figure: CPU usage (%) on the source/target V7000 Unified node during replication]

[Figure: Disk utilization (%) on the source/target V7000 Unified node during replication]

Case 8: 2k x 3 files, approx. 52 GB x 3, 3 filesystems, file size profile (1k, 8k, 15k, 5k, 1, ) x 3

3 filesystems (gpfs, gpfs1, gpfs2) scheduled at the same time

gpfs elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

The results of the other 2 filesystems were almost identical.

System resource usage:

[Figure: CPU usage (%) on the source/target V7000 Unified node during replication]

[Figure: Disk utilization (%) of the source and target disks during replication - gpfs fs (dm-0 - dm-2), gpfs1 fs (dm-3 - dm-5), gpfs2 fs (dm-6 - dm-8)]

As shown, when the replications of the 3 filesystems are executed concurrently, the CPU usage reaches close to 90% on the source node. This may result in CPU warning (75%) / CPU high threshold (90%) messages (INFO level) during the rsync phase.

Only 1 filesystem (gpfs) with the same test file profile

gpfs elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

The throughput is 3 times faster than when the 3 filesystems are executed at the same time.

System resource usage:

[Figure: CPU usage (%) on the source/target V7000 Unified node during replication]

[Figure: Disk utilization (%) on the source/target V7000 Unified node during replication]

Case 9: 5mil x 3 files, 3 filesystems, file size profile (5mil, , , , , ) x 3

3 filesystems (gpfs, gpfs1, gpfs2) scheduled at the same time

gpfs elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

gpfs1 elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

gpfs2 elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]

Only 1 filesystem (gpfs) with the same test file profile

gpfs elapsed time and rsync phase throughput:

[Table: Total / Scan / Rsync / Other elapsed times and MB/sec, file/sec for the initial and incremental replications - numeric values not captured]
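Cases 8 and 9 show that concurrent replications of filesystems sharing the same disks divide the available throughput. A simple way to act on the staggering recommendation is to offset the startrepl schedules of those filesystems rather than starting them together. The sketch below only computes offset start times; the 90-minute gap is an arbitrary assumption, and the actual scheduling would be done with the product's task scheduler or cron.

```python
# Compute staggered start times for several filesystems sharing one mdisk group.
from datetime import datetime, timedelta

def staggered_starts(first_start: datetime, filesystems: list,
                     gap_minutes: int = 90) -> dict:
    return {fs: first_start + timedelta(minutes=gap_minutes * i)
            for i, fs in enumerate(filesystems)}

schedule = staggered_starts(datetime(2013, 10, 18, 0, 0), ["gpfs", "gpfs1", "gpfs2"])
for fs, start in schedule.items():
    print(fs, start.strftime("%H:%M"))
```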


More information

CSE 124: Networked Services Lecture-17

CSE 124: Networked Services Lecture-17 Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

Parallel File Systems for HPC

Parallel File Systems for HPC Introduction to Scuola Internazionale Superiore di Studi Avanzati Trieste November 2008 Advanced School in High Performance and Grid Computing Outline 1 The Need for 2 The File System 3 Cluster & A typical

More information

PRESENTATION TITLE GOES HERE

PRESENTATION TITLE GOES HERE Enterprise Storage PRESENTATION TITLE GOES HERE Leah Schoeb, Member of SNIA Technical Council SNIA EmeraldTM Training SNIA Emerald Power Efficiency Measurement Specification, for use in EPA ENERGY STAR

More information

Feedback on BeeGFS. A Parallel File System for High Performance Computing

Feedback on BeeGFS. A Parallel File System for High Performance Computing Feedback on BeeGFS A Parallel File System for High Performance Computing Philippe Dos Santos et Georges Raseev FR 2764 Fédération de Recherche LUmière MATière December 13 2016 LOGO CNRS LOGO IO December

More information

IBM Spectrum NAS. Easy-to-manage software-defined file storage for the enterprise. Overview. Highlights

IBM Spectrum NAS. Easy-to-manage software-defined file storage for the enterprise. Overview. Highlights IBM Spectrum NAS Easy-to-manage software-defined file storage for the enterprise Highlights Reduce capital expenditures with storage software on commodity servers Improve efficiency by consolidating all

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency

More information

1Z0-433

1Z0-433 1Z0-433 Passing Score: 800 Time Limit: 0 min Exam A QUESTION 1 What is the function of the samfsdump utility? A. It provides a metadata backup of the file names, directory structure, inode information,

More information

S SNIA Storage Networking Management & Administration

S SNIA Storage Networking Management & Administration S10 201 SNIA Storage Networking Management & Administration Version 23.3 Topic 1, Volume A QUESTION NO: 1 Which two (2) are advantages of ISL over subscription? (Choose two.) A. efficient ISL bandwidth

More information

The Btrfs Filesystem. Chris Mason

The Btrfs Filesystem. Chris Mason The Btrfs Filesystem Chris Mason The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others All data and metadata is written via copy-on-write CRCs

More information

NEC M100 Frequently Asked Questions September, 2011

NEC M100 Frequently Asked Questions September, 2011 What RAID levels are supported in the M100? 1,5,6,10,50,60,Triple Mirror What is the power consumption of M100 vs. D4? The M100 consumes 26% less energy. The D4-30 Base Unit (w/ 3.5" SAS15K x 12) consumes

More information

A. Deduplication rate is less than expected, accounting for the remaining GSAN capacity

A. Deduplication rate is less than expected, accounting for the remaining GSAN capacity Volume: 326 Questions Question No: 1 An EMC Avamar customer s Gen-1 system with 4 TB of GSAN capacity has reached read-only threshold. The customer indicates that the deduplicated backup data accounts

More information

Warsaw. 11 th September 2018

Warsaw. 11 th September 2018 Warsaw 11 th September 2018 Dell EMC Unity & SC Series Midrange Storage Portfolio Overview Bartosz Charliński Senior System Engineer, Dell EMC The Dell EMC Midrange Family SC7020F SC5020F SC9000 SC5020

More information

MongoDB on Kaminario K2

MongoDB on Kaminario K2 MongoDB on Kaminario K2 June 2016 Table of Contents 2 3 3 4 7 10 12 13 13 14 14 Executive Summary Test Overview MongoPerf Test Scenarios Test 1: Write-Simulation of MongoDB Write Operations Test 2: Write-Simulation

More information

IBM řešení pro větší efektivitu ve správě dat - Store more with less

IBM řešení pro větší efektivitu ve správě dat - Store more with less IBM řešení pro větší efektivitu ve správě dat - Store more with less IDG StorageWorld 2012 Rudolf Hruška Information Infrastructure Leader IBM Systems & Technology Group rudolf_hruska@cz.ibm.com IBM Agenda

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School

More information

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems April 2017 215-12035_C0 doccomments@netapp.com Table of Contents 3 Contents Before you create ONTAP Cloud systems... 5 Logging in

More information

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Table of Contents Introduction.... 3 vsphere Architectural Overview... 4 SAN Backup

More information

SSH Bulk Transfer Performance. Allan Jude --

SSH Bulk Transfer Performance. Allan Jude -- SSH Bulk Transfer Performance Allan Jude -- allanjude@freebsd.org Introduction 15 Years as FreeBSD Server Admin FreeBSD src/doc committer (ZFS, bhyve, ucl, xo) FreeBSD Core Team (July 2016-2018) Co-Author

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Boost IBM i performance with IBM FlashSystem

Boost IBM i performance with IBM FlashSystem Jana Jamsek Advanced technical skills, Europe Boost IBM i performance with IBM FlashSystem Agenda IBM FlashSystem Benefits of FlashSystem for an IBM i customer IBM i workloads with IBM FlashSystem Tools

More information

SAP Applications on IBM XIV System Storage

SAP Applications on IBM XIV System Storage SAP Applications on IBM XIV System Storage Hugh Wason IBM Storage Product Manager SAP Storage Market - Why is it Important? Storage Market for SAP is estimated at $2Bn+ SAP BW storage sizes double every

More information

Accelerate with IBM Storage: IBM FlashSystem A9000/A9000R and Hyper-Scale Manager (HSM) 5.1 update

Accelerate with IBM Storage: IBM FlashSystem A9000/A9000R and Hyper-Scale Manager (HSM) 5.1 update Accelerate with IBM Storage: IBM FlashSystem A9000/A9000R and Hyper-Scale Manager (HSM) 5.1 update Lisa Martinez Brian Sherman Steve Solewin IBM Hyper-Scale Manager Copyright IBM Corporation 2016. Session

More information

davidklee.net heraflux.com linkedin.com/in/davidaklee

davidklee.net heraflux.com linkedin.com/in/davidaklee @kleegeek davidklee.net heraflux.com linkedin.com/in/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture Health

More information

Efficiently Backing up Terabytes of Data with pgbackrest

Efficiently Backing up Terabytes of Data with pgbackrest Efficiently Backing up Terabytes of Data with pgbackrest David Steele Crunchy Data PGDay Russia 2017 July 6, 2017 Agenda 1 Why Backup? 2 Living Backups 3 Design 4 Features 5 Performance 6 Changes to Core

More information

Future File System: An Evaluation

Future File System: An Evaluation Future System: An Evaluation Brian Gaffey and Daniel J. Messer, Cray Research, Inc., Eagan, Minnesota, USA ABSTRACT: Cray Research s file system, NC1, is based on an early System V technology. Cray has

More information

Key metrics for effective storage performance and capacity reporting

Key metrics for effective storage performance and capacity reporting Key metrics for effective storage performance and capacity reporting Key Metrics for Effective Storage Performance and Capacity Reporting Objectives This white paper will cover the key metrics in storage

More information

IBM FlashSystems with IBM i

IBM FlashSystems with IBM i IBM FlashSystems with IBM i Proof of Concept & Performance Testing Fabian Michel Client Technical Architect 1 2012 IBM Corporation Abstract This document summarizes the results of a first Proof of Concept

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER Aspera FASP Data Transfer at 80 Gbps Elimina8ng tradi8onal bo

More information

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved. See what s new: Data Domain Global Deduplication Array, DD Boost and more 2010 1 EMC Backup Recovery Systems (BRS) Division EMC Competitor Competitor Competitor Competitor Competitor Competitor Competitor

More information

Test Report: Digital Rapids Transcode Manager Application with NetApp Media Content Management Solution

Test Report: Digital Rapids Transcode Manager Application with NetApp Media Content Management Solution Technical Report Test Report: Digital Rapids Transcode Manager Application with NetApp Media Content Management Solution Jim Laing, NetApp July 2012 TR-4084 TABLE OF CONTENTS 1 Executive Summary... 3 2

More information

Software-defined Storage: Fast, Safe and Efficient

Software-defined Storage: Fast, Safe and Efficient Software-defined Storage: Fast, Safe and Efficient TRY NOW Thanks to Blockchain and Intel Intelligent Storage Acceleration Library Every piece of data is required to be stored somewhere. We all know about

More information

CS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures

CS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2 I/O performance measures I/O performance measures diversity: which I/O devices can connect to the system? capacity: how many I/O devices

More information

White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation ETERNUS AF S2 series

White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation ETERNUS AF S2 series White Paper Features and Benefits of Fujitsu All-Flash Arrays for Virtualization and Consolidation Fujitsu All-Flash Arrays are extremely effective tools when virtualization is used for server consolidation.

More information

Memory-Based Cloud Architectures

Memory-Based Cloud Architectures Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)

More information

RIGHTNOW A C E

RIGHTNOW A C E RIGHTNOW A C E 2 0 1 4 2014 Aras 1 A C E 2 0 1 4 Scalability Test Projects Understanding the results 2014 Aras Overview Original Use Case Scalability vs Performance Scale to? Scaling the Database Server

More information

VMware vsphere 5.0 STORAGE-CENTRIC FEATURES AND INTEGRATION WITH EMC VNX PLATFORMS

VMware vsphere 5.0 STORAGE-CENTRIC FEATURES AND INTEGRATION WITH EMC VNX PLATFORMS VMware vsphere 5.0 STORAGE-CENTRIC FEATURES AND INTEGRATION WITH EMC VNX PLATFORMS A detailed overview of integration points and new storage features of vsphere 5.0 with EMC VNX platforms EMC Solutions

More information

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee @kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture

More information

V. Mass Storage Systems

V. Mass Storage Systems TDIU25: Operating Systems V. Mass Storage Systems SGG9: chapter 12 o Mass storage: Hard disks, structure, scheduling, RAID Copyright Notice: The lecture notes are mainly based on modifications of the slides

More information

Exam Name: Midrange Storage Technical Support V2

Exam Name: Midrange Storage Technical Support V2 Vendor: IBM Exam Code: 000-118 Exam Name: Midrange Storage Technical Support V2 Version: 12.39 QUESTION 1 A customer has an IBM System Storage DS5000 and needs to add more disk drives to the unit. There

More information

Catalogic DPX TM 4.3. ECX 2.0 Best Practices for Deployment and Cataloging

Catalogic DPX TM 4.3. ECX 2.0 Best Practices for Deployment and Cataloging Catalogic DPX TM 4.3 ECX 2.0 Best Practices for Deployment and Cataloging 1 Catalogic Software, Inc TM, 2015. All rights reserved. This publication contains proprietary and confidential material, and is

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

SurFS Product Description

SurFS Product Description SurFS Product Description 1. ABSTRACT SurFS An innovative technology is evolving the distributed storage ecosystem. SurFS is designed for cloud storage with extreme performance at a price that is significantly

More information

Costefficient Storage with Dataprotection

Costefficient Storage with Dataprotection Costefficient Storage with Dataprotection for the Cloud Era Karoly Vegh Principal Systems Consultant / Central and Eastern Europe March 2017 Safe Harbor Statement The following is intended to outline our

More information

THE SUMMARY. CLUSTER SERIES - pg. 3. ULTRA SERIES - pg. 5. EXTREME SERIES - pg. 9

THE SUMMARY. CLUSTER SERIES - pg. 3. ULTRA SERIES - pg. 5. EXTREME SERIES - pg. 9 PRODUCT CATALOG THE SUMMARY CLUSTER SERIES - pg. 3 ULTRA SERIES - pg. 5 EXTREME SERIES - pg. 9 CLUSTER SERIES THE HIGH DENSITY STORAGE FOR ARCHIVE AND BACKUP When downtime is not an option Downtime is

More information

File System Implementation

File System Implementation File System Implementation Last modified: 16.05.2017 1 File-System Structure Virtual File System and FUSE Directory Implementation Allocation Methods Free-Space Management Efficiency and Performance. Buffering

More information

IBM System Storage SAN Volume Controller IBM Easy Tier enhancements in release

IBM System Storage SAN Volume Controller IBM Easy Tier enhancements in release IBM System Storage SAN Volume Controller IBM Easy Tier enhancements in 7.5.0 release Kushal S. Patel, Shrikant V. Karve, Sarvesh S. Patel IBM Systems, ISV Enablement July 2015 Copyright IBM Corporation,

More information

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. File-System Structure File structure Logical storage unit Collection of related information File

More information

Webscale, All Flash, Distributed File Systems. Avraham Meir Elastifile

Webscale, All Flash, Distributed File Systems. Avraham Meir Elastifile Webscale, All Flash, Distributed File Systems Avraham Meir Elastifile 1 Outline The way to all FLASH The way to distributed storage Scale-out storage management Conclusion 2 Storage Technology Trend NAND

More information