IBM InfoSphere Data Replication's Change Data Capture (CDC) for DB2 LUW Databases (Version 10.2.1)
Performance Evaluation and Analysis
2014
Prasa Urithirakodeeswaran
Contents

Introduction
Configuration Test
    Test Methodology
    Test Variations
        Test 1: Single Subscription without Fast Apply
        Test 2: Single Subscription with Fast Apply
        Test 3: Two Subscriptions with Fast Apply
    Results
    Summary
Row Length Test
    Test Methodology
    Tests
    Results
    Summary
Appendix A: Machine and Environment Specifications
Appendix B: CDC for DB2 Configuration
Appendix C: Formulas and Units
Appendix D: Table Definitions
Notices
Introduction

This paper describes the results of a performance study of IBM InfoSphere Data Replication's Change Data Capture (CDC) for DB2 LUW databases, Version 10.2.1, and demonstrates the characteristics and behaviour of its data replication. The paper explores various tuning techniques and workload types and examines their effect on performance. The primary performance measurement used for comparison is the data throughput rate when replicating to and from DB2 LUW databases.

Please note that the workload used in these tests is not representative of all workloads and may not match your production workload. (1)

(1) Performance is based on measurements and projections using CDC benchmarks in a controlled environment. The results that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, the amount of CPU capacity available during processing, and the workload processed. Therefore, results may vary significantly and no assurance can be given that an individual user will achieve results similar to those stated here. These results should be used for reference purposes only. The test scenarios (hardware configuration and workloads) used in this document to generate performance data are not considered best-case performance scenarios. Performance may be better or worse depending on the hardware configuration, data set types and sizes, and the overall workload on the system. The information contained in this document is distributed on an AS IS basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into their operational environment.
While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
Configuration Test

There are many different ways to configure CDC. The configuration choices depend on the nature of the workload and the desired throughput versus the system resource footprint. These tests provide samples of a few configurations for given workloads and show their effect in increasing the overall throughput.

Test Methodology

The workload used for this test:
- Insert-only workload
- 10 inserts per commit
- 10 tables
- 23 columns per table, with a variety of data types, equaling 240 bytes per row (2)
- Approximately five million rows in the total workload
- InfoSphere IIDR 10.2.1 IF2
    - One or more source instances (as described below)
    - Single target instance
- "Parallelize by table" Fast Apply mode used (as described below)
    - 10 image builders
    - Number of apply threads equal to the number of tables per subscription
    - 10,000 row threshold

Complete database and hardware specifications can be found in Appendix A: Machine and Environment Specifications and Appendix B: CDC for DB2 Configuration.

The test workload was generated in advance so that the test would not be influenced or constrained by the workload generation.

Test Variations

Three CDC configurations were used in order to maximize throughput while varying the footprint and possible semantic application constraints.

Test 1: Single Subscription without Fast Apply

This test uses the simplest configuration: a single subscription without the Fast Apply feature. This configuration supports the strictest level of workload transaction consistency and is the simplest to configure and maintain.

Test 2: Single Subscription with Fast Apply

This test uses a single source instance and a single subscription along with the Fast Apply product feature to parallelize the target database connection. This configuration can be used when reordered operations in the workload will not break data integrity constraints.

(2) The detailed schema can be found in Appendix D: Table Definitions.
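To make the workload shape concrete, the following is a minimal sketch (not the actual tool used in the study) of pre-generating an insert-only workload with 10 inserts per commit, distributed round-robin across 10 tables. The table and column names are illustrative placeholders, not the benchmark schema.

```python
# Hypothetical sketch of pre-generating an insert-only workload:
# 10 tables, 10 inserts per commit, statements built in advance so
# workload generation cannot constrain the replication test itself.
def generate_workload(num_rows=1000, tables=10, inserts_per_commit=10):
    statements = []
    for i in range(num_rows):
        table = f"WP.TB240_{i % tables}"  # round-robin across tables (placeholder names)
        statements.append(f"INSERT INTO {table} (CUSTNO) VALUES ({i})")
        if (i + 1) % inserts_per_commit == 0:
            statements.append("COMMIT")   # commit boundary every 10 inserts
    return statements

workload = generate_workload()
```

Generating the statements up front, rather than during the run, keeps the measured throughput free of generator overhead, as the methodology above requires.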
Test 3: Two Subscriptions with Fast Apply

Test 3 demonstrates the effect of splitting the workload between two subscriptions in a single source instance, in order to take advantage of a shared scrape and of parallelism in sending the data to the target, while also using Fast Apply to parallelize the target database connection. This configuration requires that the tables in the workload can be split between subscriptions without breaking data integrity constraints.

Results

                                    Test 1    Test 2    Test 3
Rows applied per elapsed second     26,685    67,698    111,747
MB applied per elapsed second       6.10      15.49     25.58
MB applied per CPU second (3)       6.16      6.24      6.45
CPU consumed per MB (seconds)       0.16      0.16      0.15

[Chart: Throughput (RPS) for Tests 1 through 3]

When comparing these configurations in terms of throughput (rows per second), we see that each configuration yields a substantial gain in RPS, with the final configuration yielding a replication rate of over 111,000 RPS.

(3) CPU measurements include the main InfoSphere CDC processes. The CPU for the database itself is not included.
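The MB-per-second figures in the table are consistent with the 240-byte row length; a quick cross-check (assuming 1 MB = 2^20 bytes, our assumption) reproduces them to rounding:

```python
# Cross-check: MB applied per second should equal rows/sec * 240 bytes.
# The 1 MB = 2**20 bytes convention is our assumption, chosen because it
# reproduces the reported figures to two decimal places.
ROW_BYTES = 240
MB = 1024 * 1024
rows_per_sec = {"Test 1": 26685, "Test 2": 67698, "Test 3": 111747}
for test, rps in rows_per_sec.items():
    print(f"{test}: {rps * ROW_BYTES / MB:.2f} MB/s")
```

The computed values (about 6.11, 15.49, and 25.58 MB/s) match the table, confirming the rows-per-second and MB-per-second columns describe the same runs.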
[Chart: Throughput in MB/second for Tests 1 through 3, with a second chart showing the incremental gain between successive tests]

The charts above show the throughput expressed as MB/second, with the second chart showing the difference in throughput between each pair of test configurations. The second chart shows that the gains from the second and third tests were of similar magnitude. In other words, parallelizing the target engine database connection through Fast Apply provided gains similar to those from splitting the subscriptions to parallelize components of the source engine.

[Chart: CPU footprint, MB applied per CPU second for Tests 1 through 3]

MB applied per CPU second measures the efficiency with which the workload is replicated. Here we see that each configuration increased CPU efficiency as well as total throughput.
Summary

These tests illustrate that CDC provides a variety of configuration options that can be tailored to match the workload and desired throughput. The choice of configuration can have a dramatic effect on the performance obtained. In this scenario, Test 3 delivered approximately 4.19 times the throughput of Test 1.
Row Length Test

The purpose of this test is to show how the CDC product scales with respect to row size. (4)

Test Methodology

The workload used for this test:
- Insert-only workload
- 10 inserts per commit
- 10 tables
- 23 columns per table, with a variety of data types, equaling 240 to 1,200 bytes per row (5)
- Approximately five million rows in the total workload
- IIDR 10.2.1 IF2
    - Single source instance
    - Single target instance
    - Single subscription
- "Parallelize by table" Fast Apply mode used (6)
    - 10 image builders
    - Number of apply threads equal to the number of tables per subscription
    - 10,000 row threshold

Complete database and hardware specifications can be found in Appendix A: Machine and Environment Specifications and Appendix B: CDC for DB2 Configuration.

The test workload was generated in advance so that the test would not be influenced or constrained by the workload generation.

Tests

All the tests were identical with the exception of the table schema, which was modified to produce row lengths varying from 240 bytes to 1,200 bytes.

(4) Many other axes could have been examined (e.g., number of columns, varying data types). Row size was chosen as an illustrative example of the impact a specific workload may have on replication throughput.
(5) The detailed schema can be found in Appendix D: Table Definitions.
(6) The Fast Apply product feature was used to ensure the target performance was as fast as possible. In several of the tests this feature may not have been necessary in order to consume the source workload, and thus artificially added an additional resource footprint. However, for the sake of simplicity, the feature was enabled for all tests.
Results

Row size (bytes)                    240       480       720       960       1,200
Rows applied per elapsed second     67,698    59,391    53,646    46,455    39,798
MB applied per elapsed second       15.49     27.19     36.84     42.53     45.54
MB applied per CPU second (7)       6.24      11.68     16.80     21.13     24.72
CPU consumed per MB (seconds)       0.160     0.086     0.059     0.047     0.040

[Chart: Row length scalability (CPU footprint), CPU seconds per MB versus row length]

As can be seen in the chart above, the CPU cost per MB decreases as the row length increases; in other words, transferring bytes becomes more efficient with respect to CPU.

(7) CPU measurements include the main InfoSphere CDC processes. The CPU for the database itself is not included.
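Another way to read the same table is per-row rather than per-byte CPU cost. Deriving it from the reported "MB applied per CPU second" figures (again assuming 1 MB = 2^20 bytes, our convention) shows that per-row cost grows only modestly while per-byte cost falls sharply:

```python
# Derive CPU microseconds per row from the reported MB-per-CPU-second
# figures: cpu_per_row = row_bytes / (mb_per_cpu_sec * MB).
# The 1 MB = 2**20 bytes convention is our assumption.
MB = 1024 * 1024
mb_per_cpu_sec = {240: 6.24, 480: 11.68, 720: 16.80, 960: 21.13, 1200: 24.72}
for row_bytes, rate in mb_per_cpu_sec.items():
    cpu_us_per_row = row_bytes / (rate * MB) * 1e6
    print(f"{row_bytes:5d} bytes: {cpu_us_per_row:.1f} CPU microseconds per row")
```

The per-row cost rises from roughly 37 to 46 microseconds as rows grow from 240 to 1,200 bytes, which is consistent with a largely fixed per-row overhead plus a smaller per-byte component.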
[Chart: Row length scalability, throughput (RPS) versus row length]

As expected, the number of rows replicated per second decreases as the row size increases, due to the extra processing required for each row.

Summary

These tests demonstrate the significance of the replication workload's shape and size to the overall throughput and CPU consumption. The number of rows replicated per second, as expected, decreases as the row size increases, but when examining data size (MB) rather than row count, we see that the overall data throughput actually increases.
Appendix A: Machine and Environment Specifications

Source and Target Machines
Two IBM Power 740 Express servers with 256 GB RAM, 1000BASE-T network connection, and two POWER7+ 3.55 GHz 8-core CPUs (max 4 threads per core).
Storage: DS4700 SAN with 300 GB 15K RPM FC disks; 2 x 300 GB mirrored for the OS; three RAID 0 arrays with three 300 GB disks each (san1, san2, san3); one local RAID 0 array with three 140 GB SAS disks.

DB2 Database
Version 10.5 utilized on both source and target.
Appendix B: CDC for DB2 Configuration

- CDC for DB2 databases Version 10.2.1 with Interim Fix 2 applied.
- Configured to use 4 GB of memory for each source instance, 4 GB of memory for the target instance, and 1 GB of disk for each instance.
- System parameter global_max_batch = 250.
Appendix C: Formulas and Units

Item                              Formula
Rows published per second         Total rows / elapsed time (seconds)
Rows published per CPU second     Total rows / CPU consumed (seconds)
MB published per CPU second       (Row length x rows) / CPU consumed (seconds)
CPU consumed per MB (seconds)     CPU consumed (seconds) / MB
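The formulas above can be expressed as a small helper; the 1 MB = 2^20 bytes convention is our assumption (the appendix does not state it), and the dictionary keys are illustrative names:

```python
# The Appendix C formulas as a Python helper. Assumes fixed-length rows
# and 1 MB = 2**20 bytes (our assumption; not stated in the appendix).
def replication_metrics(total_rows, row_length_bytes, elapsed_sec, cpu_sec):
    MB = 1024 * 1024
    total_mb = total_rows * row_length_bytes / MB
    return {
        "rows_per_sec": total_rows / elapsed_sec,        # Total rows / elapsed time
        "rows_per_cpu_sec": total_rows / cpu_sec,        # Total rows / CPU consumed
        "mb_per_cpu_sec": total_mb / cpu_sec,            # (Row length * rows) / CPU consumed
        "cpu_sec_per_mb": cpu_sec / total_mb,            # CPU consumed / MB
    }
```

Note that the last two metrics are reciprocals of each other, which is why the results tables in this paper move in lockstep on those rows.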
Appendix D: Table Definitions

The following table definition was used in the Configuration Test and in the 240-byte Row Length Test.

CREATE TABLE "WP"."TB240" (
    "CUSTNO" DECIMAL(10, 0) NOT NULL,
    "STATE"  CHAR(2),
    "INT04"  DECIMAL(4, 0) NOT NULL,
    "INT03"  DECIMAL(4, 0) NOT NULL,
    "INT02"  DECIMAL(4, 0) NOT NULL,
    "INT"    DECIMAL(4, 0) NOT NULL,
    "DEC01"  DECIMAL(31, 0),
    "DEC02"  DECIMAL(25, 0),
    "DEC03"  DECIMAL(17, 0),
    "CHR04"  CHAR(17),
    "CHR03"  CHAR(17),
    "CHR02"  CHAR(17),
    "CHR"    CHAR(17),
    "VCR010" CHAR(13),
    "VCR09"  CHAR(13),
    "VCR08"  CHAR(13),
    "VCR07"  CHAR(13),
    "VCR06"  CHAR(13),
    "VCR05"  CHAR(13),
    "VCR04"  CHAR(13),
    "VCR03"  CHAR(13),
    "VCR02"  CHAR(13),
    "VCR"    CHAR(13)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;
The following table was used in the 480-byte Row Length Test.

CREATE TABLE "WP"."TB480" (
    "CUSTNO" DECIMAL(10, 0) NOT NULL,
    "STATE"  CHAR(2),
    "INT04"  DECIMAL(4, 0) NOT NULL,
    "INT03"  DECIMAL(4, 0) NOT NULL,
    "INT02"  DECIMAL(4, 0) NOT NULL,
    "INT"    DECIMAL(4, 0) NOT NULL,
    "DEC01"  DECIMAL(31, 0),
    "DEC02"  DECIMAL(25, 0),
    "DEC03"  DECIMAL(17, 0),
    "CHR04"  CHAR(17),
    "CHR03"  CHAR(17),
    "CHR02"  CHAR(17),
    "CHR"    CHAR(17),
    "VCR010" CHAR(37),
    "VCR09"  CHAR(37),
    "VCR08"  CHAR(37),
    "VCR07"  CHAR(37),
    "VCR06"  CHAR(37),
    "VCR05"  CHAR(37),
    "VCR04"  CHAR(37),
    "VCR03"  CHAR(37),
    "VCR02"  CHAR(37),
    "VCR"    CHAR(37)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;
The following table was used in the 720-byte Row Length Test.

CREATE TABLE "WP"."TB720" (
    "CUSTNO" DECIMAL(10, 0) NOT NULL,
    "STATE"  CHAR(2),
    "INT04"  DECIMAL(4, 0) NOT NULL,
    "INT03"  DECIMAL(4, 0) NOT NULL,
    "INT02"  DECIMAL(4, 0) NOT NULL,
    "INT"    DECIMAL(4, 0) NOT NULL,
    "DEC01"  DECIMAL(31, 0),
    "DEC02"  DECIMAL(25, 0),
    "DEC03"  DECIMAL(17, 0),
    "CHR04"  CHAR(17),
    "CHR03"  CHAR(17),
    "CHR02"  CHAR(17),
    "CHR"    CHAR(17),
    "VCR010" CHAR(61),
    "VCR09"  CHAR(61),
    "VCR08"  CHAR(61),
    "VCR07"  CHAR(61),
    "VCR06"  CHAR(61),
    "VCR05"  CHAR(61),
    "VCR04"  CHAR(61),
    "VCR03"  CHAR(61),
    "VCR02"  CHAR(61),
    "VCR"    CHAR(61)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;
The following table was used in the 1,200-byte Row Length Test.

CREATE TABLE "WP"."TB1200" (
    "CUSTNO" DECIMAL(10, 0) NOT NULL,
    "STATE"  CHAR(2),
    "INT04"  DECIMAL(4, 0) NOT NULL,
    "INT03"  DECIMAL(4, 0) NOT NULL,
    "INT02"  DECIMAL(4, 0) NOT NULL,
    "INT"    DECIMAL(4, 0) NOT NULL,
    "DEC01"  DECIMAL(31, 0),
    "DEC02"  DECIMAL(25, 0),
    "DEC03"  DECIMAL(17, 0),
    "CHR04"  CHAR(17),
    "CHR03"  CHAR(17),
    "CHR02"  CHAR(17),
    "CHR"    CHAR(17),
    "VCR010" CHAR(85),
    "VCR09"  CHAR(85),
    "VCR08"  CHAR(85),
    "VCR07"  CHAR(85),
    "VCR06"  CHAR(85),
    "VCR05"  CHAR(85),
    "VCR04"  CHAR(85),
    "VCR03"  CHAR(85),
    "VCR02"  CHAR(85),
    "VCR"    CHAR(85)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;
Notices

Copyright IBM Corporation 2014. All Rights Reserved.

IBM Canada
8200 Warden Avenue
Markham, ON L6G 1C7
Canada

Neither this documentation nor any part of it may be copied or reproduced in any form or by any means or translated into another language without the prior consent of the above-mentioned copyright owner.

IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary, and customers should conduct their own testing.

The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility, or any other claims related to non-IBM products.
Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.

IBM, the IBM logo and InfoSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.