IBM InfoSphere Data Replication's Change Data Capture (CDC) for DB2 LUW databases (Version 10.2.1): Performance Evaluation and Analysis


Page 1
IBM InfoSphere Data Replication's Change Data Capture (CDC) for DB2 LUW databases (Version 10.2.1): Performance Evaluation and Analysis
2014
Prasa Urithirakodeeswaran

Page 2
Contents
Introduction ... 3
Configuration Test ... 4
  Test Methodology ... 4
  Test Variations ... 4
    Test 1 Single Subscription without Fast Apply ... 4
    Test 2 Single Subscription with Fast Apply ... 4
    Test 3 Two Subscriptions with Fast Apply ... 5
  Results ... 5
  Summary ... 7
Row Length Test ... 8
  Test Methodology ... 8
  Tests ... 8
  Results ... 9
  Summary ... 10
Appendix A: Machine and Environment Specifications ... 11
Appendix B: CDC for DB2 Configuration ... 12
Appendix C: Formulas and Units ... 13
Appendix D: Table Definitions ... 14
Notices ... 18

Page 3
Introduction
This paper describes the results of a performance study of IBM InfoSphere Data Replication's Change Data Capture (CDC) for DB2 LUW databases Version 10.2.1 and demonstrates the characteristics and behaviour of its data replication. The paper explores various tuning techniques and workload types and examines their effect on performance. The primary performance measurement used for comparison is the data throughput rate when replicating to and from DB2 LUW databases. Please note that the workload used in these tests is not representative of all workloads and may not match your production workload. [1]

[1] Performance is based on measurements and projections using CDC benchmarks in a controlled environment. The results that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, the amount of CPU capacity available during processing, and the workload processed. Therefore, results may vary significantly and no assurance can be given that an individual user will achieve results similar to those stated here. These results should be used for reference purposes only. The test scenarios (hardware configuration and workloads) used in this document to generate performance data are not considered best performance case scenarios. Performance may be better or worse depending on the hardware configuration, data set types and sizes, and the overall workload on the system. The information contained in this document is distributed on an AS IS basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into their operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Page 4
Configuration Test
There are many different ways to configure CDC. The configuration choices depend on the nature of the workload and the desired throughput versus the system resource footprint. These tests provide samples of a few configurations for given workloads and show their effect in increasing the overall throughput.

Test Methodology
The workload used for this test:
- Insert-only workload
- 10 inserts per commit
- 10 tables
- 23 columns per table with a variety of data types, equaling 240 bytes per row [2]
- Approximately five million rows in the total workload
- InfoSphere IIDR 10.2.1 IF2
  - One or more source instances (as described below)
  - Single target instance
- "Parallelize by table" Fast Apply mode used (as described below)
  - 10 image builders
  - Number of apply threads equal to the number of tables per subscription
  - 10,000 row threshold
Complete database and hardware specifications can be found in Appendix A: Machine and Environment Specifications and Appendix B: CDC for DB2 Configuration.
The test workload was generated in advance so that the test would not be influenced or constrained by the workload generation.

Test Variations
Three CDC configurations were used in order to maximize the throughput while varying the footprint and possible semantic application constraints.

Test 1 Single Subscription without Fast Apply
This test uses the simplest configuration of a single subscription without usage of the Fast Apply feature. This configuration supports the strictest level of workload transaction consistency and is the simplest to configure and maintain.

Test 2 Single Subscription with Fast Apply
This test uses a single source instance and a single subscription along with the Fast Apply product feature to parallelize the target database connection. This configuration can be used when reordered operations in the workload will not break data integrity constraints.

[2] The detailed schema can be found in Appendix D: Table Definitions.

Page 5
Test 3 Two Subscriptions with Fast Apply
Test 3 demonstrates the effect of splitting the workload between two subscriptions in a single source instance in order to take advantage of shared scrape and parallelism in sending the data to the target, while also using Fast Apply to parallelize the target database connection. This configuration requires that the tables in the workload can be split between subscriptions without breaking data integrity constraints.

Results

                                    Test 1    Test 2    Test 3
Rows Applied per second elapsed     26,685    67,698    111,747
MB Applied per second elapsed       6.1       15.49     25.58
MB Applied per CPU second [3]       6.16      6.24      6.45
CPU Consumed per MB (seconds)       0.16      0.16      0.15

[Chart: Throughput (RPS) for Tests 1 through 3, y-axis 0 to 120,000 RPS]

When comparing these configurations in terms of throughput (rows per second), we see that each configuration yields a substantial gain in RPS, with the final configuration yielding a replication rate of over 111,000 RPS.

[3] CPU measurements include the main InfoSphere CDC processes. The CPU for the database itself is not included.
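For reference, the relative gains implied by the table above can be derived directly from the reported rows-per-second figures (derived arithmetic, not reproduced from the original charts):

Test 1 to Test 2: 67,698 / 26,685 = 2.54x
Test 2 to Test 3: 111,747 / 67,698 = 1.65x
Test 1 to Test 3: 111,747 / 26,685 = 4.19x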

Page 6
[Chart: Throughput in MB/second for Tests 1 through 3; a second chart shows the difference in MB/second between each consecutive pair of tests]

The above charts show the throughput expressed as MB/second, with the second chart showing the difference in throughput between each pair of test configurations. The second chart shows that the gains from the second and third tests were of similar magnitude. In other words, parallelizing the target engine database connection through Fast Apply provided gains similar to those from splitting the subscriptions to parallelize components of the source engine.

[Chart: CPU footprint for Tests 1 through 3, showing CPU efficiency rising from 6.16 to 6.45 MB applied per CPU second]

CPU seconds per MB measures the efficiency with which the workload is replicated. Here we see that each configuration increased the CPU efficiency as well as increasing the total throughput.
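The similar-magnitude gains can be checked directly against the results table (derived arithmetic):

Test 2 minus Test 1: 15.49 - 6.1 = 9.39 MB/second
Test 3 minus Test 2: 25.58 - 15.49 = 10.09 MB/second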

Page 7
Summary
These tests illustrate that CDC provides a variety of configuration options that can be tailored to match the workload and desired throughput. The choice of configuration can have a dramatic effect on the performance obtained. In this scenario, throughput in Test 3 rose to 419% of the Test 1 rate, a 4.19x improvement.

Page 8
Row Length Test
The purpose of this test is to show how the CDC product scales with respect to row size. [4]

Test Methodology
The workload used for this test:
- Insert-only workload
- 10 inserts per commit
- 10 tables
- 23 columns per table with a variety of data types, producing 240 to 1200 bytes per row [5]
- Approximately five million rows in the total workload
- IIDR 10.2.1 IF2
  - Single source instance
  - Single target instance
  - Single subscription
- "Parallelize by table" Fast Apply mode used [6]
  - 10 image builders
  - Number of apply threads equal to the number of tables per subscription
  - 10,000 row threshold
Complete database and hardware specifications can be found in Appendix A: Machine and Environment Specifications and Appendix B: CDC for DB2 Configuration.
The test workload was generated in advance so that the test would not be influenced or constrained by the workload generation.

Tests
All the tests were identical with the exception of the table schema, which was modified to produce row lengths varying from 240 bytes to 1200 bytes.

[4] Many other axes could have been examined (e.g., number of columns, varying data types). Row size was chosen as an illustrative example of the impact a specific workload may have on replication throughput.
[5] The detailed schema can be found in Appendix D: Table Definitions.
[6] The Fast Apply product feature was used to ensure the target performance was as fast as possible. In several of the tests this feature may not have been necessary in order to consume the source workload, and thus artificially added an additional resource footprint. However, for the sake of simplicity, the feature was enabled for all tests.

Page 9
Results

Row Size (bytes)                    240       480       720       960       1200
Rows Applied per second elapsed     67,698    59,391    53,646    46,455    39,798
MB Applied per second elapsed       15.49     27.19     36.84     42.53     45.54
MB Applied per CPU second [7]       6.24      11.68     16.80     21.13     24.72
CPU Consumed per MB (seconds)       0.160     0.086     0.059     0.047     0.040

[Chart: Row Length Scalability (CPU Footprint), showing CPU seconds per MB falling from 0.160 at 240 bytes to 0.040 at 1200 bytes]

As can be seen in the above chart, the CPU cost per MB decreases as the row length increases; in other words, transferring bytes becomes more efficient with respect to CPU.

[7] CPU measurements include the main InfoSphere CDC processes. The CPU for the database itself is not included.

Page 10
[Chart: Row Length Scalability, showing throughput falling from 67,698 RPS at 240 bytes to 39,798 RPS at 1200 bytes]

It is expected that the number of rows replicated per second will decrease as the row size increases, due to the extra processing required for each row.

Summary
These tests demonstrate the significance of the replicated row size to the overall throughput and CPU consumption. The number of rows replicated per second, as expected, decreases as the row size increases, but when examining data size (MB) rather than row count, we see that the overall data replication rate actually increases.

Page 11
Appendix A: Machine and Environment Specifications

Source and Target Machines
Two IBM Power 740 Express servers were utilized, each with:
- 256 GB RAM
- 1000BASE-T network connection
- 2 x POWER7+ 3.55 GHz 8-core CPUs (max 4 threads per core)
- DS4700 SAN storage with 300 GB 15K RPM FC disks:
  - 2 x 300 GB mirrored disks for the OS
  - 3 RAID0 arrays with 3 x 300 GB disks per array (san1, san2, san3)
- 1 local RAID0 array with 3 x 140 GB SAS disks

DB2 Database
Version 10.5 was utilized on both source and target.

Page 12
Appendix B: CDC for DB2 Configuration
- Used CDC for DB2 databases Version 10.2.1 with Interim Fix 2 applied.
- Configured to use 4 GB of memory for each source instance, 4 GB of memory for the target instance, and 1 GB of disk for each instance.
- Set the system parameter global_max_batch = 250.
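For reference, CDC engine system parameters such as global_max_batch are typically set with the dmset command shipped with the CDC engine. A minimal sketch is shown below; the instance name is a placeholder, and the parameter name and value are taken from the configuration above:

dmset -I CDC_SRC_INSTANCE global_max_batch=250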

Page 13
Appendix C: Formulas and Units

Item                              Description
Rows Applied per second           Total rows / elapsed time (seconds)
Rows Applied per CPU second       Total rows / CPU consumed (seconds)
MB Applied per CPU second         (Row length x rows) / CPU consumed (seconds), expressed in MB
CPU Consumed per MB (seconds)     CPU consumed (seconds) / MB applied
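As a worked check of these formulas against the 240-byte row results (derived arithmetic, using 1 MB = 1,048,576 bytes):

MB Applied per second = (67,698 rows/s x 240 bytes) / 1,048,576 = 15.49 MB/s
CPU Consumed per MB = 1 / (6.24 MB per CPU second) = 0.160 seconds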

Page 14
Appendix D: Table Definitions
The following table definition was used in the Configuration Test and in the 240-byte Row Length Test.

CREATE TABLE "WP"."TB240" (
  "CUSTNO" DECIMAL(10, 0) NOT NULL,
  "STATE" CHAR(2),
  "INT04" DECIMAL(4, 0) NOT NULL,
  "INT03" DECIMAL(4, 0) NOT NULL,
  "INT02" DECIMAL(4, 0) NOT NULL,
  "INT" DECIMAL(4, 0) NOT NULL,
  "DEC01" DECIMAL(31, 0),
  "DEC02" DECIMAL(25, 0),
  "DEC03" DECIMAL(17, 0),
  "CHR04" CHAR(17),
  "CHR03" CHAR(17),
  "CHR02" CHAR(17),
  "CHR" CHAR(17),
  "VCR010" CHAR(13),
  "VCR09" CHAR(13),
  "VCR08" CHAR(13),
  "VCR07" CHAR(13),
  "VCR06" CHAR(13),
  "VCR05" CHAR(13),
  "VCR04" CHAR(13),
  "VCR03" CHAR(13),
  "VCR02" CHAR(13),
  "VCR" CHAR(13)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;
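For illustration, the insert-only workload described in the Configuration Test issued single-row inserts against tables of this shape, with a commit every 10 rows. The following is a minimal sketch, not the actual generator; the column values are invented placeholders:

INSERT INTO "WP"."TB240"
  ("CUSTNO", "STATE", "INT04", "INT03", "INT02", "INT",
   "DEC01", "DEC02", "DEC03",
   "CHR04", "CHR03", "CHR02", "CHR",
   "VCR010", "VCR09", "VCR08", "VCR07", "VCR06",
   "VCR05", "VCR04", "VCR03", "VCR02", "VCR")
VALUES
  (1000000001, 'ON', 1, 2, 3, 4,
   10, 20, 30,
   'CHR17-SAMPLE', 'CHR17-SAMPLE', 'CHR17-SAMPLE', 'CHR17-SAMPLE',
   'VCR13-SAMPLE', 'VCR13-SAMPLE', 'VCR13-SAMPLE', 'VCR13-SAMPLE', 'VCR13-SAMPLE',
   'VCR13-SAMPLE', 'VCR13-SAMPLE', 'VCR13-SAMPLE', 'VCR13-SAMPLE');
-- ... nine more inserts with new values, then:
COMMIT;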

Page 15
The following table was used in the 480-byte Row Length Test.

CREATE TABLE "WP"."TB480" (
  "CUSTNO" DECIMAL(10, 0) NOT NULL,
  "STATE" CHAR(2),
  "INT04" DECIMAL(4, 0) NOT NULL,
  "INT03" DECIMAL(4, 0) NOT NULL,
  "INT02" DECIMAL(4, 0) NOT NULL,
  "INT" DECIMAL(4, 0) NOT NULL,
  "DEC01" DECIMAL(31, 0),
  "DEC02" DECIMAL(25, 0),
  "DEC03" DECIMAL(17, 0),
  "CHR04" CHAR(17),
  "CHR03" CHAR(17),
  "CHR02" CHAR(17),
  "CHR" CHAR(17),
  "VCR010" CHAR(37),
  "VCR09" CHAR(37),
  "VCR08" CHAR(37),
  "VCR07" CHAR(37),
  "VCR06" CHAR(37),
  "VCR05" CHAR(37),
  "VCR04" CHAR(37),
  "VCR03" CHAR(37),
  "VCR02" CHAR(37),
  "VCR" CHAR(37)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;

Page 16
The following table was used in the 720-byte Row Length Test.

CREATE TABLE "WP"."TB720" (
  "CUSTNO" DECIMAL(10, 0) NOT NULL,
  "STATE" CHAR(2),
  "INT04" DECIMAL(4, 0) NOT NULL,
  "INT03" DECIMAL(4, 0) NOT NULL,
  "INT02" DECIMAL(4, 0) NOT NULL,
  "INT" DECIMAL(4, 0) NOT NULL,
  "DEC01" DECIMAL(31, 0),
  "DEC02" DECIMAL(25, 0),
  "DEC03" DECIMAL(17, 0),
  "CHR04" CHAR(17),
  "CHR03" CHAR(17),
  "CHR02" CHAR(17),
  "CHR" CHAR(17),
  "VCR010" CHAR(61),
  "VCR09" CHAR(61),
  "VCR08" CHAR(61),
  "VCR07" CHAR(61),
  "VCR06" CHAR(61),
  "VCR05" CHAR(61),
  "VCR04" CHAR(61),
  "VCR03" CHAR(61),
  "VCR02" CHAR(61),
  "VCR" CHAR(61)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;

Page 17
The following table was used in the 1200-byte Row Length Test.

CREATE TABLE "WP"."TB1200" (
  "CUSTNO" DECIMAL(10, 0) NOT NULL,
  "STATE" CHAR(2),
  "INT04" DECIMAL(4, 0) NOT NULL,
  "INT03" DECIMAL(4, 0) NOT NULL,
  "INT02" DECIMAL(4, 0) NOT NULL,
  "INT" DECIMAL(4, 0) NOT NULL,
  "DEC01" DECIMAL(31, 0),
  "DEC02" DECIMAL(25, 0),
  "DEC03" DECIMAL(17, 0),
  "CHR04" CHAR(17),
  "CHR03" CHAR(17),
  "CHR02" CHAR(17),
  "CHR" CHAR(17),
  "VCR010" CHAR(85),
  "VCR09" CHAR(85),
  "VCR08" CHAR(85),
  "VCR07" CHAR(85),
  "VCR06" CHAR(85),
  "VCR05" CHAR(85),
  "VCR04" CHAR(85),
  "VCR03" CHAR(85),
  "VCR02" CHAR(85),
  "VCR" CHAR(85)
)
ORGANIZE BY ROW
DATA CAPTURE CHANGES
IN "USERSPACE1"
COMPRESS NO;

Page 18
Notices
Copyright IBM Corporation 2014. All Rights Reserved.
IBM Canada
8200 Warden Avenue
Markham, ON L6G 1C7
Canada

Neither this documentation nor any part of it may be copied or reproduced in any form or by any means or translated into another language, without the prior consent of the above mentioned copyright owner.
IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary, and customers should conduct their own testing.
The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility, or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.
IBM, the IBM logo and InfoSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.