IBM InfoSphere Data Replication's 11.3.3.1 Change Data Capture (CDC) Enhancements
© 2015 IBM Corporation
© IBM Corporation 2015. All Rights Reserved.
Disclaimer: Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Overview
The following is available in the IIDR 11.3.3.1 release:
- CDC Oracle Windows Redo-log
- Recovery mechanism after Oracle failover
- Oracle Exadata ASM & Flex ASM Support
- WebHDFS
- Cloudant Apply
- Extension to DataStage Custom Flat File Formatter
- CDC i Refresh While Active support
- MS SQL Server support of online index rebuild operations
- MC/AS Upgrade support (version 11.3.3)
IBM InfoSphere Data Replication (IIDR) Coverage
[Diagram: supported sources and targets]
Sources: DB2 (i, LUW), Informix, Oracle/Exadata, MS SQL Server, Sybase, DB2 z/OS, IMS, VSAM
Targets: DB2 (z/OS, i, LUW), Informix, Oracle/Exadata, MS SQL Server, Sybase, Pure Data for Analytics (Netezza), Teradata, Information Server, Cloudant, IMS, VSAM, Hadoop/Streams, Message Queues, Files, FlexRep (JDBC targets), Customized Apply
Diagram annotations: HDFS/Hive, WebHDFS, User Exit; ESB, MQ Series, JMS; Flat file, HDFS; MySQL, EnterpriseDB; DataStage to GreenPlum,
InfoSphere Data Replication - Expansive support

SOURCES              TARGETS                          O/S              HARDWARE
DB2 (z/OS, i, LUW)   All Sources [1]                  z/OS             System z
Oracle/Exadata       Pure Data for Analytics          Red Hat / SuSE   System z
MS SQL Server        Information Server               AIX              System p
Informix             Hadoop/Streams [2]               IBM i OS         System i
Sybase               Cloudant                         Red Hat / SuSE   Intel / AMD / Power [4]
IMS                  FlexRep (MySQL, EnterpriseDB)    MS Windows       Intel / AMD
VSAM                 Teradata                         HP-UX            HP Itanium
                     MQ Series / JMS                  HP-UX            HP PA-RISC
                     WebMethods / BEA / TIBCO         Solaris          Sun Sparc
                     Greenplum [3]

[1] IMS is only a target for IMS sources. VSAM is only a target for VSAM sources.
[2] Via HDFS, WebHDFS, or custom user exit
[3] Via DataStage
[4] Power-8 with Little Endian for DB2 LUW only
New Database/Platform Support for IIDR's CDC
CDC Oracle Windows Redo
- Supports Oracle redo logs on Windows in the same way as on other platforms, including Oracle RAC and ASM
- Supports all configuration modes supported by the Linux/UNIX CDC Oracle versions, such as local, remote, and the various log shipping modes
- Supports MBCS, which the trigger version did not
Recovery mechanism after Oracle DataGuard failover
Recovery mechanism after Oracle failover to a DataGuard (DG) standby database
- In a failover to a DG standby, the re-instantiation of the database results in a new incarnation
- CDC only supports reading logs from the current incarnation of the database and will not see any unprocessed logs from the previous incarnation
- Use the new dmfailoverrecovery command to instruct CDC to scrape the required logs from the previous incarnation of the database (needed if there was latency at the time of failover)
- The command only works for the previous incarnation of the database
- dmfailoverrecovery starts mirroring with a scheduled end, so replication stops once the last log entry is read from the previous incarnation of the database
- If the command does not succeed, a refresh of the tables is required. This can occur, for instance, if the last log is corrupted
Recovery mechanism after Oracle DataGuard failover
dmfailoverrecovery command syntax
    <CDC_INSTALL_HOME>/bin> ./dmfailoverrecovery -I <instance name> [-d -r]
Options
    -d: Displays the information the command will use for running the recovery process. It does not start the recovery itself.
    -r: Starts the recovery process.
- Recovery time depends on the amount of data that needs to be processed.
- If the recovery succeeds, users can then resume normal replication.
- If the recovery process fails, users will need to perform a full refresh of all tables to ensure data consistency.
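A natural way to use the two options is to run the command with -d first, review what it reports, and only then run it with -r. The following is a minimal Python sketch of a wrapper that builds the argument list for either mode. The function name, the install path, and the instance name are hypothetical; only the executable name and the -I, -d, and -r options come from the slide.

```python
from pathlib import PurePosixPath


def build_failover_recovery_command(install_home, instance, start_recovery=False):
    """Build the argv list for dmfailoverrecovery.

    install_home and instance are placeholders supplied by the caller.
    With start_recovery=False the command only displays what it would do (-d);
    with start_recovery=True it starts the recovery (-r).
    """
    if not instance:
        raise ValueError("an instance name is required for -I")
    executable = PurePosixPath(install_home) / "bin" / "dmfailoverrecovery"
    mode_flag = "-r" if start_recovery else "-d"
    return [str(executable), "-I", instance, mode_flag]


# Dry run first, then the real recovery (the list could be passed to subprocess.run).
preview = build_failover_recovery_command("/opt/cdc", "ORA_INSTANCE")
recover = build_failover_recovery_command("/opt/cdc", "ORA_INSTANCE", start_recovery=True)
```

Keeping the display-only mode as the default means a recovery is never started by accident.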
Oracle Exadata ASM & Flex ASM Support
- CDC can now be configured to operate directly from the Exadata appliance
- CDC can be installed locally on Exadata and seamlessly read from ASM
- To read the Exadata logs remotely, you must ship the logs to a non-ASM location that is accessible to CDC
- The same user configuration experience is provided for both a traditional Oracle database ASM configuration and ASM on Exadata
Oracle Exadata ASM & Flex ASM Support
- CDC now seamlessly supports ASM automatic rebalancing
- The new ASM support requires the user to add a TNS entry to the tnsnames.ora file that points to the ASM instance
- The configuration tool has been modified so that a tnsnames.ora location can be specified
Oracle Exadata ASM & Flex ASM Support
- CDC now supports Oracle 12c Flex ASM
- Flex ASM essentially allows an Oracle RAC node to use ASM from other nodes instead of tying it to the ASM on its own node
- For CDC to support this, you must make minor changes to tnsnames.ora. For example:

    ASM =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (FAILOVER = on)
          (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac2.torolab.ibm.com)(port = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac3.torolab.ibm.com)(port = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac1.torolab.ibm.com)(port = 1521))
        )
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = +ASM)
        )
      )

- Specify the node names in the ADDRESS entries
- If one ASM instance is down, the connection automatically switches to the next one
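For clusters with many nodes, an entry of this shape can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of CDC or Oracle tooling: it renders the same structure as the example above from a list of host names. The function name, default alias, and default port are assumptions chosen to match the slide.

```python
def build_asm_tns_entry(hosts, port=1521, alias="ASM", service_name="+ASM"):
    """Render a tnsnames.ora entry with connect-time failover across ASM hosts.

    hosts is an ordered list of RAC node names; the first reachable one is used.
    """
    if not hosts:
        raise ValueError("at least one ASM host is required")
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        "    (ADDRESS_LIST =",
        "      (FAILOVER = on)",
    ]
    for host in hosts:
        lines.append(
            f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        )
    lines += [
        "    )",
        "    (CONNECT_DATA =",
        "      (SERVER = DEDICATED)",
        f"      (SERVICE_NAME = {service_name})",
        "    )",
        "  )",
    ]
    return "\n".join(lines)


entry = build_asm_tns_entry(
    ["cdcrac2.torolab.ibm.com", "cdcrac3.torolab.ibm.com", "cdcrac1.torolab.ibm.com"]
)
print(entry)
```

The order of the host list matters: with FAILOVER = on, addresses are tried in the order listed.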
WebHDFS Support Overview
- The new WebHDFS support utilizes REST APIs
- Allows much greater flexibility in where the CDC target is installed
    - The CDC install is no longer required to be part of the Hadoop cluster
    - Added benefit: changes to or upgrades of the Hadoop cluster do not impact the server where the CDC target engine is running
- Allows CDC to target any Hadoop distribution
- Removes the restriction on which underlying file system is being used
    - As such, CDC now supports replicating to Hadoop on GPFS
- Allows CDC to interact with a Hadoop install that is configured to use Kerberos security
    - WebHDFS using Kerberos authentication has approximately the same throughput as the CDC local HDFS option
    - WebHDFS with simple authentication has lower performance
WebHDFS Support Configuration
- A new WebHDFS option is available when mapping tables
WebHDFS Support Configuration
- Specify the name of the directory where the HDFS files will reside
- Note: the server name is specified in the Hadoop Properties
Configuring Hadoop Properties for the Subscription
- WebHDFS connection information is specified in the Hadoop Properties
- Supports Simple and Kerberos authentication
- Supports both http and https
- Note that the fully qualified connection string must be supplied, including the /webhdfs/v1/
Additional examples by Hadoop service for WebHDFS with default configuration:
- Through HttpFS proxy (BigInsights 3.0):
    http://192.xxx.xxx.xxx:14000/webhdfs/v1/
    https://192.xxx.xxx.xxx:14443/webhdfs/v1/
- Through Knox gateway (BigInsights 4.0):
    https://192.xxx.xxx.xxx:8443/gateway/default/webhdfs/v1/
- Directly to the HDFS namenode (rarely permitted in production):
    http://192.xxx.xxx.xxx:50070/webhdfs/v1/
    https://192.xxx.xxx.xxx:50470/webhdfs/v1/
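Before entering a connection string in the Hadoop Properties, it can help to confirm that it has the required /webhdfs/v1/ suffix and to see what a request against it looks like. The sketch below builds a standard WebHDFS REST URL (the op and user.name query parameters are part of the Apache Hadoop WebHDFS API, not something specific to CDC). The helper name, host address, directory, and user are hypothetical, and the code only constructs the URL without contacting a cluster.

```python
from urllib.parse import quote, urlencode

WEBHDFS_SUFFIX = "/webhdfs/v1/"


def webhdfs_url(base, hdfs_path, op, params=None):
    """Build a WebHDFS REST URL from a fully qualified connection string.

    base must end with /webhdfs/v1/ (direct, HttpFS, or Knox gateway form).
    """
    if not base.startswith(("http://", "https://")):
        raise ValueError("connection string must use http or https")
    if not base.endswith(WEBHDFS_SUFFIX):
        raise ValueError("connection string must end with " + WEBHDFS_SUFFIX)
    query = urlencode({"op": op, **(params or {})})
    return f"{base}{quote(hdfs_path.lstrip('/'))}?{query}"


# List the target directory through an HttpFS proxy using simple authentication.
url = webhdfs_url(
    "http://192.0.2.10:14000/webhdfs/v1/",
    "/user/cdc/out",
    "LISTSTATUS",
    {"user.name": "cdcuser"},
)
print(url)
```

Opening such a LISTSTATUS URL with a browser or HTTP client is a quick connectivity check for simple authentication; Kerberos-secured clusters need a client that can negotiate SPNEGO.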
Naming Convention of Files
CDC uses the following convention to name the HDFS flat files that are produced during replication:
    (_)[Table].D[Date].[Time][# Records]
    _ = Currently open HDFS file; removed when the file is completed
    [Date] = Julian date (year, day number within year)
    [Time] = hh24mmss when the flat file was created (in GMT)
    [# Records] = Optionally, the number of records can be added
For those familiar with standard IIDR flat file production, IIDR HDFS files behave differently in some respects:
- The file prefix is different: HDFS uses _ instead of @ for the working file
- Fields are not quoted in files produced in HDFS
- HDFS doesn't create a [Table].STOPPED file when the subscription is stopped
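Downstream jobs that pick up these files usually need to skip files that are still open and to recover the creation time from the name. The sketch below illustrates the convention under stated assumptions: the Julian date is taken to be a four-digit year followed by a three-digit day of year, the time is six digits, and the optional record count is left out because the slide does not show how it is delimited. The function names and the table name are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: optional "_" prefix, table, ".D" + YYYYDDD, "." + HHMMSS.
_NAME_RE = re.compile(r"^(?P<open>_?)(?P<table>.+)\.D(?P<date>\d{7})\.(?P<time>\d{6})$")


def hdfs_file_name(table, created_gmt, still_open=False):
    """Compose a file name for a flat file created at created_gmt (GMT)."""
    prefix = "_" if still_open else ""
    return f"{prefix}{table}.D{created_gmt:%Y%j}.{created_gmt:%H%M%S}"


def parse_hdfs_file_name(name):
    """Return (table, created_gmt, still_open) for a name in the assumed layout."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not in the expected layout: {name!r}")
    created = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y%j %H%M%S"
    )
    return match["table"], created, match["open"] == "_"


created = datetime(2015, 7, 15, 22, 9, 46)
working = hdfs_file_name("CUSTOMER", created, still_open=True)
finished = hdfs_file_name("CUSTOMER", created)
```

A consumer would typically ignore any name for which parse_hdfs_file_name reports still_open as True, since the _ prefix marks a file CDC is still writing.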
HDFS Record Format
Standard columns containing information about the change:
- DM_TIMESTAMP: The timestamp, obtained from the log, of when the operation occurred (contains the value from the &TIMSTAMP journal control field)
- DM_TXID: Transaction identifier (contains the value from the &CCID journal control field)
- DM_OPERATION_TYPE: A single character indicating the type of operation:
    "I" for an insert
    "D" for a delete
    Single Record Format has one type that represents the update image:
        "U" for an update
    Multiple Record Format has two separate types that represent the before and after images:
        "B" for the row containing the before image of an update
        "A" for the row containing the after image of an update
- DM_USER: The user that performed the operation (contains the value from the &USER journal control field)
HDFS Record Format
Single record format
- An update operation is sent as a single row
- The before and after images are contained in the same record
- For example, inserting one row followed by deleting one row:
    2015-07-15 22:09:46,6674163,I,GSAKUTH,\N,\N,\N,4381 Kelly Ave,San Jose,CA
    2015-07-15 22:09:47,6674174,D,GSAKUTH,4381 Kelly Ave,San Jose,CA,\N,\N,\N
Multiple record format
- An update operation is sent as two rows, the first row being the before image and the second row containing the after image
Note that the following characters are escaped:
- Comma: escaped with \
- Escape character: escaped with \\
- Null: represented as \N (as illustrated in the example above)
Binary data is encoded in base64
A sample custom formatter (SampleDataFormatForWebHdfs.java) is provided with the product if customization of the output format is required
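A consumer of these files has to undo the escaping before it can interpret a row. The following is a minimal Python sketch of a reader for the single record format. It assumes well-formed input, four leading metadata columns (DM_TIMESTAMP, DM_TXID, DM_OPERATION_TYPE, DM_USER), and that the remaining fields are the before image followed by an equally long after image, which is inferred from the insert and delete examples above rather than stated by the source. The function names are hypothetical.

```python
def split_record(line):
    """Split one row on unescaped commas; a field written as \\N becomes None."""
    fields, buf, is_null, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt == "N" and not buf:
                is_null = True          # \N at the start of a field marks a null
            else:
                buf.append(nxt)         # \, and \\ yield the literal character
            i += 2
        elif ch == ",":
            fields.append(None if is_null else "".join(buf))
            buf, is_null = [], False
            i += 1
        else:
            buf.append(ch)
            i += 1
    fields.append(None if is_null else "".join(buf))
    return fields


def parse_single_record(line):
    """Return metadata plus before/after images for a single-record-format row."""
    fields = split_record(line)
    timestamp, txid, op, user = fields[:4]
    images = fields[4:]
    half = len(images) // 2             # assumption: before image, then after image
    return {
        "timestamp": timestamp, "txid": txid, "operation": op, "user": user,
        "before": images[:half], "after": images[half:],
    }


row = r"2015-07-15 22:09:46,6674163,I,GSAKUTH,\N,\N,\N,4381 Kelly Ave,San Jose,CA"
parsed = parse_single_record(row)
```

For the insert row above, the before image is all nulls and the after image holds the new address; a delete row is the mirror image.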
Cloudant Target Support
- New CDC target engine that applies directly to Cloudant
- Receives changed data based on relational tables and transforms the data to equivalent JSON documents
- Utilizes the existing CDC DataStage target engine infrastructure
- A new Cloudant delivery method is available
Cloudant Target Support
- The Cloudant URL and login credentials are provided via the subscription properties dialog
- The connection to Cloudant is secured using HTTPS
Cloudant Target Support
- Indicates which Cloudant database to apply to customer
- Select Parent Table for compound documents
Cloudant Target Support
Full CHCCLP scripting support is available. For example:

add table mapping
    [name]              Specifies the name of the subscription. If a name is not provided, the subscription that is currently identified as the context will be used. To view the current context, use the "show context" command.
    [sourcedatabase]    Database for the source table.
    sourceschema        Schema for the source table.
    sourcetable         Name of the source table.
    type                Table mapping type. VALID VALUES: cloudant
    cloudantdatabase    Name of the Cloudant database.
    primarykeycolumns   Set of columns comprising the primary key of the source table.
    [parentschema]      Schema of the parent table in the source database.
    [parenttable]       Name of the parent table in the source database.

    add table mapping sourceschema cdcschema sourcetable Invoices type cloudant cloudantdatabase inv primarykeycolumns "inv_number"
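When many tables need the same kind of mapping, the command text can be generated from a table list. The sketch below is a hypothetical helper that assembles the command string from the parameters listed above, in the order they are listed. How multiple key columns are separated inside the quotes is not shown in the source, so the comma used here is an assumption.

```python
def build_add_table_mapping(sourceschema, sourcetable, cloudantdatabase,
                            primarykeycolumns, name=None, sourcedatabase=None,
                            parentschema=None, parenttable=None):
    """Assemble an 'add table mapping' command for a Cloudant table mapping."""
    if not primarykeycolumns:
        raise ValueError("at least one primary key column is required")
    parts = ["add table mapping"]
    # Optional leading parameters are emitted only when supplied.
    for keyword, value in (("name", name), ("sourcedatabase", sourcedatabase)):
        if value is not None:
            parts.append(f"{keyword} {value}")
    parts += [
        f"sourceschema {sourceschema}",
        f"sourcetable {sourcetable}",
        "type cloudant",
        f"cloudantdatabase {cloudantdatabase}",
        # Assumption: several key columns are comma-separated inside the quotes.
        'primarykeycolumns "{}"'.format(",".join(primarykeycolumns)),
    ]
    for keyword, value in (("parentschema", parentschema), ("parenttable", parenttable)):
        if value is not None:
            parts.append(f"{keyword} {value}")
    return " ".join(parts)


command = build_add_table_mapping("cdcschema", "Invoices", "inv", ["inv_number"])
print(command)
```

With the arguments shown, the helper reproduces the example command from the slide.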
Cloudant Apply Detail
- A changed row (insert/update/delete) in the source table is replicated to a JSON document in Cloudant
- The JSON document has a document ID ("_id") based on the key of the source table (internal relationship)
    INSERT into CDCSCHEMA.TABLE_1 values (25, 94401, ...)
- The _id is used to resolve to the document in Cloudant
Cloudant Apply Detail
- The apply behavior is to replicate source data regardless of whether the document already exists in Cloudant
- The apply mode is conceptually similar to adaptive apply
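The effect of this apply mode can be illustrated with a small in-memory model. The sketch below is not the CDC implementation: it uses a plain dictionary in place of a Cloudant database, hypothetical column names (ID, ZIP), and an assumed rule that the _id is the key values joined with a colon. It shows inserts and updates both ending as "create or replace", and deletes of a missing document being ignored.

```python
def row_to_document(row, key_columns):
    """Turn a source row into a JSON-style document keyed by the table key."""
    # Assumption: the _id is the key column values joined with ":".
    doc_id = ":".join(str(row[column]) for column in key_columns)
    return {"_id": doc_id, **row}


def apply_change(store, operation, row, key_columns):
    """Apply one change to a dict standing in for a Cloudant database."""
    document = row_to_document(row, key_columns)
    if operation == "D":
        store.pop(document["_id"], None)      # deleting a missing document is not an error
    else:
        store[document["_id"]] = document     # insert or update: create or replace


store = {}
apply_change(store, "U", {"ID": 25, "ZIP": 94401}, ["ID"])   # update of a missing doc creates it
apply_change(store, "I", {"ID": 25, "ZIP": 94402}, ["ID"])   # insert of an existing doc replaces it
snapshot = dict(store)
apply_change(store, "D", {"ID": 25, "ZIP": 94402}, ["ID"])
apply_change(store, "D", {"ID": 25, "ZIP": 94402}, ["ID"])   # second delete is a no-op
```

This is why the mode resembles adaptive apply: the target converges on the source state even if it was out of sync when the change arrived.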
Cloudant Apply: Compound JSON Documents
- Designate parent-child relationships
- Compound documents are created on the fly
[Diagram: Card Holder example. In the source database, the parent table PROFILE has key CARD_NUM and the child table TRANSACTION has key CARD_NUM - TRANS_ID. Multiple transactions become repeating elements in the parent document, which is stored as a Cloudant doc in the ODS.]
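The card holder example can be sketched as a simple nesting step. The code below is illustrative only: the table and key names (PROFILE, TRANSACTION, CARD_NUM, TRANS_ID) come from the diagram, while the other columns, the name of the array field, and the ordering by TRANS_ID are assumptions.

```python
def build_compound_document(profile, transactions):
    """Embed a card holder's transactions as repeating elements of the profile."""
    card_num = profile["CARD_NUM"]
    children = [t for t in transactions if t["CARD_NUM"] == card_num]
    children.sort(key=lambda t: t["TRANS_ID"])
    document = {"_id": str(card_num), **profile}
    # Assumption: child rows are stored under a field named after the child table.
    document["TRANSACTION"] = children
    return document


profile = {"CARD_NUM": 1001, "NAME": "A. Holder"}
transactions = [
    {"CARD_NUM": 1001, "TRANS_ID": 2, "AMOUNT": 12.50},
    {"CARD_NUM": 2002, "TRANS_ID": 1, "AMOUNT": 99.00},
    {"CARD_NUM": 1001, "TRANS_ID": 1, "AMOUNT": 40.00},
]
compound = build_compound_document(profile, transactions)
```

Only the rows whose CARD_NUM matches the parent are embedded, so each parent document carries its own transaction history.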
Extended Custom Flat File Formatter
- Allows users to write custom user exits that support customizing temporary flat file names and hardened flat file names
- Supports customization for DataStage flat files generated on both LUW and Hadoop file systems (HDFS)
- Allows users to suppress update before images from being logged in the flat file
- When before images are suppressed, the contents produced in single record mode and multiple record mode will be similar
Extended Custom Flat File Formatter
Four new methods for the extended custom formatter:
- getcontextfordatastageextendedfilecharacteristicsif
    Called once for each table mapping, just before the first operation is processed by the apply for that table. The returned context object is provided to all subsequent calls to the following methods: gethardenedfilename, gettempfilename, assumenokeychangesoccur
- gethardenedfilename
    Used to provide the fully qualified name to use for the hardened file. This method is called just before the file is to be hardened, after all the data has been written to it.
- gettempfilename
    Used to provide the fully qualified name to use for the temporary file. This method is called just before the first data is to be written to this temporary file.
- assumenokeychangesoccur
    The file is written as if the Multiple Record option were selected with the Update Before Image rows not included, regardless of whether "Single Record" or "Multiple Records" is selected.
    Note: If any update operations change the key columns being used by the application consuming these files, that application will not be able to maintain an accurate copy of the data.
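The call order described above can be modeled in a few lines. The sketch below is a Python stand-in, not the product's Java user exit interface: the class, method signatures, file name patterns, and the simulated apply loop are all hypothetical, and only the sequence of calls (context once per table mapping, temporary name before the first write, hardened name just before hardening, before images dropped when key changes are assumed not to occur) follows the slide.

```python
class SequencedFileNames:
    """Hypothetical stand-in for the four extended formatter methods."""

    def get_context(self, table):
        # Called once per table mapping, before its first operation is applied.
        return {"table": table, "sequence": 0}

    def get_temp_file_name(self, context, directory):
        # Called just before the first data is written to a temporary file.
        context["sequence"] += 1
        return f"{directory}/_{context['table']}.{context['sequence']:04d}.tmp"

    def get_hardened_file_name(self, context, directory):
        # Called after all data has been written, just before hardening.
        return f"{directory}/{context['table']}.{context['sequence']:04d}.csv"

    def assume_no_key_changes_occur(self, context):
        # Returning True drops update before-image ("B") rows from the file.
        return True


def simulate_apply(formatter, table, directory, batches):
    """Walk the call sequence for each batch of (operation, payload) rows."""
    context = formatter.get_context(table)
    drop_before = formatter.assume_no_key_changes_occur(context)
    results = []
    for batch in batches:
        temp_name = formatter.get_temp_file_name(context, directory)
        rows = [row for row in batch if not (drop_before and row[0] == "B")]
        hardened_name = formatter.get_hardened_file_name(context, directory)
        results.append((temp_name, hardened_name, rows))
    return results


batches = [[("B", 1), ("A", 1)], [("I", 2)]]
results = simulate_apply(SequencedFileNames(), "T1", "/out", batches)
```

Keeping the sequence counter in the context object is what lets the temporary and hardened names of the same file stay in step.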
CDC i Refresh While Active support
- CDC i now has the option to acquire a lock for a very short period of time before refreshing the table, to ensure that a mirroring point is established in the log (journal)
- As such, you can now do a refresh while active (*RWA) and respect the commit boundaries on the target apply
- For example, on LUW target engines the system parameter mirror_commit_on_transaction_boundary no longer needs to be set to false
MS SQL Server support of online index rebuild operations
- CDC now transparently handles index rebuilds and reorgs on tables that CDC is replicating
- Thus, if no structural table change was performed before the index rebuild or reorg, mirroring simply continues normally
- Previously, users were required to follow the manual DDL procedures, because replication would have ended when the index rebuild or reorg was encountered in the log
MC/AS Upgrade support
- The latest IIDR 11.3.3 Management Console build (5101) allows the user to perform an upgrade
    - You upgrade Management Console by installing a later version of the software over an existing 11.3.3 installation
- Similarly, the latest IIDR 11.3.3 Access Server build allows the user to perform an upgrade
    - You upgrade Access Server by installing a later version of the software over an existing 11.3.3 installation
Additional Resources
- IBM developerWorks CDC community: https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityuuid=a9b542e4-7c66-4cf3-8f7b-8a37a4fdef0c
- IBM CDC Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/sstrgz_11.3.3/
- CDC Redbook: http://www.redbooks.ibm.com/redbooks.nsf/redbookabstracts/sg247941.html?open
- IBM CDC Support: http://www-947.ibm.com/support/entry/portal/product/information_management/infosphere_change_data_capture?productcontext=-873715215
- Passport Advantage: https://www-112.ibm.com/software/howtobuy/softwareandservices/passportadvantage
Legal Disclaimer
© IBM Corporation 2015. All Rights Reserved.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld and Lotusphere are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.