Oracle Data Integration and OWB: New for 11gR2

C. Antonio Romero, Oracle Corporation, Redwood Shores, US

Keywords: data integration, ETL, real-time, data warehousing, Oracle Warehouse Builder, Oracle Data Integrator, OWB, ODI, knowledge modules, code templates, CDC, Streams, OBI-EE, Advanced Queues

Introduction

This paper addresses the latest data integration offerings from Oracle, in particular as they pertain to Oracle database customers. New features in Oracle Database 11gR2, and especially in Oracle Warehouse Builder (OWB) 11.2, delivered with the database, make new techniques possible for customers who take up OWB in ODI-EE. The paper reviews how OWB 11.2 enhancements fit with evolving enterprise data integration requirements, and how Oracle Data Integrator (ODI) technology has been incorporated into OWB. It then describes specific data integration scenarios where the new features deliver technical and business benefits to customers who upgrade to OWB 11.2.

Enterprise-Grade Data Integration Requirements

Enterprise data integration requirements present a moving target: use cases that were advanced in the past are now common.
A complete review of current trends would be beyond the scope of this paper, but our customers face the following pressures:
o Ever-increasing data volumes and smaller batch windows
o An ever-larger number, diversity and complexity of systems to be integrated
o More demand for near-real-time and event-driven data
o The need to turn integrated data into actionable business intelligence
o Inevitable cost pressures

Viable enterprise data integration solutions must therefore have the following capabilities:
o Faster, simpler integration with heterogeneous databases, legacy data, and flat file data sources, including the flexibility to accommodate new sources
o Multiple data delivery modes: batch ETL, trickle-feed data acquisition and delivery, and CDC
o Highly productive development environments that let ETL developers focus on logical design rather than the details of execution, data movement and storage
o Rich metadata for objects and transformations, shared across tools in ways that add business value

Warehouse Builder, Oracle Data Integrator and ODI-EE

Oracle Data Integrator (ODI) and Oracle Warehouse Builder (OWB) are intended to address these underlying business requirements, in somewhat different use cases. Combining them in a single package, Oracle Data Integrator Enterprise Edition (ODI-EE), enabled Oracle to deliver this business value to a wide range of Oracle customers with differing technical requirements. With Warehouse Builder 11.2, many technical and functional benefits previously specific to ODI are now readily available to customers for whom Warehouse Builder is an overall better fit, and who previously might have considered using the two tools together. Those with an existing investment in OWB designs or skills, or a preference for the end-to-end data warehouse design and ETL capabilities of OWB, can now choose OWB without compromising on new technical requirements. OWB's full data warehousing and data integration functionality is included in the ODI-EE offering and is still linked to the Oracle database. OWB will continue to be a viable implementation choice for years to come.

Understanding OWB 11.2's Major Enhancements

OWB has extensive new features and improvements.
The ones that generate the most buzz are those that align it with ODI's capabilities:
o Template-based code generation, known as knowledge module (KM) technology
o Native, heterogeneous database connectivity
o E-LT in non-Oracle databases
o SOA integration, including exposing ETL tasks as Web services and invoking Web services from within process flows

Others, while not paralleling ODI functionality, are just as significant to many customers:
o Oracle Business Intelligence Enterprise Edition (OBI-EE) metadata integration
o Support for trickle-feed mappings using Advanced Queues

Finally, there are changes to the OWB product and surrounding experience meant to get more information about OWB capabilities into customers' hands.

Code Template Mappings: OWB Meets ODI

Oracle Data Integrator's distinctive knowledge module functionality, a framework that uses code templates to describe the steps in various data integration processes, provides the basis for its flexible support for diverse integration patterns, data movement technologies, and heterogeneous sources and targets.
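To make the idea of template-based code generation concrete, here is a minimal sketch in Python. The template syntax, metadata shapes and function names below are inventions for illustration only; real ODI KMs and OWB CTs use a substitution API against repository metadata, not Python string templates.

```python
# Toy illustration of template-based code generation, in the spirit of
# ODI knowledge modules / OWB code templates. A "load" step is described
# once as a template, then expanded against mapping metadata.
from string import Template

# A minimal load template: pull rows from a source into a staging table.
# (Hypothetical template text, not an actual OWB/ODI code template.)
LOAD_TEMPLATE = Template(
    "INSERT INTO $staging_table ($columns)\n"
    "SELECT $columns FROM $source_table"
)

def generate_load_step(source_table, staging_table, columns):
    """Expand the template against the mapping's metadata."""
    return LOAD_TEMPLATE.substitute(
        source_table=source_table,
        staging_table=staging_table,
        columns=", ".join(columns),
    )

sql = generate_load_step("CUSTOMERS", "STG_CUSTOMERS", ["CUST_ID", "CUST_NAME"])
print(sql)
```

The point of the pattern is that the same logical metadata (tables, columns) can be expanded through different templates to produce very different physical code, which is exactly the flexibility the KM framework provides.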
Warehouse Builder 11.2 adds code template-based code generation to provide the same range of capabilities ODI KMs provide, and can generally use KMs from ODI, letting Oracle database customers integrate with heterogeneous sources. However, there are differences in terminology, in the internal implementation, and in the tools available for constructing, managing and applying code templates and mappings in OWB; these differences make the feature more consistent with OWB's architecture and mapping paradigm, and more familiar to OWB users. This discussion should clarify how the two technologies fit together.

Comparing ODI KMs and OWB Code Templates

Warehouse Builder refers to code templates (CTs) rather than knowledge modules. (One major reason for this: the word module is already used in OWB, and confusing terminology such as "knowledge module modules" could have resulted.) Existing ODI knowledge modules can generally be imported into Warehouse Builder without change; for each knowledge module imported, an analogous OWB code template is created. These code templates come in most of the same varieties as ODI KMs: Load CTs (to get data from sources and move it to where it can be integrated), Integration CTs (which do the work of integrating staged intermediate results into a target), Change Data Capture CTs (like ODI's Journalizing KMs, used in near-real-time data integration), Control CTs (like ODI's Check KMs, used to filter bad rows out before loading), and so on.

OWB Code Template Mappings: Extending the Paradigm

OWB code template mappings (previously called Knowledge Module mappings) follow the familiar paradigm of classic OWB mappings, with some extra details added. To understand how this works, consider what a classic mapping really is: a logical-level description of data movement and transformation that does not fully describe the details of the data movement and integration process.
(This is especially true when you get to the cube and dimension loading operators, which embed a lot of complex ETL logic.) OWB generates code from classic mappings to express the logic described, given the version of the target database and many other properties of the mapping. Internally, a template mechanism of sorts is used, but it is never exposed to the user, and there is no way to change the templates. Code template mappings make explicit much that was implicit in classic mappings. You design your mapping logic with the full range of operators that OWB has long supported. Then (and this is the new design step) you specify the data movement and integration mechanisms to use at the physical level. Code template mappings, then, are less of a departure for the OWB ETL developer than one might expect (or fear). The principal differences are:
o A new, open code templating mechanism, based on the one in ODI, can be used for some mappings (classic mappings continue to be available)
o The user can explicitly specify which parts of maps are executed on which systems in their landscape, and which templates are used to generate code

This separation of the physical from the logical is in keeping with the ideal of declarative design, which has been embodied by both OWB and ODI, if in different ways. The details, and the power this provides, become clearer with an example.

Logical and Execution Views of a CT Mapping

The mapping editor for code template mappings presents two views of the mapping:
o The Logical View is the same as the familiar editor for classic mappings, and describes the logic of how data is to be integrated.
o The Execution View is where you identify where code executes, and select the code templates used to generate code for the different execution steps.

The logical view of a code template mapping is essentially indistinguishable from a classic mapping. For example, this mapping loads a Promotions dimension for a Sales cube using OWB's dimension operator:

Figure 1
Note the Execution View tab at the bottom. Click it to see the mapping's execution view:

Figure 2

The boxes EXUNIT_SRC and EXUNIT_TGT group the mapping operators into execution units, which identify a set of steps for which code will be generated together. Each execution unit is associated with one or more code templates (used as the basis for code generation) and a location (where execution takes place). In this example, the EXUNIT_SRC execution unit contains two source tables and a join operation. Because the JOINER is in this execution unit, the join will be performed at the location set for the execution unit. The output of the join is staged in temporary tables on the target database associated with the EXUNIT_TGT execution unit. The dimension loading executes there, using the staging tables as sources behind the scenes. Each execution unit must then be assigned a code template (the equivalent of a KM) that specifies how code for those integration steps is generated. There are naming conventions for the code templates shipped with OWB; here the selected code template for EXUNIT_SRC is LCT_SQL_TO_SQL ("Load Code Template, generic SQL to SQL"). EXUNIT_TGT has its own code template specified.
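The structure just described can be sketched as a small data model: each execution unit binds a group of operators to a location and a code template, and the mapping's physical plan is just the set of these bindings. The execution unit and location names follow the example above; the classes and `plan` function are illustrative, not an OWB API, and swapping a unit's `code_template` value is the toy analogue of picking a different CT in the Execution View.

```python
# Toy model of a CT mapping's execution view: execution units group
# operators, and each unit carries a location and a code template.
from dataclasses import dataclass

@dataclass
class ExecutionUnit:
    name: str
    operators: list      # operator names grouped into this unit
    location: str        # where the generated code executes
    code_template: str   # which template generates the code

def plan(units):
    """Summarize where each group of steps runs and which template generates it."""
    return [f"{u.name}: run {'+'.join(u.operators)} at {u.location} "
            f"via {u.code_template}" for u in units]

mapping = [
    ExecutionUnit("EXUNIT_SRC", ["SRC_TAB1", "SRC_TAB2", "JOINER"],
                  "SQLSERVER_LOC", "LCT_SQL_TO_SQL"),
    ExecutionUnit("EXUNIT_TGT", ["PROMOTIONS_DIM"],
                  "ORACLE_DW_LOC", "ORACLE_TARGET_CT"),
]

for line in plan(mapping):
    print(line)
```

Note that the logical operators appear only as members of units; changing the grouping, locations or templates leaves the logical design untouched, which is the separation the Execution View provides.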
One Logical Design, Many Implementations and Topologies

For a single logical design, by selecting different execution units, code templates and locations to which code is deployed, you can have OWB generate drastically different code to accommodate different scenarios. Simply changing the code template on the source execution unit accommodates the following cases:
o Source tables on the same database, which could be any JDBC-compliant source, accessed through JDBC (the example shown)
o Source from SQL Server: generate a script for bulk data unload, followed by a bulk data load on the Oracle target
o Source from an Oracle database: use Data Pump to unload data on the source and load it into the Oracle target

The following changes would require grouping the operators into different execution units:
o Move the JOIN operator to the target execution unit, and the source table contents will be moved to the Oracle target separately and joined there. Join processing can thus be moved among different databases as needed, based on workload, data volumes and so on.
o Split the existing source execution unit in two, and you can accommodate source tables hosted on different databases by picking different code templates and locations. The join could be performed on any of these databases.
o Split the source execution unit in two, and one or both sources could even be flat files or XML files, again based on the selected template.

Again, no logical-level changes are required: your logical mapping design is identical to the one you created as a classic OWB map. Finally, with OWB's multi-configuration feature, for a single code template mapping you can specify a completely different execution view for each configuration: group operators into different execution units, and select different CTs.
Thus, when moving from development to test to production, you can prove out your logical mapping design purely against Oracle, with a completely different execution strategy in test and production.

Complex Transformations in CT Maps: The Oracle Target CT

So far this mostly sounds like ODI's data integration capabilities, except for that dimension loading operator. So if the code generation is based on ODI templates, how does a code template map get to use operators not supported by ODI? A backdoor of sorts has been added to the code template system. Select the target execution unit in our example mapping, and note the following:
Figure 3

Note the code template selected for the target execution unit: the built-in default Oracle Target CT. At code generation time, OWB recognizes this template and, for the operators in that execution unit, uses classic OWB code generation. This allows the creation of what we call hybrid maps. You use ODI-like integration mechanisms for the roles to which they are best suited: flexible connectivity and data movement, and performing filtering, joining and other straightforward transformations wherever they work best in your topology. Then you land the results of that initial processing in Oracle for final integration, using the full range of OWB operators for heavy-duty data warehouse transformation (dimension and cube loading operators, the match-merge operator, pivots, and so on) within the execution unit with the Oracle Target CT. Hybrid maps, and the explicit split between logical and execution-level views of your mappings, are the biggest conceptual shift in OWB 11.2. (It takes a little while to get your head around it, but once you do, it's pretty cool.)

Cut-and-Paste Migration of Existing Mappings to the CT Framework

Converting existing Oracle-to-Oracle ETL mappings into code template mappings is quite simple from a UI perspective:
o Create a new code template mapping module if you don't have one already.
o Open the module containing your classic mappings, select one or more mappings to copy, and choose Copy.
o Paste them into the desired code template mapping module.
o Open each mapping, go to the execution view, and make sure the default execution units are set satisfactorily. If you click the Default Execution Units icon, OWB will create execution units such that objects in different locations are placed in separate execution units, with reasonable default code templates attached to each. Any operators that only work in OWB are placed in an execution unit associated with the Oracle Target CT.
o Set any other required parameters for the selected CTs.
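The default-execution-units step above amounts to partitioning the mapping's operators by location. The function below is a guess at that grouping behavior for illustration; it is not OWB's actual algorithm, and the operator and location names are carried over from the earlier example.

```python
# Sketch of the "default execution units" idea: operators bound to
# different locations end up in different execution units.
from itertools import groupby

def default_execution_units(operators):
    """operators: list of (operator_name, location) pairs.
    Returns a dict of location -> operator names, one unit per location."""
    ordered = sorted(operators, key=lambda op: op[1])  # group by location
    return {loc: [name for name, _ in ops]
            for loc, ops in groupby(ordered, key=lambda op: op[1])}

ops = [("SRC_TAB1", "SQLSERVER_LOC"), ("SRC_TAB2", "SQLSERVER_LOC"),
       ("JOINER", "SQLSERVER_LOC"), ("PROMOTIONS_DIM", "ORACLE_DW_LOC")]
units = default_execution_units(ops)
print(units)
```

Starting from a partition like this, the developer then only has to adjust the grouping (for example, moving the JOINER to the target unit) and swap code templates where the defaults are not what is wanted.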
This is a reasonable first pass that should produce a working mapping. Users can initially migrate a small number of critical mappings to code templates, where the most value is added, or just for experimentation, and then later move more mappings in a more systematic way (or ignore those where there is little to be gained by switching).

OWB Code Templates and Maintainability

The OWB UI for building code templates is based on the familiar OWB Expert framework. OWB code templates can be nested and combined, much as nested experts can be constructed. This makes it easier to build and maintain code templates implementing complex behaviors. For example, one powerful data integration pattern available in both ODI and OWB 11.2 is a Load KM or Load CT that uses a native unloader on a source platform to extract data, moves the extracted file to an Oracle host by FTP, and loads the result into staging tables on the destination host. Most of the steps in this process are similar regardless of the source platform; the chief variations are the bulk unload command for the source and the loading command on the staging host. In OWB 11.2 one can build a framework of basic Load/Integration CTs that implement the common steps, then invoke those from more complex, platform-specific CTs. This improves maintainability and development efficiency: improvements to the code templates in the framework are leveraged by all mappings using any of the Load CTs.

OBI-EE Metadata Integration

OWB now supports integration with Oracle Business Intelligence Suite Enterprise Edition (OBI-EE). This integration includes:
o Derivation of ready-to-use physical, business model and presentation layer metadata for OBI-EE from a data warehouse design
o Visualization and maintenance of the derived objects from within OWB
o Deployment of the derived objects in the form of an RPD file that can be loaded into OBI-EE
o Inclusion of the derived objects in OWB data lineage and impact analysis, such that the data lineage of objects in OBI-EE reports can be traced down to the individual column level

Warehouse Builder can derive definitions for logical tables, catalog folders and dimension drill paths from your data warehouse design metadata, or you can create those definitions yourself. For OBI-EE, you can derive metadata from both Oracle and non-Oracle database objects. A wizard walks you through the derivation process. At deployment time, these definitions are published in a UDML file, which can be saved to the file system or pushed out to a remote location by FTP or HTTP/HTTPS. On the OBI-EE end, the user must still convert the UDML file and import the derived objects into OBI-EE or merge them with an existing design.
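The derivation step can be pictured as a mechanical walk over the warehouse design. The sketch below is a toy analogue only: the input and output shapes are invented for illustration, and real derivation runs through the OWB wizard and is published as UDML, not as Python dictionaries.

```python
# Toy sketch of metadata derivation: from a star-schema description,
# derive simple business-model entries of the kind BI layers need.
def derive_business_model(star):
    """star: dict with 'facts' and 'dimensions' lists of (name, level_columns)."""
    model = {"logical_tables": [], "drill_paths": []}
    for name, levels in star["dimensions"]:
        model["logical_tables"].append({"name": name, "columns": levels})
        # One drill path per dimension, from the topmost level downward.
        model["drill_paths"].append([f"{name}.{lvl}" for lvl in levels])
    for name, measures in star["facts"]:
        model["logical_tables"].append({"name": name, "measures": measures})
    return model

star = {"facts": [("SALES", ["AMOUNT", "QUANTITY"])],
        "dimensions": [("PROMOTIONS", ["CATEGORY", "SUBCATEGORY", "PROMO"])]}
model = derive_business_model(star)
print(model["drill_paths"])
```

The value of the real feature is that this walk is driven by the same warehouse design metadata OWB already holds, so the derived BI-layer objects stay connected to lineage and impact analysis.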
This feature can shorten time-to-value if you are using OBI-EE and Warehouse Builder together, and we encourage OBI-EE customers to experiment with it.

Trickle-Feed Mappings with Advanced Queues

OWB now supports a continuous trickle-feed loading approach by introducing the concept of trickle-feed mappings. (In other words, Advanced Queues are back!) This is a feature that has shown up in past betas but has never before been included in a shipping version of OWB. Note that trickle-feed mappings are OWB classic mappings and do not depend on ODI-related functionality. A trickle-feed mapping is a mapping with one operator designated as the trickle-feed driver, representing the source from which data trickles in. This is based on the Oracle Streams feature in the Oracle database server. The only operator type that can serve as the trickle-feed driver is the Oracle Advanced Queue (AQ) operator. The rest of the mapping represents the transformation and loading to be performed on each message or event that trickles in from the AQ source. Such a message could originate from an application publishing the message, or from Streams CDC. OWB can trigger execution of a trickle-feed mapping on the arrival of a message in the driver, thereby providing continuous processing. The Streams framework in the Oracle database server provides the infrastructure to implement trickle-feed mappings. It supports the concept of an Apply daemon that constantly listens for the arrival of messages in a Streams Queue (a secure, transactional Advanced Queue). On arrival of a message, the Apply process passes control to a user-defined apply handler procedure, with the message as input. In OWB trickle-feed mappings, the code generated from the OWB mapping is used as the apply handler.
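The apply-handler pattern just described can be sketched in miniature: a daemon blocks on a queue and hands each arriving message to a handler, which plays the role of the code generated from the mapping. This uses a plain in-process Python queue as a stand-in for a Streams queue; the real mechanism is the database's Apply process invoking a PL/SQL handler.

```python
# Sketch of the apply-handler pattern behind trickle-feed mappings:
# a listener blocks on a queue and invokes a handler once per message.
import queue
import threading

def apply_daemon(q, handler, stop):
    """Block on the queue; invoke the handler for each arriving message."""
    while not stop.is_set():
        try:
            msg = q.get(timeout=0.1)
        except queue.Empty:
            continue
        handler(msg)        # in OWB, the generated mapping code runs here
        q.task_done()

loaded = []
aq = queue.Queue()          # stand-in for a Streams (Advanced) queue
stop = threading.Event()
t = threading.Thread(target=apply_daemon,
                     args=(aq, lambda m: loaded.append(m.upper()), stop))
t.start()
for event in ["promo_added", "promo_changed"]:
    aq.put(event)           # messages trickling in from a publisher or CDC
aq.join()                   # wait until every message has been applied
stop.set()
t.join()
print(loaded)
```

Because the handler runs once per message as it arrives, processing is continuous rather than batch-driven, which is the essential property trickle-feed mappings add.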
OWB and the Web: Updates, Blog, and LinkedIn

The Help menu in Design Center provides two of the major new features in OWB 11.2:
o A Check for Updates mechanism, through which the OWB team will deliver new code templates, utilities and experts, and links to important resources and announcements. This value-added content is independent of formal patches to OWB or the database.
o The Warehouse Builder Start Page (available from the Help menu), which links to key "getting started"-type documentation topics and the Warehouse Builder Blog (http://blogs.oracle.com/warehousebuilder).

Frequently updated and one of the most widely read blogs at Oracle, the OWB blog runs deep-dive articles on OWB features that even experienced users may not have discovered or fully understood. OWB 11.2 features and techniques will be covered at length in the coming months. To get the most out of OWB 11.2, follow it closely.
The OWB/ODI product management team also runs a LinkedIn group (http://www.linkedin.com/groups?about=&gid=140609) for OWB and ODI users. The group currently has nearly 1000 members, including partners, consultants, developers, and product managers. Join the community to keep up with OWB and ODI.

Contact:
C. Antonio Romero
Oracle Corporation
Redwood Shores, US
Telephone: +1 650-607-6266
E-Mail: antonio.romero@oracle.com