XML: Changing the data warehouse

Similar documents
The strategic advantage of OLAP and multidimensional analysis

SCREEN COMBINATION FEATURE IN HATS 7.0

Data management for. smarter business outcomes

Massive Scalability With InterSystems IRIS Data Platform

Composite Software Data Virtualization The Five Most Popular Uses of Data Virtualization

[CONFIGURE NEW PAYMENT METHOD IN STORE FRONT]

Optimizing Data Transformation with Db2 for z/os and Db2 Analytics Accelerator

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

IBM Real-time Compression and ProtecTIER Deduplication

Information empowerment for your evolving data ecosystem

Virtualizing disaster recovery helps ensure business resiliency while cutting operating costs.

IBM LinuxONE Rockhopper

Data Warehousing and OLAP

Reliability and Performance with IBM DB2 Analytics Accelerator Version 4.1 IBM Redbooks Solution Guide

IBM dashdb Local. Using a software-defined environment in a private cloud to enable hybrid data warehousing. Evolving the data warehouse

IBM InfoSphere Information Analyzer

Business Intelligence. You can t manage what you can t measure. You can t measure what you can t describe. Ahsan Kabir

IBM and Sirius help food service distributor Nicholas and Company deliver a world-class data center

Analytic Workspace Manager and Oracle OLAP 10g. An Oracle White Paper November 2004

Information Integration

Proven strategies for uncovering. cost savings with IBM DB2

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

Applying Analytics to IMS Data Helps Achieve Competitive Advantage

CASE STUDY EB Case Studies of Four Companies that Made the Switch MIGRATING FROM IBM DB2 TO TERADATA

Shine a Light on Dark Data with Vertica Flex Tables

IBM Z servers running Oracle Database 12c on Linux

Rocky Mountain Technology Ventures

Data Warehouse and Data Mining

Effective PMR Submission Best Practice. IBM Learn Customer Support

Evolution of Database Systems

The Future of Business Continuity & Resiliency

IBM Software IBM InfoSphere Information Server for Data Quality

QUALITY MONITORING AND

Managing XML for Maximum Return

IBM Rational Developer for System z Version 7.5

Deliver robust products at reduced cost by linking model-driven software testing to quality management.

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

HANA Performance. Efficient Speed and Scale-out for Real-time BI

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Oracle and Tangosol Acquisition Announcement

IBM Data Center Networking in Support of Dynamic Infrastructure

Using IBM Tivoli Storage Manager and IBM BRMS to create a comprehensive storage management strategy for your iseries environment

Data Analytics at Logitech Snowflake + Tableau = #Winning

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC)

Deploying an IBM Industry Data Model on an IBM Netezza data warehouse appliance

Introducing IBM Lotus Sametime 7.5 software.

Data Mining Concepts & Techniques

WebSphere Commerce Integration with ebay: Using the ebay SDK and Web Services

Delivering information you can trust June IBM InfoSphere Information Server: Simplify integration with unified metadata

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Guide Users along Information Pathways and Surf through the Data

Full file at

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS

The IBM Storwize V3700: Meeting the Big Data Storage Needs of SMBs

Behind the Glitz - Is Life Better on Another Database Platform?

Perform scalable data exchange using InfoSphere DataStage DB2 Connector

REPORTING AND QUERY TOOLS AND APPLICATIONS

IBM DB2 Analytics Accelerator for z/os, v2.1 Providing extreme performance for complex business analysis

Ten Innovative Financial Services Applications Powered by Data Virtualization

Luncheon Webinar Series June 3rd, Deep Dive MetaData Workbench Sponsored By:

WP710 Language: English Additional languages: None specified Product: WebSphere Portal Release: 6.0

zapnote Analyst: Jason Bloomberg

Remodel. New server deployment time is reduced from weeks to minutes

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Infor Lawson on IBM i 7.1 and IBM POWER7+

Collaboration for a Greener World. Kevin O' Connell Consulting Manager, Lotus Software, IBM Asia Pacific

Teradata Aggregate Designer

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

IBM Data Replication for Big Data

WebSphere Application Server, Version 5. What s New?

Networking for a smarter data center: Getting it right

ENTERPRISE DATA STRATEGY IN THE HEALTHCARE LANDSCAPE

Crystal Reports. Overview. Contents. How to report off a Teradata Database

REST APIs on z/os. How to use z/os Connect RESTful APIs with Modern Cloud Native Applications. Bill Keller

Empowering DBA's with IBM Data Studio. Deb Jenson, Data Studio Product Manager,

ETL and OLAP Systems

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data

Oracle GoldenGate for Big Data

IBM DB2 Web Query for System i

Whitepaper. Solving Complex Hierarchical Data Integration Issues. What is Complex Data? Types of Data

Decision Support Systems

IBM Cloud Orchestrator. Content Pack for IBM Endpoint Manager for Software Distribution IBM

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

IBM TS7700 grid solutions for business continuity

How to Modernize the IMS Queries Landscape with IDAA

Executive Brief June 2014

Protect Your Data At Every Point Possible. Reduce risk while controlling costs with Dell EMC and Intel #1 in data protection 1

OBIEE Performance Improvement Tips and Techniques

IBM Aspera for Microsoft SharePoint

Paper. Delivering Strong Security in a Hyperconverged Data Center Environment

BI, Big Data, Mission Critical. Eduardo Rivadeneira Specialist Sales Manager

Smart Transformation. Smart Transformation. Ravi Indukuri IBM Commerce

Implementing IBM CICS JSON Web Services for Mobile Applications IBM Redbooks Solution Guide

Continuous Availability with the IBM DB2 purescale Feature IBM Redbooks Solution Guide

IBM InfoSphere Data Replication s Change Data Capture (CDC) Fast Apply IBM Corporation

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

Transcription:

IBM Software Group White Paper XML: Changing the data warehouse Deliver new levels of business analysis and bring users closer to their data

2 Deliver new levels of business analysis Executive summary Some saw it coming years ago. Others are just beginning to realize what lies ahead. We re talking about the impact that XML is having on data warehousing environments. For IT leaders building data warehouses that meet the evolving demands of their business environments, integration of XML data into their infrastructures is critical. With XML, organizations can support evolving business reporting and analysis requirements without incurring significant database schema changes or rewriting applications. Simultaneously, they can bring their users closer to the information that they need to make accurate business decisions while providing the user with a result set that is similar to a search engine on the Internet. XML has become the preferred data exchange format across many industries. As a result, organizations must find ways to efficiently manage and manipulate XML within their data warehouses. In fact, leading-edge firms in retail, finance, healthcare and other industries already have production environments that leverage both relational and XML database technologies. While their implementations vary, early adopters are often striving to promote greater business agility, providing decision makers with more accurate and timely information and improving IT staff productivity. Data warehouses and evolving business needs One of the primary goals of data warehousing is to make it as easy as possible for users to get the information that they need when they need it. However, traditional relational schemes can make this difficult to achieve. Consider a data warehouse in the retail industry that tracks sales information. Using a typical star schema database design, as pictured in Figure 1, a fact table might contain sales data by product, region and time period. Such data would typically be joined with data in multiple dimension tables to obtain specific details pertaining to various products, regions and so on. Unfortunately, different products have different attributes, which presents challenges both in database design and in figuring out how to expose the data to users. With a relational-only design, each product attribute must be captured in its own column. But because attributes vary widely from one type of product to another, the result could be an inefficient, unwieldy product table with thousands of sparsely populated columns. And, as new products and new product attributes are introduced over time, the database schema and any applications that depend on it would need to be altered, which can be quite costly. IBM DB2 purexml makes it possible for organizations to manage XML data as well as relational data. This increases database efficiency, improves the user experience and increases their competitive advantage by fully exploiting data interchange standards.

IBM Software Group 3 Augmenting a data warehouse schema with XML Time Store Time Store time_key day day_of_week month quarter year time Product product_key name class subdept dept Purchase time_key store_key product_key dollars_sold (Facts) store_key name district region division time_key day day_of_week month quarter year time Product product_key name class subdept dept Purchase time_key store_key product_key dollars_sold (Facts) store_key name district region division Details (XML) Figure 1: In a typical star schema warehouse, millions of transactions known as facts are in a purchase table. This table joins via keys to dimension tables that store attributes for time, store location and product (left). Using XML data warehousing methods, the organization captures additional product attributes as a single XML column in the product dimension table (right). Presenting this data to decision makers is a challenge in this environment. Programmers and database analysts must determine which attributes to expose to which decision maker s application or executive dashboards not an easy task. Also, business analysts may want to drill down or slice-and-dice product sales data in unexpected ways, such as exploring sales of women s shirts by size, color, fabric, neckline and sleeve length. But investigating warehouse data in this way requires an intimate knowledge of how the warehouse schemas are constructed. For example: Are leather coats filed under outerwear, fall/winter or designer collections? There s often no easy, efficient or effective way in today s table-based warehouses for developers to create a search function that works like a Web search: Type in a term and all relevant results are returned.

4 Deliver new levels of business analysis Extending a relational data warehouse schema with one or more XML columns avoids those problems. Commonly used attributes can be stored in relational columns, while additional details can be maintained in an XML column, which readily accommodates variable structures and is easily accessible for queries and reports. Using the previous example, the dimension table for product data could be extended with an XML column. New product attributes would simply be included as new elements in the appropriate XML documents. The database schema would not need to change. Although this example focuses on a retail scenario, similar scenarios apply to other industries. Medical records, financial instruments, employee profiles, traveler profiles and income tax data are just a few examples of information that has a wide range of possible attributes able to change over time. Real-time analysis in retail In the United States, a popular retailer captures point-of-sale data in XML and posts this data to message queues: These sales records represent important real-time business data that analysts need to review along with historical information about sales. Using a popular extract/transform/load (ETL) offering to transform this data and load it into a relational-only data warehouse proved time consuming to develop and failed to meet the firm s need for real-time analysis. Instead, this retailer built an operational data store using DB2 purexml. Stored procedures read transactional XML data from queues and store the data intact into a DB2 database. Analysts can now combine data about intra-day sales (managed by this operational data store) with historical data previously loaded into the data warehouse. Periodically, operational data is mapped into tables and loaded into the warehouse. The firm reduced their ETL development cycle by several months and provided real-time analysis capabilities to their business users for the first time.

IBM Software Group 5 Why XML? As more and more critical business data is captured and exchanged in XML, it s not surprising that firms are recognizing the need to manage, share, query and report on XML data. Message-based applications, Service Oriented Architectures (SOA), Web-centric applications and application integration projects increasingly rely on XML to define how important business data is represented and exchanged, and industry-specific XML standards abound. Examples include the Financial Products Mark-up Language (FpML) for over-the-counter derivatives trading, Health Level 7 (HL7) and Clinical Data Interchange Standards Consortium (CDISC) specifications for healthcare, Association for Cooperative Operations Research and Development (ACORD) specifications for insurance, Financial Information Exchange Markup Language (FIXML) for securities transactions, ISO 20022 for banking payments, Standards in Automotive Retail (STAR) for automotive manufacturing and others. The increased use of XML standards for data interchange creates storage and management challenges; the highly variable, nested structures that typify XML data are difficult to accommodate using traditional relational database techniques. One approach to storing XML in a traditional relational database is using large objects to manage XML, which prevents the database management system (DBMS) from understanding the internal structure of stored data. Therefore, it can t provide optimized access to specific XML elements or attributes contained within a message or document. Some firms prefer to shred or decompose XML data into multiple columns of one or more tables. In doing so, they flatten the XML hierarchy and convert the data values of XML elements and attributes into traditional SQL data types, such as an integer or varying-length character strings. But, a single XML message structure (XML schema) may need to be mapped to dozens, even hundreds of tables, driving up application development and database administration costs. Those complex, labor-intensive mappings are difficult to adjust as XML messaging formats change over time. Finally, reconstructing the data into an XML format for downstream applications can be slow and complex. For those reasons, and others, many firms are storing XML in its native hierarchical format alongside relational data so that both types of data may be managed in an optimal manner. With this approach, XML data is stored intact with full DBMS knowledge of its internal hierarchical structure. IBM DB2 supports native XML storage alongside its relational storage, which helps organizations manage, share, query and report on data modeled in tables as well as data contained in XML hierarchies. Labor-intensive document decomposition and reconstruction processes aren t needed. In addition, certain performance advantages and programming productivity enhancements are possible, thanks to greater DBMS knowledge of XML technology.

6 Deliver new levels of business analysis Improving productivity with XML A European financial services firm adopted a SOA to promote better customer service across its various lines of business (LOB). The firm defined a standard XML messaging format for data exchange among applications and needed to maintain an operational data store to manage this data. After comparing DB2 purexml to relational technologies, the firm concluded that it could improve the productivity of its Web service developers and database administrators through purexml. DB2 purexml IBM DB2 provides firms with a common application programming interface and database management platform for data modeled in tables as well as XML hierarchies. This hybrid database management architecture, pictured in Figure 2, helps firms to extend their traditional relational database environments to directly manage XML messages and documents without the need to shred data into columns of various tables. Applications can retrieve relevant portions of the XML data easily and efficiently, as well as integrate XML and relational data with little effort. Optimized storage Common services xquery Integrated information <xml> </xml> Language flexibility rdb SQL Figure 2: DB2 9 architecture with built-in support for relational and XML data helps extend traditional relational database environments.

IBM Software Group 7 DB2 purexml includes many sophisticated features to provide high levels of performance and scalability for queryintensive workloads common in data warehousing environments. Such features include: Cost-based query optimization helps enable DB2 to select an efficient path for accessing requested data Specialized XML indexing speeds retrieval of queries over XML data as well as queries over relational views of XML data Hash-based partitioning provides significant scalability gains Range-based partitioning helps firms roll in and roll out data over time (a common requirement in data warehouses) Multi-dimensional clustering often improves performance of analytic queries Compression of XML data and indexes reduces storage costs, improves storage efficiency and speeds runtime performance for many common workloads Recent benchmarks conducted by IBM and Intel confirm the strong runtime performance, throughput and efficiency of running DB2 purexml on Intel Xeon processor 5500 series. 1 In addition, DB2 helps enable firms to easily create relational views of their XML hierarchical data, which allows relational tools and applications to access stored XML data in a straightforward manner. Indeed, popular query/reporting tools, such as IBM Cognos, benefit from DB2 s ability to dynamically transform XML structures to relational result sets when needed. Finally, many extract/transform/load tools, such as IBM InfoSphere IBM DataStage and IBM Infosphere Warehouse Design Studio, can access purexml data through such views or through built-in wizards. Summary Increased use of XML as a preferred format for data exchange is prompting data architects and administrators to evaluate options for integrating business-critical XML data into their IT infrastructures. Sophisticated features for indexing, query optimization, compression and physical database design options provide for strong runtime performance and scalability. Also, various complementary software offerings can readily access purexml data, providing firms with an easy means to integrate purexml into their existing IT infrastructures. For more information For more information on DB2 purexml, visit ibm.com/software/data/db2/xml

Copyright IBM Corporation 2009 IBM Software Group Route 100 Somers, NY 10589 Produced in the United States of America December 2009 All Rights Reserved IBM, the IBM logo, ibm.com, Cognos, DataStage, DB2, InfoSphere and purexml are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol ( or ), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at ibm.com/legal/copytrade.shtml Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates. Offerings are subject to change, extension or withdrawal without notice. All statements regarding IBM future direction or intent are subject to change or withdrawal without notice and represent goals and objectives only. The information contained in this document is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this document, it is provided as is without warranty of any kind, express or implied. In addition, this information is based on IBM s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this document or any other documents. Nothing contained in this document is intended to, nor shall have the effect of, creating any warranties or representations from IBM Software. 1 Intel and IBM Collaborate to Boost Performance and Lower Power Consumption download.intel.com/business/software/testimonials/downloads/xeon5500/ ibm_db2.pdf May 2009 Please Recycle IMW14238-USEN-01