A Guide to Using Cisco Data Virtualization


Tony Young, Solution Architect, Cisco Advanced Services
September 2015

© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.

TABLE OF CONTENTS

INTRODUCTION
    PURPOSE
    HOW THIS DOCUMENT IS STRUCTURED
PART 1: BACKGROUND
    DATA INTEGRATION STRATEGIES
    DATA VIRTUALIZATION
    CHALLENGES ADDRESSED BY DATA VIRTUALIZATION
    BENEFITS OF DATA VIRTUALIZATION
    DRAWBACKS OF DATA VIRTUALIZATION
PART 2: WHEN TO USE DATA VIRTUALIZATION
    INDUSTRY USAGE PATTERNS
    BUSINESS INTELLIGENCE
    DATA WAREHOUSE
    MASTER DATA MANAGEMENT
    INFORMATION SERVICES
    OPERATIONAL DATA MANAGEMENT
    ENTERPRISE ARCHITECTURE
    MISCELLANEOUS
    INDUSTRY RECOMMENDATIONS
    TECHNICAL CONSIDERATIONS
PART 3: HOW TO USE THE CISCO DATA VIRTUALIZATION PLATFORM
    VISION AND STRATEGY FOR CISCO DATA VIRTUALIZATION
CONCLUSION
RESOURCES
    INFORMATION SOURCES

Introduction

Purpose

The purpose of this document is to give architects, analysts, and developers guidance on when to use data virtualization, and how to implement it using the Cisco Data Virtualization Platform. The goal is to give the audience enough insight into the technology to make informed decisions on when and how to use it.

The approach in this guide is to present the technology in a tool-free context, followed by a discussion of how Cisco Data Virtualization should be used. First, the question of when to use it is answered by examining the use cases, patterns, and best practices, in order to provide an understanding of when data virtualization is an effective solution. Next, the question of how to use it is answered by explaining the architectural vision and describing a methodology for applying the tool, so that the audience gains insight into how Cisco Data Virtualization should be used.

How this Document is Structured

The contents are divided into three parts:

- Part 1 provides a brief background of data virtualization, ETL, and ESB technologies
- Part 2 describes when to use data virtualization by providing use cases, patterns, and best practices
- Part 3 prescribes how Cisco Data Virtualization should be used to get the most value

PART 1: BACKGROUND

Data Integration Strategies

Data integration is the process of combining data residing in different sources, and providing users with a unified view of this data ¹. This process includes not only the translation of complex differences between source and target data, but also dealing with differences in storage structure, storage technology, and location of the data. The following are the main data integration strategies available to manage these differences:

- Data Consolidation: Used for capturing data from multiple sources and loading it into a centralized persistent store. It usually involves moving data from native sources to a central store via a process of extraction, transformation, and loading (ETL). Examples of data consolidation include moving data into a data warehouse using ETL tools such as Informatica or DataStage.
- Data Synchronization: Used for propagating changes in data between sources and targets, so that consistency is maintained between the systems. Typically only altered content is communicated between the source and target systems. Examples of data synchronization include receiving JMS messages to process new records on backend systems, and sending requests for data to downstream systems. Data synchronization most often makes use of Enterprise Service Bus (ESB) technology such as Tibco or MuleSoft.
- Data Virtualization: Used to provision data on demand, in near real-time, from disparate sources, and to deliver it through a business-friendly interface that hides its source. Examples of data virtualization include publishing data combined from disparate data sources as a virtual database or a virtual web service, and federating data sources such as Oracle, SOAP, SAP, and Salesforce.com.

Data Virtualization

As defined previously, data virtualization is a form of data integration technology that provisions data on demand, in near real-time ², from disparate sources. The provisioning of data is performed by views and procedures in a virtual layer, which encapsulate the query, and mapping, of source data into logical entities. Mapping the results of the views into an industry standard interface facilitates the delivery of data to the requesting client.

Unlike other data integration technologies, data virtualization does not move data into a physical persistent store. Instead, data remains at the source locations, in the source format, and is pulled into the virtual layer only on demand.

¹ http://en.wikipedia.org/wiki/data_integration
² In this document, real-time means data is read when requested, not that data is delivered to a user instantaneously. Some latency is expected, ranging from microseconds upward, depending on query complexity.

Challenges Addressed by Data Virtualization

- Solution: Abstraction is used to map incompatible native structures and formats into those required.
- Location: Federates data stored in multiple locations and formats, and makes it appear as if the data is stored in a single virtual location.
- Completeness: Allows fragments of data from disparate sources to be combined into a full picture.
- Latency: Allows near real-time access to current data.

Benefits of Data Virtualization

- Reduction of time, cost, and risk: In the case of new Cisco Data Virtualization installations, significantly less additional infrastructure, software, administration overhead, monitoring, operational roles, etc. is needed than with traditional methods. In the case of new projects using an existing installation, often no additional resources need to be provisioned.
- Real-time data access: Zero to low latency for access to existing data sources. It is not necessary to wait for a scheduled load to have up-to-the-minute data.
- Faster deployment: Projects deploy faster because minimal, or no, additional infrastructure setup is required. In addition, resources developed for one project may be shared and reused or expanded for the next project, further reducing development cycles.
- Easier maintenance: Changes to physical databases, warehouses, and marts can be costly and time-consuming. With virtual views, changes are made quickly and can be deployed immediately thereafter.
- Encapsulation of underlying complexity: Modeling complexity can be tailored; for example, users can be shielded from data schemas, storage technologies, and ancillary information, using virtual views that give them only what they need.
- Decoupling of consumers from sources: By using a layer of abstraction, we not only shield downstream consumers from structural changes, we can also create reusable patterns for others to safely leverage.

Drawbacks of Data Virtualization

- Transformations are repeated with every query: This is the nature of on-demand transformations. Repeated transformations may not be an issue if the load is small, or if the servers and infrastructure can be scaled to handle the load. One option is to convert the transformations from on-demand to periodic by use of caching. Often, this drawback is accepted in exchange for real-time access to data.
- Complex joins or transformations can be very slow: Queries or transformations that are very complex or operate on large data sets may take a long time, which may not be acceptable for a user-interactive application. Use of Cisco Data Virtualization's specialized join algorithms, and statistics gathering, can often overcome this challenge. Caching can also help, but it introduces data latency. The former is often sufficient, with the latter implemented only as a last resort.
- No data history: The only historical data available is that which exists in the source systems. Because data is not physically persisted with data virtualization, historical archives are not accumulated. Cumulative, incremental caching strategies can be used to preserve historical data (similar to warehouse trickle feeding techniques) at the cost of implementation complexity and storage management.
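The trade-off between on-demand views and caching described above can be sketched in a few lines of Python. This is a minimal illustration, not Cisco Data Virtualization code: the two attached in-memory SQLite databases stand in for disparate sources, and the names `crm`, `billing`, `customer_totals`, and `CachedView` are hypothetical. The view function repeats its join on every call, while the cached wrapper trades freshness for fewer trips to the sources.

```python
import sqlite3
import time


def build_sources():
    """Create two separate in-memory databases standing in for disparate sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")      # hypothetical CRM source
    conn.execute("ATTACH DATABASE ':memory:' AS billing")  # hypothetical billing source
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])
    return conn


def customer_totals(conn):
    """Virtual view: the join and aggregation run at request time, nothing is stored."""
    return conn.execute(
        "SELECT c.name, SUM(i.amount) "
        "FROM crm.customers AS c "
        "JOIN billing.invoices AS i ON i.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()


class CachedView:
    """Wrap a view so its result is recomputed at most once per TTL window."""

    def __init__(self, view, ttl_seconds, clock=time.monotonic):
        self._view = view
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = None
        self._loaded_at = 0.0

    def read(self, conn):
        now = self._clock()
        if self._rows is None or now - self._loaded_at >= self._ttl:
            self._rows = self._view(conn)  # refresh from the sources
            self._loaded_at = now
        return self._rows                  # may be up to ttl_seconds stale


if __name__ == "__main__":
    conn = build_sources()
    print(customer_totals(conn))                       # always current
    print(CachedView(customer_totals, 60).read(conn))  # current now, stale later
```

Calling `customer_totals` directly always reflects the sources at the cost of repeating the join; reading through `CachedView` returns the same rows until the time-to-live expires, which is the data latency the drawback list warns about.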

PART 2: WHEN TO USE DATA VIRTUALIZATION

How does one decide when to use data virtualization? To make an informed decision, you should:

- Clarify your business requirements or problem statement
- Understand the use cases for data virtualization
- Identify the benefits that you expect from using data virtualization
- Examine existing usage patterns prescribed for data virtualization
- Follow the recommendations given by industry experts
- Consider the technical constraints of the environment

Once the business requirements or problem statement have been clearly defined, surveying the use cases and proven patterns for data virtualization can help you understand the ways this technology can be leveraged. In addition, best practice recommendations from data warehousing, business intelligence, and data integration experts in the field provide useful guidance for when to use data virtualization.

Industry Usage Patterns

The patterns in the rest of this section represent the common applications of data virtualization in the industry, as found in trade white papers and vendor online materials. The purpose of presenting the patterns here, in abbreviated form, is the following:

- To raise awareness of the existence of these proven data virtualization patterns
- To provide a convenient way for users to search for a pattern that fits their use case
- To present the patterns in a clear and simple way, free of vendor bias, marketing hype, and sales pitch

Several categories of usage pattern will be explored:

- Business Intelligence: BI Solution Prototype, BI/Performance Management, Data Discovery, Virtual Analytic Sandbox, Big Data Access
- Data Warehousing: Data Warehouse Augmentation, Data Warehouse Federation, Data Warehouse Migration, Data Warehouse Prototype, Hub & Virtual Spoke, Virtual Data Mart, Virtual Data Source
- Master Data Management: MDM Hub Extension 360 degree View, Virtual Master Data Management
- Information Services: On-Demand Information Services
- Operational Data Management: Virtual Operational Data Store
- Enterprise Architecture: Enterprise Data Virtualization, Enterprise Shared Data Services
- Miscellaneous: Caches, Cloud Data Integration, SaaS/AaaS/PaaS, ETL Complement

Business Intelligence

Business Intelligence Solution Prototype [3]

Use Case: Reduce risk by using data virtualization to quickly prototype a data schema for user review, prior to building the actual physical warehouse.

Main Benefits: Reduces the potential for missed requirements, since the prototype can be developed in quick iterations and the results shown to users for feedback.

Business Intelligence and Performance Management [7]

Use Case: Federate multiple line of business (LoB) BI systems with enterprise-level performance management scorecards and dashboards, so that detailed low-level LoB metrics can be used in calculating higher-level enterprise key performance indicators (KPIs).

Main Benefits: Allows enterprise-level KPIs to be calculated from aggregated LoB metrics in multiple BI systems.

Data Discovery [7]

Use Case: Search virtual views of data held in multiple operational and analytical data sources.

Main Benefits: Expands access to enterprise data for a larger set of users who may not know where the data they need is located, or who may not have the skills or time to use BI tools. Allows users to find relationships between data across systems, and use them to answer business questions.

Virtual Analytic Sandbox [5]

Use Case: Eliminate rogue, analyst-created data marts by leveraging data virtualization to create consistent views of analytical, operational, and external data. This pattern is similar to the Virtual Data Mart.

Main Benefits: Prevents unnecessary data duplication via physical data marts. Maintains data consistency between views because data is sourced in real-time from a single version of truth. Avoids the time, effort, and costs associated with the creation, ETL, and maintenance of physical data marts. Virtual data sandboxes can be self-serviced by analysts themselves.

Big Data Integration

Use Case: Combine big data with traditional data for analysis.

Main Benefits: Join data across traditional data sources, and emerging technologies such as Hadoop, to unlock additional business value for reporting and analytics.

Data Warehouse

Data Warehouse Augmentation [3]

Use Case: Federate data warehouse data with additional sources to build complementary views that yield more complete information.

Main Benefits: Presents a holistic view of business activity more rapidly than traditional consolidation technologies, reducing time to market, and satisfying business demand more quickly.

Data Warehouse Federation [3], [5]

Use Case: Federate multiple data warehouses into a single, logically integrated unit for reporting or analysis.

Main Benefits: Provides a single view across reporting systems, department warehouses, etc.

Data Warehouse Migration [3], [5]

Use Case: Insert a virtual layer between the data warehouse and reporting systems to allow reporting to continue uninterrupted during a migration.

Main Benefits: Insulates reports and queries from source system changes. Helps minimize the impact on downstream reports while a data warehouse is being migrated or replaced.

Data Warehouse Prototype [3], [5]

Use Case: Leverage data virtualization for rapid prototyping during the early construction stages of a new data warehouse to significantly reduce the level of effort required to make schema changes as requirements evolve.

Main Benefits: Allows quicker feedback during early stages of development, as views can be prototyped and updated quickly to keep up with users' changing needs. Minimizes the costs, risks, and time associated with deploying many types of applications.

Hub & Virtual Spoke [3]

Use Case: Use data virtualization to create virtual data marts that serve different reporting/analytic requirements while preserving the quality and controls of the data warehouse.

Main Benefits: Prevents proliferation of rogue physical data marts created to get around the controls provided by the data warehouse.

Virtual Data Mart [3], [7]

Use Case: Eliminate or reduce the need for creating physical marts by creating virtual marts that source their data directly from the warehouse.

Main Benefits: Prevents unnecessary data duplication. Maintains data consistency between views. Saves the time, effort, and costs associated with the creation, ETL, and maintenance of physical data marts. Virtual data sandboxes can be self-serviced by analysts themselves.

Virtual Data Source [3], [7]

Use Case: Leverage virtual views and data services as inputs to ETL batch processes.

Main Benefits: Enables ETL tools to support data sources they can't easily access, such as cloud-based applications. Shields ETL workflows from structural changes to operational data sources. Provides a strategy for logically organizing views around a particular type of data (such as product, customer, etc.) that helps to modularize ETL workflows and increase reuse potential.

Master Data Management

MDM Hub Extension 360 Degree View [3], [5]

Use Case: Use data virtualization to extend the master hub with additional key transactional, or historical, data for more complete information.

Main Benefits: Allows MDM entity data to be enriched with transactional and/or historical information. Avoids having to physically consolidate the required information into a data warehouse.

Virtual MDM [7]

Use Case: Use data virtualization to federate multiple domain-specific MDM stores (e.g., customer, product, order, etc.) to provide a single, virtual point of access for master data.

Main Benefits: Provides a single integrated view of master data for consistent use across the enterprise.
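As an illustration of the MDM Hub Extension pattern, the following minimal Python sketch enriches a master record with transactional data at request time. Everything in it is a hypothetical stand-in: `MASTER_HUB` plays the role of the MDM hub, `ORDER_SYSTEM` the transactional source, and `customer_360` the virtual view. A real deployment would query live systems rather than in-memory structures.

```python
# Hypothetical MDM hub: golden customer records keyed by master ID.
MASTER_HUB = {
    "C-1": {"name": "Acme Corp", "segment": "Enterprise"},
}

# Hypothetical transactional source: order rows that reference the master ID.
ORDER_SYSTEM = [
    {"customer": "C-1", "order_id": "O-10", "total": 250.0},
    {"customer": "C-1", "order_id": "O-11", "total": 100.0},
    {"customer": "C-2", "order_id": "O-12", "total": 40.0},
]


def customer_360(customer_id):
    """Combine the master record with its transactions when requested.

    Returns None when the hub has no master record, so orphaned
    transactions are never presented as a customer view.
    """
    master = MASTER_HUB.get(customer_id)
    if master is None:
        return None
    orders = [row for row in ORDER_SYSTEM if row["customer"] == customer_id]
    return {
        "customer_id": customer_id,
        **master,  # master attributes stay authoritative
        "order_count": len(orders),
        "lifetime_value": sum(row["total"] for row in orders),
        "order_ids": [row["order_id"] for row in orders],
    }


if __name__ == "__main__":
    print(customer_360("C-1"))
```

The hub remains the single source of master attributes, and the transactional rows are read in place, so nothing is copied into a warehouse to produce the combined view.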

Information Services

On-Demand Information Services [7]

Use Case: Provide on-demand, integrated data to applications, reporting tools, processes, and portals, via web services.

Main Benefits: Rapid development of reusable Information-as-a-Service assets for applications, reporting tools, processes, and portals.

Operational Data Management

Virtual Operational Data Store [3], [7]

Use Case: Create a virtual operational data store to federate transactional data sources directly, when time, cost, and other constraints do not justify a physical ODS.

Main Benefits: Provides a single view of current operational status across related systems. Avoids extraction, and consolidation, into a separate database.

Enterprise Architecture

Enterprise Shared Data Services [3], [5], [7]

Use Cases:

- Create a data abstraction layer to transform data from native sources into reusable views and services.
- Federate data warehouses, and other sources, into a unified data virtualization layer to provide common enterprise data.
- Create shareable data services that integrate well with SOA infrastructure to promote greater agility and reuse.

Main Benefits: Creation of reusable virtual view, and service, assets. Promotes business agility through reuse, and data consistency.

Miscellaneous

Caches [3]

Use Cases:

- Used to increase query performance when query optimization is not enough.
- Used to ease the burden on operational data sources that do not have the capacity to support additional load from ad hoc queries during work hours.
- Used to ensure the same report results are always returned for a specific period of time (day, week, month).
- Used to populate local tables from federated queries that make use of differing protocols, such as web services, relational tables, and flat files.

Main Benefits: Potentially increase performance of queries. Reduce load on backend data sources. Provide a means for systems that prefer to access federated data via a single protocol, such as relational or web service.

Cloud Data Integration

Use Cases: Seamlessly integrate data stored in the cloud with data stored in on-site systems.

Main Benefits: Take advantage of the attractive economics and reduced TCO for data stored in the cloud.

SaaS/AaaS/PaaS

Use Cases: Used to federate data stored in applications, platforms, and services such as Salesforce.com, SAP, SAP BW, and others.

Main Benefits: Packaged application data can be easily, and quickly, integrated for enterprise-wide data modeling.

ETL Complement

Use Cases: Data virtualization can reduce the time to implement ETL projects. Data can be modeled in virtual views, and ETL tools can connect to the Cisco Information Server (CIS) as a data provider.

Main Benefits: Faster time to delivery of lightweight ETL solutions. Makes cloud data, SaaS/AaaS/PaaS, web service, big data, etc. available to ETL jobs. Reduces the cost of ETL tools, as additional connectors do not need to be purchased.

Industry Recommendations

Do's

- Do use data virtualization for prototyping data warehouse or BI projects [9]: Use a virtual data prototype to assist in gathering, refining, and documenting business needs. This agile development approach is less time-consuming, allows changes to be made rapidly, and reduces the risk of missed requirements.
- Do augment data warehouse with current data [13]: Combine historical data from data warehouses with data virtualization snapshots for a more complete view.

- Do prefer virtual data marts to physical ones [13]: Create virtual data marts based on the data warehouse instead of moving additional data to it.
- Do augment ETL [13]: Data from ETL is latent, while data from virtualization is near real-time. Combine the two for a more complete view.
- Do monitor the impact of data virtualization on source systems [9]: Monitor performance of operational systems, and collect performance metrics over time, to predict and avert impacts of BI activities.
- Do determine data virtualization infrastructure requirements [9]: Take into account hardware, software, network performance, and scalability when implementing data virtualization projects, to make sure adequate infrastructure exists to support them.
- Do define a shared business view of source data [9]: Create shared business views to ensure a consistent version of source data, reduce the proliferation of individual views for the same data, and insulate data consumers from changes to data sources.
- Do introduce a Shared Business Vocabulary (SBV) to promote a consistent understanding of data [7]: Use of an SBV (i.e., common data names and data definitions, canonical model, etc.) ensures consistency across the enterprise.
- Do determine if a virtual mart can satisfy the user's BI needs before building a physical mart [8]: A virtual mart has many advantages: it does not require movement of data, it gives real-time access to the data, it is less costly to implement, and it is easier to maintain.
- Do use data services as a foundation for SOA [12]: Data services allow data to be abstracted from its sources, and consumed by any service-enabled consumer in a SOA.

Don'ts

- Don't use data virtualization to create a virtual data warehouse directly from transactional / operational systems [9]: Creating a virtual data warehouse requires systems to support the large storage requirements, and runtime integration, of huge amounts of data, as well as the workload of BI reporting and analytics. Typical transactional / operational systems do not have the resources to support this in addition to normal daily processing tasks.
- Don't use data virtualization with poor-quality, highly complex data sources [9]: Poor-quality, and highly complex, data sources require extensive data cleansing and transformations that most likely will exceed the capability of data virtualization technology to perform on-the-fly.

- Don't use data virtualization for real-time analytics [9]: Real-time analytics requires event-driven, push technology. Data virtualization is pull technology, hence not a good fit.

Technical Considerations

In some situations, data virtualization may not be an ideal choice due to technical limitations. Constraints can be divided into two main areas: environmental and data-driven [5], [9].

Environment Constraints

- Source System Availability: Ideally, each and every source system will be consistently available (note: in the virtualization model, if a single source is unavailable the entire view may be compromised). An environment where source systems are frequently offline is well suited to caching.
- Source System Capacity: Ideally, ad hoc queries can be executed on source systems with minimal impact to those systems. If source systems are already overloaded with existing workloads, caching can be used to reduce load, at the cost of data latency.
- Network Reliability: Ideally, the network connections between servers will be reliable. If the network is not reliable, and is prone to delays or outages, virtualization may be part of, but not the entire, solution.

Data Constraints

- Data Formats: Data in relational or XML formats is ideal for virtualization.
- Data Transformations: Ideally, data doesn't require significant transformation or cleansing. If data requires complex transformations, virtualization may be part of, but not the entire, solution. Such complexity includes:
  - Matching
  - De-duping
  - Conflict resolution
  - De-normalization
  - Rollups
  - Dimensional calculations
- Unstructured Data: Data virtualization works very well when integrating relational or web service data that has a known, set schema. Unstructured data such as email, Office documents, etc. lacks this schema and cannot be integrated as easily.
- Query Size: Short, tactical queries on data, queries on indexed columns, and queries with strong WHERE clauses are ideally suited to virtualization. Queries over excessive data volumes, with de-normalized or un-indexed data, or requiring large amounts of in-memory processing at the virtualization layer, may not be well suited to virtualization. The impact of these operations can often be reduced using cost-based optimizations, Cisco Data Virtualization's special query algorithms, and caching. Consultation with users is required to determine if the performance of any given query will be acceptable.

Points to Note

- Data virtualization is not ETL: Data virtualization technology is not suited for moving large quantities of data, or for performing cleansing and complex transformations on the fly.
- Data virtualization does not replace data warehousing: The typical requirements for a data warehouse cannot be supported by the operational systems in conjunction with data virtualization, such as:
  - Accumulating huge amounts of historical data
  - Extensive processing that goes into creating the clean data
  - Ability to support BI workloads that are long running and expensive
- Data virtualization is not for transactional services: Data virtualization technology is best used to provision data for read-only use, or for data services (more about this in the next part).
- Use data virtualization to complement other data integration methods: Data virtualization is most effective when used to complement other integration approaches, such as consolidation and synchronization. For example, building new physical data warehouses is typically expensive and time-consuming; consider leveraging virtualization to:
  - Prototype the virtual schema until requirements and model are stable
  - Perform analysis on source data prior to physical consolidation
  - Augment current data from ODS data sources with historical data from the data warehouse for a holistic view
- Always consider the technical constraints: Remember that data virtualization does not work well if the infrastructure is unreliable, systems are already overloaded, or data queries are large.
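The last complementary use above, augmenting current ODS data with warehouse history, can be sketched as a single view over both stores. This is a minimal Python and SQLite illustration under stated assumptions: the schema names `warehouse` and `ods`, the `sales` tables, the view name `all_sales`, and the sample rows are all hypothetical. The view is declared TEMP because SQLite only lets temporary views reference tables in other attached databases.

```python
import sqlite3


def build_holistic_view():
    """Expose warehouse history plus newer ODS rows through one virtual view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # historical store
    conn.execute("ATTACH DATABASE ':memory:' AS ods")        # current operational store
    conn.execute("CREATE TABLE warehouse.sales (sale_date TEXT, amount REAL)")
    conn.execute("CREATE TABLE ods.sales (sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO warehouse.sales VALUES (?, ?)",
                     [("2015-08-30", 120.0), ("2015-08-31", 80.0)])
    # The ODS still holds 2015-08-31, which the last batch load already copied.
    conn.executemany("INSERT INTO ods.sales VALUES (?, ?)",
                     [("2015-08-31", 80.0), ("2015-09-01", 60.0)])
    # Take history from the warehouse and only strictly newer rows from the
    # ODS, so the overlap is not double counted. COALESCE keeps all ODS rows
    # visible if the warehouse is empty.
    conn.execute("""
        CREATE TEMP VIEW all_sales AS
        SELECT sale_date, amount, 'warehouse' AS origin FROM warehouse.sales
        UNION ALL
        SELECT sale_date, amount, 'ods' AS origin FROM ods.sales
        WHERE sale_date > COALESCE((SELECT MAX(sale_date) FROM warehouse.sales), '')
    """)
    return conn


def total_sales(conn):
    """Query the view; the union is evaluated at request time."""
    return conn.execute("SELECT SUM(amount) FROM all_sales").fetchone()[0]


if __name__ == "__main__":
    conn = build_holistic_view()
    print(conn.execute("SELECT * FROM all_sales ORDER BY sale_date").fetchall())
    print(total_sales(conn))
```

Because the view is evaluated on demand, a row written to the ODS is visible on the next query without waiting for a warehouse load; the date cutoff is one simple way to handle the overlap between the two stores, and other deduplication rules may suit a given environment better.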

PART 3: HOW TO USE THE CISCO DATA VIRTUALIZATION PLATFORM

The question "How to use the Cisco Data Virtualization Platform?" can be answered from these main points of view:

- The architectural perspective, which explores the vision and strategy for using Cisco Data Virtualization, and also looks at how the Platform is aligned with SOA efforts.
- The development perspective, which examines the methods, rules, and principles for developing with Cisco Data Virtualization.

For the sake of brevity, it is assumed the reader is familiar with what the Cisco Information Server (CIS) is and does. If not, see the Cisco Data Virtualization Platform Architecture whitepaper, available from http://www.compositesw.com/index.php/resources/white-papers-reports

Vision and Strategy for Cisco Data Virtualization

The Role of Cisco Data Virtualization

Cisco Data Virtualization can be used to develop a data services layer that can be part of a larger corporate strategy for the delivery of information to services, applications, and end users. In this context, the role of the Cisco Data Virtualization Platform is defined as a tool used primarily for these purposes:

- Semantic transformation of data into business information
- Provisioning of information services
- Application-specific data abstraction and delivery
- Ad-hoc analytics and reporting

Deployment of Cisco Data Virtualization

Cisco Data Virtualization can be deployed at two levels:

- At the project level, for application-focused data integrations, or
- At the enterprise level, for common data views and shared services

Project-level deployment of data virtualization targets the delivery of a product (i.e., the application being developed). This usually translates to a focus on integrating and provisioning data for the application, without special attention paid to standards, sharing, reuse, etc. The pace of product delivery is quick.

Enterprise-level deployment of data virtualization, on the other hand, targets delivery of enterprise-wide, reusable, trusted data and service assets. This translates to leveraging architectural principles and standards to achieve broad applicability, interoperability, and reuse opportunities. Furthermore, governance is an imperative.

Both types of deployments will be used. Project-level deployments are expected to encounter common data challenges that require enterprise-level solutions to be developed. In addition, the common data views and shared services deployed at the enterprise level will provide a foundation of reuse for future project-level solutions. The main characteristics of each deployment type are summarized in the following table:

Characteristic | Project-Level Deployment | Enterprise-Level Deployment
Scope of Usage | Application | Enterprise
Scope of Access | Private | Public / Global
Development Focus | Delivery of a product (application) | Delivery of reusable, trusted data and service assets
Target Benefits | Speed of deployment; Cost minimizations; Risk reductions; Application-specific optimizations | Data abstraction; Data consistency; Data quality; Reuse; Flexibility; Interoperability
Governance | Relaxed; Project-owned | Strict; Governance-team owned

Usage of Cisco Data Virtualization

The Cisco Data Virtualization Platform, in almost all cases, accesses data on a read-only basis. This document does not address use cases that involve data modifications, although these write-backs are supported, and several organizations make use of this capability.

SOA Data Services

Simply put, data services are an approach to implementing trusted, reusable data assets that can be consumed in a convenient and predictable way. In the context of SOA, data services refer to services that provision data by abstracting back-end systems via a business semantic layer and a common data access interface [11]. The business semantic layer converts the native data to conform to logical business entities, and standard SOAP / REST Web services provide the common access interface. These two abstraction approaches increase reusability and interoperability, which in turn contribute to the end goals of business agility and cost savings in an SOA. The main characteristics of SOA-based data services are summarized below:

What | Why | How
Use of data abstraction | Simplifies data provisioning | Hides the complexities of physical data (e.g., incompatible structures, incompleteness, semantics, quality issues, access methods); Allows physical data schemas to be mapped into virtual data schemas that are better suited for the domain; Insulates the consumers from changes in the underlying physical data
Use of standards | Increases interoperability | Use REST or SOAP web service interface; XML or JSON data interchange format; Basic authentication or SAML; HTTP transport with or without encryption
Use of governance | Ensures quality of data & service | Guarantees data quality, service reliability, performance, and security

Value of Data Services

SOA-based data services can provide the following values:

- Provide a cohesive and consistent view of data
- Provide access simplicity
- Provide access to integrated, real-time information
- Insulate the consumers from underlying data changes
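To make the data abstraction and standards characteristics more concrete, the following is a minimal sketch, not the Platform's actual API. It assumes a hypothetical physical customer record with cryptic column names (CUST_NO, CUST_NM, CTRY_CD), maps it to a logical business entity, serializes the entity as JSON, and builds an HTTP Basic authentication header. All function and field names are illustrative assumptions; only Python standard-library calls are used.

```python
import base64
import json

# Hypothetical mapping from physical column names to business-entity fields.
# Columns that are not listed here stay hidden from the consumer.
CUSTOMER_MAPPING = {
    "CUST_NO": "customerId",
    "CUST_NM": "name",
    "CTRY_CD": "country",
}


def to_business_entity(physical_row: dict, mapping: dict) -> dict:
    """Rename physical columns to logical field names, dropping unmapped columns."""
    return {logical: physical_row[physical] for physical, logical in mapping.items()}


def to_json_payload(entity: dict) -> str:
    """Serialize the logical entity as JSON with a stable key order."""
    return json.dumps(entity, sort_keys=True)


def basic_auth_header(user: str, password: str) -> dict:
    """Build an HTTP Basic authentication header (base64 of 'user:password')."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    row = {"CUST_NO": 42, "CUST_NM": "Acme", "CTRY_CD": "US", "ROW_TS": "2010-01-01"}
    print(to_json_payload(to_business_entity(row, CUSTOMER_MAPPING)))
    print(basic_auth_header("user", "pass"))
```

Because consumers see only the logical field names, a renamed or relocated physical column would require a change to the mapping alone, which is the insulation benefit listed above. Basic authentication sends credentials in an easily decoded form, so it is normally paired with the encrypted transport option.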

CONCLUSION

This document has provided some guidelines for when and how to use the Cisco Data Virtualization Platform. The intention of this whitepaper is to provide a general understanding of what data virtualization is, and when to use it. It also seeks to provide some insights into the vision and strategy for using the Cisco Data Virtualization Platform.

Resources

- DbVisualizer: Universal graphical JDBC client - Free
- soapui: Web Service (SOAP) client for testing - Free
- RESTclient: Java application to test RESTful web services (from Google) - Free
- XMLPad: XML editor with full XML schema support - Free
- Enterprise Architect: UML modeling tool - Free Trial, $135 per license
- Instant SQL Formatter: Online SQL formatting services - Free - http://www.dpriver.com/pp/sqlformat.htm
- XML Formatter: Online XML formatting services - Free - http://www.freeformatter.com/xml-formatter.html

Information Sources

Internet Sources

URLs are valid as of the publication date of this document.

1. Cisco Systems Inc., Data Abstraction Best Practices. 2010; Available From: http://www.compositesw.com/index.php/resources/white-papers-reports.
2. Cisco Systems Inc., Data Virtualization Process. 2010; Available From: http://www.compositesw.com/index.php/resources/white-papers-reports.
3. Cisco Systems Inc., Data Virtualization Usage Patterns. 2010; Available From: http://www.compositesw.com/index.php/resources/white-papers-reports.
4. Cisco Systems Inc., Cisco Data Virtualization Platform Architecture. 2009; Available From: http://www.compositesw.com/index.php/resources/white-papers-reports.
5. Eckerson, Wayne, Data Federation, TDWI Checklist Report. 2009; Available From: http://tdwi.org/research/2009/11/data-federation.aspx.
6. Eve, Robert, Maximizing Data Virtualization Benefits. 2009; Available From: http://www.ebizq.net/topics/bi/features/11579.html.

7. Ferguson, Mike, Maximizing Business Value from Data Virtualization, Intelligent Business Strategies. 2009.
8. Imhoff, Claudia, To V or Not To V: Business Intelligence Gets Virtual!, Intelligent Solutions, Inc. 2009.
9. Imhoff, Claudia, White, C., Ten Mistakes to Avoid When Using Data Federation Technology, TDWI Research. 2010; Available From: http://tdwi.org/research/2010/12/ten-mistakes-to-avoid-when-using-data-federationtechnology.aspx.
10. Linthicum, David S., Understanding the Critical Need for Data Services when Building a SOA. 2006; Available From: http://www.compositesw.com/index.php/resources/white-papers-reports.
11. Linthicum, David S., Defining, Designing, and Implementing SOA-Based Data Services. 2009; Available From: http://tdwi.org/whitepapers/2010/01/davidlinthicum-whitepaper-defining-designing-and-implementing-soa-based-dataservice/asset.aspx.
12. Linthicum, David S., Importance of Data Abstraction, Data Virtualization, and Data Services. 2010; Available From: http://www.ebizq.net/views/download_raw?metadata_id=13091&what=feature.
13. Russom, Philip, Data Integration for Real-Time Data Warehousing and Data Virtualization, TDWI Checklist Report. 2010; Available From: http://tdwi.org/research/2010/10/data-integration-for-real-time-data-warehousingand-data-virtualization.aspx.

Books

14. Green, Jim, Bresemer, D., Butterworth, P., Clement, L., Ramachandra, H., Schneider, J., Vandervoort, H., An Implementor's Guide to Service Oriented Architecture: Getting It Right, Westminster Promotions. 2008.

Analytical Reports

15. Manes, Anne T., Data Services Platforms: Searching for Their Place in the Market, Burton Group Application Platform Strategies In-Depth Research Report. 2008.
16. van der Lans, Rick, Developing a Data Delivery Platform With Composite Information Server. 2010; Available From: http://purl.manticoretechnology.com/imghost/582/12917/2011/resources/white_papers/whitepaper_developing_ddp_with_composite.pdf.