Data Warehouse Archiving: A Way to Optimize Data Warehouse Performance and Reduce Costs
WHITE PAPER

This document contains Confidential, Proprietary, and Trade Secret Information ("Confidential Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product as well as the timing of any such release or upgrade is at the sole discretion of Informatica. Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700. This edition published January 2010.

Table of Contents
Executive Summary
Drivers for Managing Data Growth in Data Warehouses
Conventional Solutions and Their Limitations
  Upgrading Hardware
  Database Tuning and Partitioning
  Hand Coding
  Purging Data
The Benefits of Data Warehouse Archiving
Key Requirements of a Data Warehouse Archiving Solution
  Data Growth Assessment Capabilities
  Metadata Discovery
  Simple Metadata Extensibility
  Robust Archiving Techniques to Enable Optimal Storage Tiers
  Easy, Multiple Methods to Access Archived Data
  Universal Connectivity
  Integration with Other Archiving Platforms, Enterprise Content Management, and Storage Solutions
Informatica Data Archive: The Complete Data Warehouse Archiving Solution
Conclusion

Executive Summary

Data warehouses are critical systems that link data from multiple source applications, aggregate it, and deliver it to analytical decision support systems, which are central to many organizations' financial analysis and decision-making processes. Because data warehouses integrate data from multiple systems and accumulate it over time, while still having to support drill-down to detailed records, they tend to hold huge data volumes, often measured in terabytes. And the size of data warehouses will continue to grow at a staggering rate.

The increase in data warehouse volumes stems from a number of factors, including:
- The need to piece together data from more disparate applications for a complete view of the customer or other business entity
- Increased database complexity as additional information about each transaction is captured
- The need to integrate data more often, in real time
- Expanding transaction volumes from organic business growth
- The need to retain data for longer periods to comply with regulations, further increasing data volumes and management costs

To address these increasingly complex issues, IT organizations need a cost-effective, long-term solution for managing data growth in data warehouses, along with the performance degradation and maintenance costs associated with such growth. The answer to this problem is data warehouse archiving.

This white paper examines how data warehouse archiving can help your IT organization better manage the growing data volume in your data warehouses and reduce the associated storage costs by using storage tiers. After reading this paper, you'll have a better understanding of:
- The drivers for managing data growth in data warehouses
- How conventional methods of managing data growth fall short
- The benefits of data warehouse archiving
- The key requirements for a data warehouse archiving solution

Drivers for Managing Data Growth in Data Warehouses

As Figure 1 shows, data volumes aren't just growing; they're exploding. Forrester Research estimates that the volume of data housed in large business applications, including data warehouses, grows by as much as 65 percent each year.[1] Most of this growth is due to an accumulation of inactive data: IDC estimates that 85 percent of production data is inactive.

[Figure 1. Data repositories for large business systems, including data warehouses, are growing by more than 65 percent annually. (Chart: data volume in exabytes by year, 2005 to 2011.)]

[1] Forrester Research, "Securing Next-Generation Information Architectures: The Promise of Improved Security or the Risk of New Attack Vectors," October 24, 2008.
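To make these growth figures concrete, the following Python sketch applies compound growth at the 65 percent annual rate quoted from Forrester, and the 85 percent inactive share quoted from IDC, to a warehouse. The 10 TB starting size is a hypothetical value chosen for illustration, and applying the same inactive share in every future year is an assumption, not a claim from either analyst firm.

```python
import math

def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth (annual_growth=0.65 means 65% per year)."""
    return current_tb * (1 + annual_growth) ** years

def doubling_time(annual_growth: float) -> float:
    """Years for the volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def inactive_volume(total_tb: float, inactive_fraction: float) -> float:
    """Portion of the volume that is inactive, i.e. a candidate for archiving."""
    return total_tb * inactive_fraction

START_TB = 10.0   # hypothetical warehouse size today
GROWTH = 0.65     # upper growth estimate quoted from Forrester
INACTIVE = 0.85   # inactive share quoted from IDC, assumed constant over time

for year in range(4):
    total = projected_volume(START_TB, GROWTH, year)
    print(f"year {year}: {total:6.1f} TB total, "
          f"{inactive_volume(total, INACTIVE):6.1f} TB inactive")
print(f"doubling time: {doubling_time(GROWTH):.2f} years")
```

At this rate the warehouse more than quadruples in three years and doubles roughly every 16 to 17 months, which is why the paper treats inactive data, rather than hardware, as the lever to pull.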

The growth in your data warehouse volumes can be attributed to several factors:
- Business growth. As your business grows, your applications handle more transactions. When your company merges with or acquires another, or expands its operations globally, the result is more data.
- Demand for real-time data. Today, users don't want old or stale data. They need current, up-to-the-minute information so that they can make better business decisions. As a result, data warehouses are updated more frequently to integrate the latest information.
- Holistic view driving decision making. More and more information from disparate sources is integrated into data warehouses to supply a more holistic view of the customer or other business entities. As a result, data warehouses are becoming massive databases that provide the complete information necessary for faster and more accurate decision making.
- Longer data retention for compliance. Organizations are retaining data for longer and longer periods to meet regulatory requirements. Some regulations require organizations to retain data for as long as ten years.

Increasing data size brings a number of issues in managing data warehouses:
- Higher infrastructure and maintenance costs. More data means additional hardware, software, and maintenance costs. Although storage costs continue to fall, production data warehouses are usually housed on high-end primary storage, which still commands a significant portion of the IT budget. With more data to process, you also need more CPUs, which leads to additional database license costs. More time and effort spent on administrative tasks such as backups and upgrades means longer periods of system unavailability, time diverted from more critical or strategic IT projects, higher IT staff overtime costs, or even additional full-time equivalent (FTE) staff.
- Reduced system availability. As data volumes grow, it takes more time and effort for your end users and database administrators to perform essential tasks on production data warehouses. Data warehouse loads take longer to complete. Database backups are slower and can't be done overnight. Upgrading database versions or applying software patches becomes more complicated and can't be completed over a weekend. Maintaining application service level agreements (SLAs) while keeping costs down becomes virtually impossible.

Data warehouse performance also declines as more data accumulates, as Figure 2 illustrates. Reports take longer to run, and overall end-user response time is slower.

[Figure 2. Data warehouse performance declines as the volume of data grows over time. (Chart: performance and database size, split into active and inactive data, plotted against time.)]

These challenges are prompting IT organizations to look for more effective solutions to manage the growing data in their data warehouses.

Conventional Solutions and Their Limitations

If your IT organization is like most, you've used a variety of methods to manage data growth in your data warehouses. For example:
- You may have purchased additional storage and processing hardware.
- You may have tuned and partitioned the database.
- You may have developed in-house scripts to purge or archive data.

But these conventional approaches often fail to deliver a long-term solution to your data warehouse management challenges. Let's explore the limitations of these typical solutions.

Upgrading Hardware
Throwing more hardware at the problem may seem like the simplest answer, but it is not a viable long-term solution, even with the downward trend of disk and processor costs and the availability of powerful data warehouse appliances. As data volumes grow, input/output or network bandwidth eventually becomes the bottleneck. More hardware also increases architectural complexity while offering limited scalability improvements. Large, powerful data warehouse appliances can also become expensive as data continues to grow and more processing power is required.

Database Tuning and Partitioning
Data warehouse administrators commonly turn to tuning and partitioning to manage data growth within the database and improve application performance. But DBAs quickly discover that while tuning is effective the first time, successive tunings offer diminishing returns and are more time-intensive. Partitioning offers some relief for database performance, but it doesn't reduce the required storage capacity and does little to lower overall infrastructure costs, including database license, server, and storage costs.

Hand Coding
In-house code or scripts to purge or archive data in data warehouses are expensive to develop and maintain because they require deep knowledge of business entities, table schemas, relationships, and business rules. Because constraints and relationships between data warehouse objects are not always completely maintained within the data warehouse metadata, in-house scripts tend to apply business rules for archiving or purging inconsistently across records, tables, entities, and databases.

Purging Data
Simply purging data from data warehouses is not a safe alternative, for compliance reasons. Although a good percentage of the data in data warehouses is derived from other sources and can be reproduced by integrating those sources again, many data warehouses include additional operational or transactional data that is not stored in any other data source. Data warehouses also tend to evolve into applications in their own right, which need to be backed up and archived to ensure availability, quick recovery, and e-discovery for compliance audit purposes.

The Benefits of Data Warehouse Archiving

The key to managing exploding data volumes in data warehouses lies in two facts: the value of all data diminishes over time, and not all data is created equal.

Let's examine the time issue first. Your business users may need access to detailed revenue information from the last year for financial reporting purposes. Once the fiscal year ends, financial information from the previous year or from three years ago is not accessed as regularly. As a result, this historical data is largely inactive, used infrequently for aggregate reporting and compliance purposes.

The second consideration is that not all data is equally important. In data warehouses, historical aggregate information is needed for longer periods (e.g., annual revenue information may be required for reporting performance over the past three to seven years), while transactional data or more granular aggregates (e.g., quarterly revenue information) are rarely needed beyond a year.

IT organizations need a way to manage different classifications of production data in data warehouses cost-effectively, efficiently, and securely, based on their value to the business throughout the data lifecycle. According to Gartner, one of the best practices for managing a scalable data warehouse is that its architecture must account for its storage and access, as well as its archive and retirement.[2] This reinforces the need to manage data growth and the data lifecycle in data warehouses through archiving and retirement.

Data warehouse archiving enables IT organizations to purge or relocate less-valuable or less-frequently accessed data from production data warehouses to second- or third-tier storage to reduce costs, increase system availability, and improve performance, all while satisfying data retention, access, and security requirements. Figure 3 shows an example of a tiered storage strategy for data warehouses.
[Figure 3. Example of a tiered storage strategy for data warehouses. Tiers shown: production applications (ERP/SCM databases, CRM application) on 1st-tier storage (e.g., SAN); the production data warehouse on 1st- or 2nd-tier storage (e.g., SAN, NAS, DW appliance); a data warehouse archive on 2nd- or 3rd-tier storage (e.g., SATA, NAS); and a compressed, file-based archive (flat files) on 2nd- or 3rd-tier storage (e.g., NAS, CAS, cloud). Archive and restore paths connect the tiers.]

[2] Beyer, Mark A., "Data Warehouse Architecture Best Practices and Guiding Principles," Gartner Research, November 6, 2009.
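The idea that data value depends on both age and granularity can be expressed as a simple classification rule. The Python sketch below is a hypothetical policy, not a feature of any product: the data class names are invented, and the active windows (one year for detail and quarterly data, seven years for annual aggregates) are assumptions taken from the upper end of the examples in the text.

```python
from dataclasses import dataclass

# Hypothetical active windows, in years kept in the production warehouse.
# Values follow the examples in the text: detail and quarterly data are
# rarely needed beyond a year; annual aggregates for up to seven years.
ACTIVE_WINDOW_YEARS = {
    "transaction_detail": 1,
    "quarterly_aggregate": 1,
    "annual_aggregate": 7,
}

@dataclass
class Record:
    data_class: str
    age_years: float

def is_archive_candidate(rec: Record) -> bool:
    """True when a record has outlived the active window for its data class."""
    window = ACTIVE_WINDOW_YEARS.get(rec.data_class)
    if window is None:
        # Unclassified data stays in production until a rule is defined for it.
        return False
    return rec.age_years > window

records = [
    Record("transaction_detail", 0.5),
    Record("transaction_detail", 2.0),
    Record("quarterly_aggregate", 1.5),
    Record("annual_aggregate", 5.0),
    Record("annual_aggregate", 8.0),
]
for r in records:
    action = "archive" if is_archive_candidate(r) else "keep"
    print(f"{r.data_class:20s} {r.age_years:4.1f} years -> {action}")
```

Keeping unknown classes in production is a conservative default: a missing rule should never cause data to leave the warehouse.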

Data warehouse archiving helps IT organizations to:
- Cost-effectively manage data growth by relocating inactive data to less expensive infrastructure and enabling storage tiering
- Improve data warehouse performance by removing inactive data, reducing the amount of data that must be processed within production data warehouses
- Support regulatory compliance by cost-effectively retaining data for longer periods

Data Warehouse Archiving Solution: What Should Your IT Organization Look For?
- Data growth assessment capabilities. Can the solution assess and target the largest and fastest-growing tables, table spaces, and schemas?
- Metadata discovery. Does the solution automatically discover metadata about tables, columns, and relationships?
- Simple metadata extensibility. Does the solution offer a simple graphical user interface that lets you extend and customize the discovered metadata?
- Robust archiving techniques to enable optimal storage tiers. Does the solution provide multiple archiving formats and destination options? Does it allow you to archive the highest-growth tables while maintaining data integrity? Does it allow restoration of the data to support varying storage and access requirements?
- Multiple, easy access options to archived data. Can you access the archived data easily, either from the same application interface or from an application-independent interface, using standard protocols?
- Universal connectivity. Can the solution archive data from any source system?
- Integration with other archiving platforms, enterprise content management (ECM), and storage. Does the solution support integration with other archiving platforms, ECM systems, and archival storage to support central storage management and discovery of archived data?

Key Requirements of a Data Warehouse Archiving Solution

If your IT organization is evaluating a data warehouse archiving solution, consider the following key requirements:
- Data growth assessment capabilities
- Metadata discovery
- Simple metadata extensibility
- Robust archiving techniques to enable optimal storage tiers
- Easy, multiple methods to access archived data
- Universal connectivity
- Integration with other archiving platforms, enterprise content management, and storage solutions

Let's examine these factors in greater detail.

Data Growth Assessment Capabilities
Your IT organization first needs to evaluate which tables and table spaces are growing most rapidly. A data warehouse archiving solution should let you assess data growth not just once but on an ongoing basis, so you can continually adjust archiving strategies and maximize the ROI of your solution. Once the fastest-growing fact or detail tables and table spaces are identified, your IT organization can define the appropriate archiving strategies.

In-depth data growth analysis allows you to evaluate current and future data growth rates across tables, table spaces, and schemas within your data warehouses. Figure 4 shows an example of a data growth analysis that helps your IT organization understand which tables and table spaces occupy the most space. This type of analysis also helps your team plan proactively for growth in data volumes by forecasting the estimated reduction in size from archiving inactive data (see Figure 5).
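A growth assessment boils down to measuring table sizes repeatedly, estimating a growth rate per table, and ranking. The Python sketch below shows one way to do that under stated assumptions: the table names and monthly size snapshots are hypothetical, and it fits a straight line by least squares, whereas real warehouse growth may be better described by a compound (percentage) model.

```python
from dataclasses import dataclass

@dataclass
class SizeSnapshot:
    table: str
    month: int       # months since the first measurement
    size_gb: float

def growth_per_month(points: list[tuple[int, float]]) -> float:
    """Least-squares slope of size against time, in GB per month."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx if sxx else 0.0

def rank_and_forecast(snapshots: list[SizeSnapshot], months_ahead: int):
    """Return (table, GB/month, forecast GB) tuples, fastest-growing table first."""
    history: dict[str, list[tuple[int, float]]] = {}
    for s in snapshots:
        history.setdefault(s.table, []).append((s.month, s.size_gb))
    results = []
    for table, points in history.items():
        slope = growth_per_month(points)
        _, last_size = max(points)  # size at the most recent month
        results.append((table, slope, last_size + slope * months_ahead))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical monthly measurements for a detail table and a dimension table.
snapshots = [
    SizeSnapshot("SALES_DETAIL", 0, 100.0),
    SizeSnapshot("SALES_DETAIL", 1, 120.0),
    SizeSnapshot("SALES_DETAIL", 2, 140.0),
    SizeSnapshot("CUSTOMER_DIM", 0, 10.0),
    SizeSnapshot("CUSTOMER_DIM", 1, 10.5),
    SizeSnapshot("CUSTOMER_DIM", 2, 11.0),
]
for table, slope, forecast in rank_and_forecast(snapshots, months_ahead=12):
    print(f"{table:13s} {slope:5.1f} GB/month -> {forecast:6.1f} GB in 12 months")
```

Re-running the ranking on every new snapshot is what turns a one-off sizing exercise into the ongoing assessment the text recommends.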

Figure 4. From the data growth analysis, your IT organization has an inventory of the top-growing tables and schemas in your data warehouse.

Tables belonging to the Activity, Backup, and Contract Mgmt modules* comprise 82% of all data (920 of 1,121 GB). The source groups the columns as Estimated, Actual, and Estimated; the Current column holds the actual measurements.

Size (GB)                -3 yrs   -2 yrs    -1 yr  Current    +1 yr   +2 yrs   +3 yrs
Datafile                      -    5,886   10,589   16,062   28,121   40,447   53,877
Data                          -      411      739    1,121    1,963    2,823    3,761
Largest modules*
  Activity                    -    181.6    448.7    809.0  1,148.3  1,555.1  2,029.2
  Backup                      -     60.0     71.4     82.7     95.2    108.9    124.0
  Contract Mgmt               -      6.4     14.8     27.9    346.8    665.6    984.6
  Samples                     -     23.4     24.7     25.9     36.9     48.0     59.3
  Segmentation                -        -        -      4.2     54.5    104.9    155.2

Figure 5. Data growth analysis enables your IT organization to understand the impact of data archiving strategies on data growth in data warehouses.
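The headline numbers in Figure 4 can be checked directly. The short Python sketch below copies the Current and +3 years values from the table, confirms the 82 percent share, and computes the compound annual growth rate implied between those two columns. The column alignment is reconstructed from the flattened source, so treat the growth rates as a worked reading of the table, not as figures published in the paper.

```python
# Current-year and +3-year sizes in GB, copied from Figure 4.
current_total_gb = 1121
modules = {
    # module: (current GB, +3 years GB)
    "Activity": (809.0, 2029.2),
    "Backup": (82.7, 124.0),
    "Contract Mgmt": (27.9, 984.6),
    "Samples": (25.9, 59.3),
    "Segmentation": (4.2, 155.2),
}

def share_of_total(names: list[str], total: float) -> float:
    """Fraction of total current data held by the named modules."""
    return sum(modules[n][0] for n in names) / total

def annual_growth(current: float, future: float, years: int = 3) -> float:
    """Compound annual growth rate implied by two sizes `years` apart."""
    return (future / current) ** (1 / years) - 1

top3 = ["Activity", "Backup", "Contract Mgmt"]
print(f"Top three modules: {share_of_total(top3, current_total_gb):.0%} of current data")
ranked = sorted(modules.items(), key=lambda kv: annual_growth(*kv[1]), reverse=True)
for name, (now, later) in ranked:
    print(f"{name:14s} {annual_growth(now, later):7.1%} per year, +{later - now:,.1f} GB")
```

The output separates two different questions: small modules such as Segmentation and Contract Mgmt grow fastest in relative terms, while Activity adds the most gigabytes, and an archiving strategy may need to target both.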

Metadata Discovery
Each data warehouse has its own schema design, with varying relationships and constraints between dimension and fact tables and among aggregate, fact, and dimension tables. A data warehouse archiving solution should provide an automatic means of mining the database for metadata about the entity schema and relationships. The archiving solution needs to be aware of the relationships between records across tables and schemas to ensure that all related records are relocated together or that links between them are maintained. This way, data integrity is preserved when data is archived and restored. Without an automated means of discovering this metadata, you would need to define it manually, which requires significant configuration time before you can deploy the solution.

Simple Metadata Extensibility
Not all metadata can be discovered by mining the database, so a data warehouse archiving solution should provide a simple graphical interface that lets users extend and customize the discovered metadata. Groupings of tables into business entities, and business rules specifying which records are eligible for archiving, are examples of metadata that may not be discovered or inferred accurately; some user-provided guidance may therefore be required. With a simple graphical user interface, as shown in Figure 6, you can easily view, edit, and extend your data warehouse entity model metadata and business rules. By mining the database and using a wizard-based interface, you can quickly discover metadata in the data warehouse and add new attributes that augment structural metadata with rich context.

Figure 6. A simple graphical user interface lets you easily view, edit, and extend discovered metadata from your data warehouse.
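Metadata discovery can be illustrated with SQLite, whose `PRAGMA foreign_key_list` reports the foreign keys declared on a table. The sketch below, with hypothetical table names, reads those declarations and groups tables that are linked by them, which is the kind of relationship information an archiving tool needs before moving records. It only sees declared constraints: relationships that exist only in application logic are invisible to it, which is exactly why the extensibility step above matters.

```python
import sqlite3

def discover_foreign_keys(conn: sqlite3.Connection) -> list[tuple[str, str, str, str]]:
    """Return (child_table, child_column, parent_table, parent_column) per declared FK."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, fk[3], fk[2], fk[4]))
    return edges

def related_tables(edges, start: str) -> set[str]:
    """All tables reachable from `start` through FK links, in either direction."""
    neighbours: dict[str, set[str]] = {}
    for child, _, parent, _ in edges:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen, stack = set(), [start]
    while stack:
        table = stack.pop()
        if table in seen:
            continue
        seen.add(table)
        stack.extend(neighbours.get(table, ()))
    return seen

# Hypothetical star schema: one fact table, two dimensions, one unrelated table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE sales_fact (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    date_id INTEGER REFERENCES date_dim(date_id),
    amount REAL);
CREATE TABLE audit_log (entry TEXT);
""")
edges = discover_foreign_keys(conn)
print(edges)
print(sorted(related_tables(edges, "sales_fact")))
```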

Robust Archiving Techniques to Enable Optimal Storage Tiers
The major drivers for data warehouse archiving are usually to reduce infrastructure costs by creating storage tiers, reduce maintenance costs, and maintain peak data warehouse performance. Simply relocating inactive data from production data warehouses to lower-cost servers and storage achieves those goals, but your business requirements are likely to be more complex. You need to consider your organization's budget constraints and performance and access requirements when selecting a data warehouse archiving solution.

Your IT organization will probably access archived data less frequently than active data. But you may still need to periodically retrieve combined archived and current data directly from the original application interface. In this case, the data should be archived to a format that supports relatively high query performance, such as another data warehouse instance located on lower-cost infrastructure.

On the other hand, if inactive data is old and ready to be retired, you may need to access it only rarely. In this case, access from a reporting tool, rather than from an application interface, may be adequate. Slower query performance can be tolerated, and the data may be archived to a more storage-efficient format, such as a compressed file. Archiving to a compressed file format can yield very large storage capacity savings: depending on the data size and the level of redundancy in data values, you may be able to achieve compression ratios ranging from 20:1 to 60:1 relative to the original data size. Depending on the age of the data and its required response time and frequency of access, the compressed archive file can be stored on a file system on lower-cost storage, or even in cloud storage for economies of scale.
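As a rough sense of scale, at the quoted ratios 1,000 GB of inactive data would shrink to between about 17 GB (60:1) and 50 GB (20:1). The Python sketch below shows the general mechanism of a compressed file archive: rows are serialized to CSV, compressed with the standard `gzip` codec, and verified by a round trip. The rows are synthetic and hypothetical, and the ratio it prints reflects only this data and this generic codec; it is not a measurement of the 20:1 to 60:1 range, which refers to a dedicated archive format.

```python
import csv
import gzip
import io

def to_csv_bytes(rows, header) -> bytes:
    """Serialize rows to UTF-8 CSV, header first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def compress_archive(raw: bytes) -> bytes:
    """Compress with gzip at maximum compression level."""
    return gzip.compress(raw, compresslevel=9)

def compression_ratio(raw: bytes, packed: bytes) -> float:
    return len(raw) / len(packed)

# Hypothetical closed-order detail rows; repeated values compress well.
header = ["order_id", "region", "status", "amount"]
rows = [(i, "EMEA" if i % 3 else "APAC", "CLOSED", 100 + i % 7) for i in range(50_000)]

raw = to_csv_bytes(rows, header)
packed = compress_archive(raw)
print(f"raw {len(raw):,} bytes -> {len(packed):,} bytes, "
      f"ratio {compression_ratio(raw, packed):.1f}:1")

# An archive is only useful if it decompresses to exactly the original bytes.
assert gzip.decompress(packed) == raw
```

The more redundant the values, the higher the ratio, which matches the text's point that savings depend on the level of redundancy in the data.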
As data ages and access requirements change over time, your IT organization needs a way to convert and relocate data from one archiving format and location to another, enabling multiple cost-effective storage tiers.

A data warehouse archiving solution also needs to support archiving only the transactional and detailed data, which grow the fastest. This must be done while maintaining data integrity and links to dimension and aggregate tables that may still be stored in the production system. Eventually, some older dimension records may be archived as well. The data warehouse archiving solution should know which types of tables need to be archived to support an optimal archiving strategy. At the same time, users should be able to define an archiving job easily, without extensive configuration or programming.

Figure 7 illustrates a data warehouse archiving strategy in which detailed data is gradually relocated to another database and subsequently to a more storage-efficient compressed file format, resulting in an extreme reduction in storage capacity. Figure 8 shows a wizard-based interface that allows users to easily define and monitor archiving jobs.

[Figure 7. A data warehouse archiving solution should offer multiple archiving formats (database or compressed file) that enable optimal storage tiering and the flexibility to archive different types of records while maintaining data integrity. Tiers shown: production data warehouse (data less than 2 years old); archive data warehouse (2 to 7 years old); optimized file archive (over 7 years old, 40:1 compression). Example tables include dimension tables (DIM1 to DIM3, OLD_DIM2, OLD_DIM3), detail tables (DETAIL1 to DETAIL7), and aggregate tables (AGG1, AGG2).]
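The following Python sketch illustrates the two ideas above under stated assumptions. `tier_for_age` applies the age boundaries from the Figure 7 example, and `archive_details` moves only old detail rows into a separate archive database while leaving the dimension table in production. SQLite, with an attached in-memory database standing in for the archive data warehouse, and all table names are hypothetical choices for illustration. The final query checks that every archived row still points at a dimension row that exists.

```python
import sqlite3

def tier_for_age(age_years: float) -> str:
    """Map record age to a tier, using the boundaries from the Figure 7 example."""
    if age_years < 2:
        return "production_dw"
    if age_years <= 7:
        return "archive_dw"
    return "file_archive"

def archive_details(conn: sqlite3.Connection, cutoff_year: int) -> int:
    """Move detail rows older than cutoff_year from main to the attached archive.

    Dimension rows stay in production, so archived detail rows still point at
    valid keys. Copy and delete share one transaction: on an error both are
    rolled back, so rows are neither lost nor duplicated.
    """
    with conn:  # commit on success, roll back on exception
        conn.execute(
            "INSERT INTO archive.sales_detail "
            "SELECT * FROM main.sales_detail WHERE sale_year < ?", (cutoff_year,))
        cur = conn.execute(
            "DELETE FROM main.sales_detail WHERE sale_year < ?", (cutoff_year,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # stands in for the archive DW
conn.executescript("""
CREATE TABLE main.product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main.sales_detail (
    sale_id INTEGER PRIMARY KEY, product_id INTEGER, sale_year INTEGER, amount REAL);
CREATE TABLE archive.sales_detail (
    sale_id INTEGER PRIMARY KEY, product_id INTEGER, sale_year INTEGER, amount REAL);
INSERT INTO main.product_dim VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO main.sales_detail VALUES
    (1, 1, 2001, 10.0), (2, 2, 2005, 20.0), (3, 1, 2008, 30.0), (4, 2, 2009, 40.0);
""")

moved = archive_details(conn, cutoff_year=2008)
orphans = conn.execute("""
    SELECT COUNT(*) FROM archive.sales_detail AS d
    LEFT JOIN main.product_dim AS p ON p.product_id = d.product_id
    WHERE p.product_id IS NULL""").fetchone()[0]
print(f"moved {moved} detail rows; archived rows without a dimension: {orphans}")
print([tier_for_age(a) for a in (1, 3, 9)])
```

Running the copy and the delete in one transaction is the design choice that matters most here: a failed job leaves production unchanged instead of half-archived.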

A data warehouse archiving solution that offers multiple archiving formats and accessibility options allows IT organizations to determine the appropriate trade-offs among archive size, performance, application accessibility, and cost.

Figure 8. Archive complete business entities using Informatica Data Archive.

Your IT organization must also be able to restore archived data to its original location. Otherwise, there is no way to correct mistakes made during archiving or to accommodate changes in access requirements. If archived data later becomes active again and must be modified or annotated, it also needs to be restored. For example, a customer order that is closed and then reopened may need to be restored because it has become active again. The data warehouse archiving solution must be able to restore archived data at different levels of granularity, such as selected detail records, business entities, or an entire archive.

Easy, Multiple Methods to Access Archived Data
Regardless of the archive format, archived data needs to be easily accessible, either from the original application interface or through standard interfaces for reporting. Standard SQL/ODBC/JDBC interfaces should be available so that any reporting or business intelligence tool can be used. If the data is to be retired and accessed only for compliance audits, the option of accessing it from an e-discovery interface should also be available.

Universal Connectivity
If your organization is like many other enterprises, you have data warehouses and applications on multiple database systems running on various operating systems.
To support your enterprise needs, your archiving solution should allow you to manage archive processes across data warehouses and applications on diverse databases, including relational (e.g., Oracle, DB2, Sybase, SQL Server, Teradata, Informix), mainframe (e.g., IDMS, VSAM, IMS), files, and packaged CRM and ERP applications, on open systems (e.g., Windows, Linux, UNIX) or mainframes (e.g., z/OS, AS/400).
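Returning to the restore and access requirements described earlier in this section, the Python sketch below shows both in miniature. It uses SQLite as a stand-in for a warehouse, with hypothetical table names: a `UNION ALL` view gives any SQL client a single place to query production and archived rows together, and `restore` moves rows back at whatever granularity its predicate selects, from one record to a business entity (here, all rows of one order) to the whole archive. Building SQL from a predicate string is acceptable only for trusted, developer-written conditions, as the docstring notes.

```python
import sqlite3

COLUMNS = "sale_id INTEGER PRIMARY KEY, order_id INTEGER, sale_year INTEGER, amount REAL"

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE sales_detail ({COLUMNS});
CREATE TABLE sales_detail_archive ({COLUMNS});
INSERT INTO sales_detail VALUES (4, 200, 2010, 30.0);
INSERT INTO sales_detail_archive VALUES
    (1, 100, 2005, 10.0), (2, 100, 2005, 15.0), (3, 101, 2006, 20.0);
-- One view over production and archive, so any SQL/ODBC/JDBC reporting
-- tool can query current and archived rows together.
CREATE VIEW sales_detail_all AS
    SELECT *, 'production' AS source FROM sales_detail
    UNION ALL
    SELECT *, 'archive' AS source FROM sales_detail_archive;
""")

def restore(conn: sqlite3.Connection, predicate: str, params: tuple = ()) -> int:
    """Move archived rows matching `predicate` back into production.

    `predicate` must be a trusted, developer-written SQL condition (never user
    input): "sale_id = ?" restores one record, "order_id = ?" one business
    entity, and "1 = 1" the entire archive.
    """
    with conn:  # copy and delete commit or roll back together
        conn.execute(
            f"INSERT INTO sales_detail SELECT * FROM sales_detail_archive WHERE {predicate}",
            params)
        cur = conn.execute(f"DELETE FROM sales_detail_archive WHERE {predicate}", params)
    return cur.rowcount

restored = restore(conn, "order_id = ?", (100,))  # e.g. a reopened order
print(f"restored {restored} rows")
print(conn.execute(
    "SELECT source, COUNT(*) FROM sales_detail_all GROUP BY source ORDER BY source").fetchall())
```

Because the view spans both tables, reports built on it keep returning the same total before and after a restore; only the `source` column changes.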

Integration with Other Archiving Platforms, Enterprise Content Management, and Storage Solutions

Your company may already have an archiving solution for email and files. Your IT organization may also have standardized on an enterprise content management (ECM) solution to manage your unstructured data. To support compliance with regulatory requirements and ensure the immutability and single-instance storage of retained data, you may be using archiving platforms such as content-addressable storage, which requires proprietary connectivity.

To enable your organization to respond quickly and accurately to audit requests, and to retain data cost-effectively for longer periods, your archiving solution should let you manage and discover archived data of all types, both structured and unstructured, centrally. You can do so if your data warehouse archiving solution integrates with your existing archiving, content management, and storage solutions to support centralized management and e-discovery of all types of archived data.

Informatica Data Archive: The Complete Data Warehouse Archiving Solution

Informatica Data Archive helps your IT organization cost-effectively manage the explosion of data volumes in data warehouses. It allows IT to easily and safely archive inactive data and then readily access it when needed.
Informatica Data Archive delivers the full range of capabilities that your IT organization needs to effectively manage data growth in data warehouses, including:

- Robust data growth assessment capabilities
- Complete metadata discovery
- Simple metadata extensibility
- Robust archiving techniques that ensure data integrity after archiving and support multiple archive formats to enable optimal storage tiers
- Multiple, easy methods to access archived data
- Universal connectivity
- Integration with other archiving platforms, ECM, and storage solutions, such as Symantec, Commvault, and EMC

Informatica Data Archive leverages the power of the Informatica Platform, the industry's leading data integration platform, to handle the huge data volumes typical of very large global enterprises. The software provides superior scalability and performance, delivering data to the most cost-effective storage option based on its value. It also offers unparalleled interoperability: the software is based on an open, easily extensible architecture, enabling simple integration with third-party solutions.

Conclusion

Your IT organization can no longer ignore the escalating costs of managing the growing data volumes in your data warehouses. Traditional methods of managing data growth address only the symptoms, not the root cause, of the problem. The key to capping your IT organization's data management costs and risks is to relocate dormant data to lower-cost infrastructure, which is what data warehouse archiving solutions do.

Informatica Data Archive delivers the full range of capabilities that your IT organization needs to effectively manage data growth in data warehouses. When your IT organization implements Informatica's complete, scalable, and flexible archiving solution, you'll lower the total cost of ownership of your data warehouses and other applications by:

- Reducing storage, server, software, and maintenance costs
- Improving data warehouse performance
- Increasing data warehouse availability
- Supporting compliance with internal, industry, and governmental mandates and regulations

Together, Informatica and your IT organization can align the business value of data with the most appropriate and cost-effective IT infrastructure to manage it.

Learn More

Learn more about Informatica Data Archive and the entire Informatica Platform. Please visit us at www.informatica.com or call 1.800.653.3871.

About Informatica

Informatica Corporation (NASDAQ: INFA) is the world's number one independent leader in data integration software. The Informatica Platform provides organizations with a comprehensive, unified, open, and economical approach to lower IT costs and gain competitive advantage from their information assets. More than 3,700 enterprises worldwide rely on Informatica to access, integrate, and trust their information assets held in the traditional enterprise and in the Internet cloud.


Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com

© 2010 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and The Data Integration Company are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

First Published: 2010 7082 (01/06/2010)