The Evolution of Integration by W. H. Inmon


A White Paper by W. H. Inmon
Inmon Consulting Services

Table of Contents

The Dilemma of Changing Requirements
The Advent of Multiple Databases across the Organization Fuels the Need for Data Integration
Data Movement Technology
Evolving Data Integration Needs
The Cost Factor
Open Source Changes the Economics of Data Integration
Conclusion

Once upon a time, life and information systems were simple. There were small information systems that produced their output in simple and plain language. These early systems were standalone systems that served a limited set of purposes. The requirements were well defined and straightforward.

Then one day somebody opened Pandora's box. Someone asked: can't we add new requirements to these systems? Can't we have something more sophisticated and elegant than this small system? The answer was, of course we can. So new requirements and new fields of data were added, and a whole new class of information systems appeared.

One of the immediate consequences of the advent of new requirements was the need for scalability. What once was small now grew large, and with the growth came a whole new set of problems. The advent of new requirements generated its own requirement: the need for scalability.

The need for scalability appeared in many different forms: in the need to manage large volumes of data, and in the need to manage large volumes of transactions. And there was another need for scalability: the need for scalability of data integration. Scalability of integration includes the need to process data from many different sources to many different targets. Scalable integration includes the need to handle data that is structured in many different ways. Scalable integration includes the need to handle different kinds of logic, and so forth.

The Dilemma of Changing Requirements

At first, all was well. The addition of new requirements seemed like a simple thing to do. But soon, someone noticed that there was a sort of competition to get processing done. With the addition of the new requirements, all sorts of people want to do all sorts of things with the system and its data. One person wants to do online processing. Online processing precludes many users from getting at the data during online processing hours. Another person wants to continually redefine the data. The subsequent reorganizations of data effectively make the data unavailable for long periods of time. Another person wants to do statistical processing. The problem is that once statistical processing begins to run, no one else can use the system.

Fig 1 shows that as the volumes of data grow and as the functions that shape the system evolve, more and more demands are placed on the system. It is as if the system (which was once simple) is being pulled in different directions by powerful forces. Unfortunately, the system cannot accommodate this disparate set of forces.

Fig 1: The early notion of a single database serving all purposes simply did not work.

The Advent of Multiple Databases across the Organization Fuels the Need for Data Integration

One result of this terrific pull by opposing forces is the advent of multiple databases. The pull of the opposing forces is so strong that it is easier to create separate stores of data than it is to try to make a single source of data serve many masters all at the same time. Thus the multiple database environment is born.

Things might be all right if these multiple databases were created under a master plan that had been carefully thought out. But these multiple databases are created in a random, willy-nilly manner, where every user does their own thing with no regard for any other user. Across the many databases there is redundant data. There are unsynchronized definitions. There are different structures of data. There are individualized reference tables. In short, there is absolutely no coordination or discipline of data whatsoever across these many different databases that are built. Fig 2 shows that multiple databases are built everywhere.

Fig 2: Soon there were multiple databases everywhere.

In a short amount of time, the end user wakes up to find that data is scattered all over the landscape. The only way the end user can try to make sense of data that is found in many places is to start shipping data from one environment to the next, in the hope that by shipping the data to another database there might be some rational basis for making corporate decisions. This discomfort with multiple occurrences of data in many places is the very first evidence that the user needs integrated data. Intuitively, the end user knows that something isn't right with having the same data occur with different values all over the corporation. As soon as a bad decision is made, the discomfort with lack of integration turns into a real case of agony. When there are many occurrences of the same element of data, each with differing values, basic decision-making is greatly impaired.

What is needed is discipline of access and update of data. This discipline is achieved through integration. The need for integration mandates a well-defined and rigorously enforced system of record, in which update occurs in only one place for any given data element. The many systems of the corporation need to act in harmony as one system, not as a collection of separate systems. What is needed is integration of data.
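To make the system-of-record discipline concrete, here is a minimal, hypothetical sketch (the element names and owning systems are invented for illustration) in which each data element has exactly one authoritative source, and updates arriving from any other system are rejected.

```python
# Hypothetical sketch of a "system of record" guard: each data element is
# owned by exactly one source system, and updates from any other system
# are rejected. Names below are invented for illustration only.

SYSTEM_OF_RECORD = {
    "customer_address": "crm",
    "credit_limit": "finance",
    "order_status": "order_entry",
}

corporate_store = {}  # element -> current value


def apply_update(element: str, value, source_system: str) -> None:
    """Apply an update only if it comes from the element's system of record."""
    owner = SYSTEM_OF_RECORD.get(element)
    if owner is None:
        raise KeyError(f"no system of record registered for {element!r}")
    if source_system != owner:
        raise PermissionError(
            f"{source_system!r} may not update {element!r}; owner is {owner!r}"
        )
    corporate_store[element] = value


apply_update("credit_limit", 25000, "finance")      # accepted
# apply_update("credit_limit", 99999, "marketing")  # would raise PermissionError
```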

Data Movement Technology

From a technology standpoint, some technology is designed to move data from one system to the next. This data movement software is the very first manifestation of the need for software that integrates data. It is noted, however, that data movement software satisfies only a few needs of the corporation. It is like offering a starving man soda crackers. You may quell a little bit of the starving man's hunger, but a diet of nothing more than soda crackers is not going to go far in satisfying a real hunger. Fig 3 shows that software and technology for the movement of data from one database to another appears.

Fig 3: Shortly thereafter it was found that there was a need for trundling data from one database to another.

After the organization has moved data around, it discovers that there is a lot more to integration than just the movement of data. There is the need to have data that is

- timely
- accurate
- complete
- at the same level of granularity
- fitting in a compatible manner with other data.

Merely moving data from one place to the next does not address any of these important issues. What is needed instead is a way to transform data as it is being moved. As data is being moved, there is the perfect opportunity to reshape the data into a compatible format and structure. There is then an opportunity to integrate data together. Soon there appears software that is capable of transforming data, as seen in Fig 4.

Fig 4: There was the recognition of the need for full-blown transformation.

Over time the transformation process focuses on a source and a target. The source is the place where data comes from. The target is the place where the data is going. At first, simple transformations were all that were required. But over time more and more functionality is required for transforming and integrating data. The types of transformation capabilities that are required include

- reformatting
- resequencing
- restructuring
- summarization
- supply of default values
- addition of time stamping
- change of DBMS or operating system
- aggregation
- logical reassignment of data values

and so forth, as illustrated in the sketch that follows.
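As an illustration of a few of the capabilities listed above, the hypothetical sketch below (field names and formats are invented, not taken from any particular product) reformats a date field, supplies a default value, adds a load timestamp, and summarizes records by customer.

```python
# Hypothetical sketch of a few transformation capabilities: reformatting,
# supply of default values, addition of time stamping, and summarization.
from datetime import datetime, timezone
from collections import defaultdict


def transform(record: dict) -> dict:
    """Reshape one source record into the target format."""
    out = dict(record)
    # Reformatting: source dates arrive as MM/DD/YYYY, target wants ISO 8601.
    out["order_date"] = datetime.strptime(record["order_date"], "%m/%d/%Y").date().isoformat()
    # Supply of default values for fields the source may omit.
    out.setdefault("currency", "USD")
    # Addition of time stamping: record when the row passed through the transform.
    out["load_ts"] = datetime.now(timezone.utc).isoformat()
    return out


def summarize(records: list[dict]) -> dict:
    """Summarization/aggregation: total order amount per customer."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["customer_id"]] += r["amount"]
    return dict(totals)


source_rows = [
    {"customer_id": "C1", "order_date": "07/04/2007", "amount": 120.0},
    {"customer_id": "C1", "order_date": "07/05/2007", "amount": 80.0},
]
target_rows = [transform(r) for r in source_rows]
print(summarize(target_rows))  # {'C1': 200.0}
```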

Each of these functions of transformation is not difficult by itself. But all the requirements taken together add quite a bit of complexity to the process of transformation.

Evolving Data Integration Needs

Demand for new forms of transformation continues. The process of data integration has begun to evolve. The evolution is spawned by a growing awareness of the importance of integration. At first integration was simple, but over time more sophisticated needs arose. These evolving aspects of integration appear in response to

- the number of programs that need to be integrated
- the collective diversity of functionality that needs to be accomplished
- the number of sources of data that need to be merged
- the different uses the data is being exposed to
- the diversity of the types of data
- the diversity of the operating systems and database management systems

and so forth.

One of the evolving requirements for integration is to address data quality at the point of transformation. Since data is being accessed anyway (as a part of transformation), it is possible to perform such simple activities as domain checking, range checking, and other forms of edit. Table lookup is another integration function that can be added. In addition, it has been observed that the point of transformation is an excellent opportunity to gather metadata. In a way, transformations are nothing but a sophisticated form of metadata.
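A minimal sketch of such edits at the point of transformation might look like the following; the domain, range, and lookup table shown are invented for illustration.

```python
# Hypothetical sketch of data quality checks applied at the point of
# transformation: domain checking, range checking, and a table lookup.
# Field names and reference values are invented for illustration.

VALID_STATUS = {"open", "shipped", "closed"}          # domain check
REGION_LOOKUP = {"01": "North", "02": "South"}        # reference-table lookup


def quality_check(record: dict) -> list[str]:
    """Return a list of data quality problems found in one record."""
    problems = []
    if record.get("status") not in VALID_STATUS:
        problems.append(f"status {record.get('status')!r} outside allowed domain")
    if not (0 <= record.get("quantity", -1) <= 10_000):
        problems.append(f"quantity {record.get('quantity')!r} out of range")
    if record.get("region_code") not in REGION_LOOKUP:
        problems.append(f"unknown region code {record.get('region_code')!r}")
    return problems


row = {"status": "opne", "quantity": 25, "region_code": "03"}
for issue in quality_check(row):
    print(issue)  # flags the misspelled status and the unknown region code
```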

Another form of transformation that has emerged is changed data capture, or CDC. Changed data capture was originally the activity of reading log tapes to find out what transactions had transpired. By extension it has become the activity of identifying new, updated, or deleted records in databases, which is often the most convenient and efficient way to figure out what data should be placed in the target environment (a minimal sketch appears at the end of this section).

And then there is another, rather complex transformation requirement: the requirement for increased throughput. Some organizations have so much data to run through their transformation engines that the process of transformation has to be done in parallel. When transformations are run in parallel, the throughput can be approximated: if n is the length of time it takes to process the load on a single machine, and there are m machines that can operate in parallel, then the elapsed time of transformation can be cut to as little as n/m. For example, a load that takes ten hours on one machine might finish in as little as two hours when spread evenly across five machines. To achieve that optimum, the organization has to spread the processing load evenly. Nevertheless, through parallelization, organizations address the processing of very large amounts of input through the transformation process. Fig 5 shows that parallelization comes to the world of integration in due time.

Fig 5: Eventually some organizations needed parallel transformation.

Transformations of raw data into integrated data are commonplace, especially since data warehousing has become popular. For most organizations, the data warehouse is the place where corporate integrated data is stored. Legacy systems with their unintegrated data feed the corporate data warehouses with their integrated data.
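The snapshot comparison below is one hypothetical way to illustrate the CDC idea described above: rather than reading the DBMS log, it diffs yesterday's extract against today's to classify rows as new, updated, or deleted. Keys and values are invented for illustration.

```python
# Hypothetical sketch of snapshot-based changed data capture (CDC): comparing
# yesterday's extract with today's to classify rows as new, updated, or deleted.
# Real CDC tools more often read the DBMS log, as the text notes; this
# comparison approach is just one simple way to arrive at the same answer.

def capture_changes(previous: dict[str, dict], current: dict[str, dict]):
    """Classify rows (keyed by primary key) as inserts, updates, or deletes."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v for k, v in current.items()
        if k in previous and v != previous[k]
    }
    return inserts, updates, deletes


yesterday = {"1001": {"status": "open"}, "1002": {"status": "open"}}
today = {"1001": {"status": "shipped"}, "1003": {"status": "open"}}

ins, upd, dele = capture_changes(yesterday, today)
# ins  -> {'1003': ...}   new record to load into the target
# upd  -> {'1001': ...}   changed record to re-apply
# dele -> {'1002': ...}   record to remove or flag in the target
```

And to make the n/m estimate concrete, this second sketch (again purely illustrative) splits the input evenly across m worker processes and recombines the results, so that elapsed time approaches n/m when the load is spread evenly.

```python
# Hypothetical sketch of parallel transformation across m worker processes.
from concurrent.futures import ProcessPoolExecutor


def transform_partition(rows: list[dict]) -> list[dict]:
    """Transform one partition of source rows (placeholder logic)."""
    return [{**r, "amount_cents": int(r["amount"] * 100)} for r in rows]


def parallel_transform(rows: list[dict], m: int) -> list[dict]:
    """Run the transformation across m workers and recombine the results."""
    partitions = [rows[i::m] for i in range(m)]      # even round-robin split
    with ProcessPoolExecutor(max_workers=m) as pool:
        results = pool.map(transform_partition, partitions)
    return [row for part in results for row in part]


if __name__ == "__main__":
    source = [{"id": i, "amount": float(i)} for i in range(1_000)]
    target = parallel_transform(source, m=4)
    print(len(target))  # 1000 rows, transformed across 4 workers
```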

The Cost Factor

With this increase in the sophistication and functionality of transformation software and technology comes another increase: a rise in the price of the transformation and integration technology itself. What was once a fairly inexpensive proposition becomes an expensive one. Fig 6 shows the increase in the price tag of transformation and integration technology.

Fig 6: The cost of transformation rose very quickly.

With the increase in the cost of transformation and integration technology comes another phenomenon. Soon only the very largest corporations can afford the price tag that this technology carries. The problem is that transformation and integration technology is needed in all sorts of organizations: large ones, small ones, and medium-sized ones.

Open Source Changes the Economics of Data Integration

It is into this marketplace that next-generation data integration enters. Pervasive and scalable, these solutions enable organizations of all sizes to deploy data integration technology without the usual restrictions imposed by the deployment costs of traditional integration products. The benefits that open source brings to the data integration market are numerous:

- there is no barrier to adoption: anyone can download the solution, try it, and start implementing it on projects without control by the vendors
- the upfront and overall costs are low
- payment is based on actual usage (no shelfware)
- there are no runtime costs to hinder deployment (traditional solutions are priced per CPU on the source, target, and runtime engines, restricting deployment)
- expertise is widely available in the community (no need to rely on the vendor's consultants)

Talend provides such a solution. It is a serviceable transformation and integration tool. From a functionality standpoint, Talend's data integration solutions offer the same features as older transformation technologies. But Talend operates in an open source model, where services and ancillary features are offered on a subscription basis. From an affordability standpoint, Talend opens up the marketplace for transformation and integration to all customers, regardless of size and data integration needs.

Conclusion

The onus today is clearly on opening up data integration. Data should be captured where it resides, Software as a Service applications become extensions of the information system, and business intelligence systems become open to the outside world. Older data integration products had to contend with all the legacy needs, and they are cumbersome to use. They are based on outdated development technologies and methodologies, which makes their evolution slow. Conversely, Talend has developed a new data integration technology, a new approach, a new user-centric model. The open source approach can help reverse Moore's Law: instead of power (and price) doubling every year, the same power is available at half the price, or even less, compared to the year before. Open source in general, and Talend in particular, accelerate the pace of the market by providing the most recent technologies at a fraction of the total cost of ownership of proprietary products. They truly change the economics of data integration.

About the Author

Bill Inmon, world-renowned expert, speaker, and author on data warehousing, is widely recognized as the "Father of Data Warehousing". He was also voted "One of the Ten IT People Who Mattered in the Past Forty Years" in Computerworld magazine's July 2007 issue.

© 2007 Inmon Consulting Services, Inc. All rights reserved.