CUSTOMER DATA INTEGRATION (CDI): PROJECTS IN OPERATIONAL ENVIRONMENTS (Practice-Oriented)

Flávio de Almeida Pires
Assesso Engenharia de Sistemas Ltda
flavio@assesso.com.br

Abstract. To counter the effects of increasing globalization and competition, companies have moved from a primarily product focus to a customer focus. This has made Data Quality a high-priority issue for corporations, concerned as they are with the need for accuracy in dealing with their customers. This article presents a framework for assembling a Customer Information Database (CID) from information generated by different legacy systems, and for implementing Data Quality processes to control and maintain the quality of this information in those cases in which the CID must run in complex operational environments that demand high levels of quality and performance.

BACKGROUND

This paper is based on the experience acquired by Assesso, a Brazilian technology company that has been working on Data Quality and CID assembly projects for more than 18 years. During this period, Assesso has delivered more than 35 projects in these areas, always for large companies and always dealing with substantial volumes of information. Building on this experience, Assesso has developed, and successfully markets, software for data treatment and unification that has become one of the market leaders in Brazil, and has also established a framework for CDI project development that it applies effectively and efficiently in its projects. Over the past year, both the framework and the software have been adapted to incorporate the Total Data Quality Management (TDQM) principles proposed by the Massachusetts Institute of Technology (MIT), as well as some locally developed, innovative concepts and procedures that we believe will contribute to better data quality processes as a whole.

INTRODUCTION

In the mid 1980s, companies started to shift their strategy from a product vision to a customer vision. In order to adapt to this new situation, Information Technology (IT) areas adopted the solution of implementing parallel databases, initially for marketing purposes and, later, for business performance analysis. The legacy systems, however, remained product or process oriented.

It was at this point that Database Marketing projects were put together. These were followed by Data Warehouse projects which, in addition to having independent databases, were run in a separate environment to prevent any risk of interference with the operational processes of the companies. These projects drove the creation of corporate customer databases and established the importance of data quality. It was the need to implement this that caused cleansing and ETL (extraction, transformation and loading) software to emerge and/or be strengthened. Customer deduplication processes also gained great importance, since they were the basis for assembling a complete profile of each client, which was, after all, the main objective of this kind of project.

Deduplication processes have always produced four possible outcomes:
- These customers are certainly the same;
- These customers are certainly different;
- A gray zone, where automatic identification of a likely duplicate is not possible;
- And, finally, records in which the attributes needed by the process are not usable, whether because they have not been filled in or because they contain unrecoverable errors, a condition detected by several consistency checks (domain, check digit and others).

The current market practice for Database Marketing or Data Warehouse implementation projects has been to establish, case by case, the level of rigidity of the deduplication process, adjusting it to the objectives of each particular project. For Database Marketing projects (e.g. a direct mail campaign), an "overkill" process was normally used, even though it could generate improper deduplications; its main value was that it lowered the cost of the campaign, particularly when what was being mailed had some material value. For Data Warehouse projects, on the other hand, an "underkill" process was used, minimizing improper deduplication even at the risk of failing to deduplicate some cases, because this minimized distortions in the business analyses to be carried out on the unified database. In practice, the result was the elimination of the gray area described above, and therefore also of the need for additional back-office work, since a certain level of error was an accepted part of the process.

This situation changed with the consolidation of the customer-focus concept and the launch of CRM (Customer Relationship Management) programs. More than a passing trend, this has become permanent: management techniques come and go, but CRM is not a trend, of that we are sure. Even though it is difficult to implement, CRM is a powerful idea.

Regarding Data Quality, a Gartner Group Strategic Analysis Report (SAR, Nov 26, 2001) already stated: "A clean, well-integrated data repository is a prerequisite for CRM success.... Integrating customer data is a challenging task, however, because merely assembling the data in one place is not enough. A number of steps must be performed, including data extraction, transformation, cleaning and consolidation."

In reality, this new perception substantially raised the quality levels required for the assembly and maintenance of a Customer Information Database (CID). Even small percentages of errors were no longer acceptable. In the deduplication cases, how could one accept the improper unification of two distinct clients, sending invoices and products to the wrong individual? Or how could one accept the non-unification of several records of one client, dealing with that client as if he were several different people?

In addition to these new, higher quality levels, the challenge became even greater with the consolidation of another perception: the need for an up-to-date vision of the customer. This means integrating the CID into the operational environment, with real-time updates, while meeting the performance levels required by this type of environment. Within this new reality, and in order to meet these new requirements, CID implementation processes were forced to change. In our case and in our framework, one of the changes was the adoption of a new concept, or a new goal, that we called "Seeking 100% Quality".
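To make the four outcomes concrete, here is a minimal sketch, in Python, of how a pairwise comparison might be routed into one of them. The function name, thresholds and scoring interface are hypothetical illustrations, not part of Assesso's software or of any specific matching algorithm described in this paper.

    # Hypothetical sketch: routing a pairwise customer comparison into the
    # four deduplication outcomes described above.
    def classify_pair(similarity, attributes_valid,
                      certain_match=0.95, certain_non_match=0.60):
        """Return one of: 'same', 'different', 'gray_zone', 'unusable'.

        similarity        -- score in [0, 1] produced by some matching routine
        attributes_valid  -- False when the attributes needed for matching are
                             missing or contain unrecoverable errors
        """
        if not attributes_valid:
            return "unusable"      # record cannot even enter the comparison
        if similarity >= certain_match:
            return "same"          # certainly the same customer
        if similarity < certain_non_match:
            return "different"     # certainly different customers
        return "gray_zone"         # likely duplicate, needs manual review

    # A borderline score falls into the gray zone.
    print(classify_pair(0.80, attributes_valid=True))   # -> gray_zone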

Our first realization was that we would not be able to meet that goal using only automated processes. We needed to separate the wheat from the chaff and give special treatment to the gray zone and to the records containing erroneous attributes. We therefore quickly began to strongly recommend that our clients set up a back-office area to handle these cases. Additionally, we encouraged them to appoint a quality manager for the process: a professional with the specific task of monitoring and controlling data quality, aiming at its continuous improvement. For these recommendations to be successful and economically feasible, our processes had to be refined so that the workload of the back-office area would be minimized.

It was at this point that we adopted the RYG Concept (Red, Yellow, Green): by combining several deduplication rules and processes, it became possible to group the four possible outcomes of the process, described above, as follows:
- Red: the attributes needed for deduplication have errors (fields not filled in and/or containing inconsistent information). These cases are forwarded to the back-office area, and contacting the client directly will be necessary in order to correct them;
- Yellow: the automatic deduplication process cannot guarantee that the record is a duplicate, but there is a significant chance that it is ("suspects of duplication", the gray area mentioned above). These cases are forwarded for visual analysis by the back-office area, and contacting the client may become necessary;
- Green: the automatic process can guarantee either that the record is a duplicate or, on the contrary, that it definitely is not. These cases are used or updated in the process as usual.

Using the RYG Concept, our challenge was to make the process maximize the Green cases and minimize the Yellow cases. (We have less control over the Red cases, which depend essentially on the original quality of the file being treated; here our actions focus on recommendations for improving the information capture processes.)

The next step was the recognition that the RYG Concept is generic: it can be applied effectively to individual attributes, to sets of attributes, or to business rules. We then adopted it throughout our framework as a permanent element. Some examples (see the sketch after this list):
- Date of birth (age): after today, or age greater than 100 years, Red; age under 18 or greater than 80 years, Yellow;
- Address: the informed ZIP code does not belong to the city, Red; ZIP code is within the city range but the street is not recognized, Yellow;
- Life insurance: above US$ 3,000,000 (not accepted by the company), Red; person under 18 years old with insurance above US$ 100,000, Yellow.

The framework described below incorporates the adaptations required by these changes, as well as the TDQM principles proposed by MIT.
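As a rough illustration, the sketch below expresses two of the attribute-level examples above as RYG rules and adds a simple distribution measure of the kind a quality manager might track. The age and amount thresholds come from the examples in the text; the function names and the measurement itself are hypothetical and not taken from Assesso's software.

    # Hypothetical sketch of attribute-level RYG rules and a simple RYG measure.
    from datetime import date

    RED, YELLOW, GREEN = "red", "yellow", "green"

    def check_birth_date(birth):
        """RYG rule for date of birth, using the thresholds from the text."""
        today = date.today()
        if birth > today:
            return RED                        # impossible date
        age = (today - birth).days // 365     # approximate age in years
        if age > 100:
            return RED
        if age < 18 or age > 80:
            return YELLOW
        return GREEN

    def check_life_insurance(age, amount_usd):
        """RYG rule for the life insurance business rule from the text."""
        if amount_usd > 3_000_000:
            return RED                        # not accepted by the company
        if age < 18 and amount_usd > 100_000:
            return YELLOW
        return GREEN

    def ryg_distribution(statuses):
        """Share of records per RYG status: the measure a quality manager could
        track to maximize Green and minimize Yellow over time."""
        total = len(statuses) or 1
        return {c: statuses.count(c) / total for c in (RED, YELLOW, GREEN)}

    # A 17-year-old with a US$ 150,000 policy is routed for visual analysis.
    print(check_life_insurance(17, 150_000))              # -> yellow
    print(ryg_distribution([GREEN, GREEN, YELLOW, RED]))  # -> {'red': 0.25, 'yellow': 0.25, 'green': 0.5}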
CHARACTERISTICS OF CDI PROJECTS IN OPERATIONAL ENVIRONMENTS

CDI projects aiming at the implementation of the CID (Customer Information Database) in operational environments normally fall into one of the two cases described below:
- A first step of CID assembly is planned and, afterwards, all legacy systems are adapted to use the implemented CID;
- There is still a first step of CID assembly, but the complete redevelopment of all legacy systems is planned, now designed around the customer vision.

Therefore, in both cases, the first step is normally the CID implementation, carried out at once, followed by the gradual implementation of each of the legacy systems, either adapted or redeveloped. This article does not deal with legacy system adaptations or redevelopments; they should, however, incorporate all the rules and solutions defined herein, and the framework also presents some warnings regarding their adaptation or redevelopment. The article focuses on business-to-consumer projects; for business-to-business projects the framework is the same, but several other specific peculiarities have to be considered.

FRAMEWORK FOR OPERATIONAL CDI PROJECTS

The chart below gives a quick view of the framework developed by Assesso for the implementation of CDI projects in operational environments.

[Framework chart listing the three steps (Diagnosis; Architecture and Building; Implementation and Maintenance) and their components: Database Modeling; Sources and Data Extractors Definition; Individual Attribute Evaluation (First Level of Data); Individual Business Rules Definition (Second Level of Data); Individual Source Deduplication; General Deduplication; Household Definition; Fusion; General Business Rules Definition (Third Level of Data); Final Tests; Definition and Development of Processes for Monitoring Data Quality in the Production Environment.]

As the chart shows, the framework is divided into three large steps, each with its own specific objectives and products, as follows:

Diagnosis
Products: scope definition, schedule, costs and benefits of the project. In this step, in addition to a complete understanding of the business, the following issues should be analyzed: project objectives; the company's short, medium and long term strategies; risks and opportunities; business processes; the system platforms involved; operating environments; information sources; volumes; business rules; a macro specification of the new environment (including the necessary adaptation of current systems and new developments); and the benefits of the new environment.

Architecture and Building
Products: project specification, development and tests, definition of processes, and end-user homologation rules. Aiming at a more efficient organization of the work, and mainly at the possibility of carrying out tasks simultaneously, this extensive step was divided into eight sub-steps. For the entire step and for each of its sub-steps, we adopted the cycle defined by MIT's TDQM: define, measure, analyze and improve. Each sub-step is thus a sequence of cycles that ends with end-user homologation. An important point to be stressed is that in this step the tasks are carried out on all available information, never on samples. Our experience has shown that analysis performed on samples can frequently hide errors such as bad filling-in habits ("easy typing"), bugs that legacy systems carried for a period of time, attributes intended for deduplication that were not properly populated, and other similar difficulties. These eight sub-steps are a basic guideline for CDI projects in operational environments; for every new project, the framework is reevaluated and adapted to its specific conditions. For more information and details about the eight sub-steps, please contact Assesso.

Implementation and Maintenance
The points to be stressed in this step are as follows:
- The need to create an area in charge of data quality for the company;
- This area should regard information as a product and should have the responsibility and the empowerment to modify existing processes, if needed, in each of the three areas identified by MIT's TDQM: Data Collectors, Data Custodians and Data Users (Customers);
- The data quality monitoring and assurance processes defined in the previous step should be run regularly and adapted to new situations.
Other items related to this step should follow the normal standards defined by each company's IT.

CONCLUSION

As a result of market evolution, there is, and will continue to be, a growing demand for the implementation of Data Quality processes in operational environments. This will surely lead to new requirements for these processes, such as higher quality and performance levels, which are forcing companies to adopt new frameworks for their development. These new requirements have not yet stabilized; the framework adopted should therefore be flexible enough to incorporate further evolution.