Data Quality for PowerCenter Users: Expanding Beyond ETL
Marina Grebenkova, Principal Product Manager, Informatica


Agenda
- Do you trust your data?
- What is Data Quality?
- Data Quality process
- How it complements Data Integration
- How does Data Quality fit into your Data Integration initiatives?

What Are Analysts Saying?
"Data profiling, data quality, and data integration are three business practices that go together like bread, peanut butter, and jam..." (TDWI Research)

Managing Data in the Enterprise
[Diagram labels: Applications, Data Warehouse, Data Marts, Legacy Systems, External Sources, Business Intelligence, MDM, Governance, Portal/Dashboard]

Do You Trust Your Data?
Impact of Poor Data Quality
- Data Integration costs increased by 100% if Data Quality is overlooked
- Data Warehouse projects fail upwards of 50% of the time due to bad data
- Only 19% of organizations are highly satisfied with the quality of their data
- Data Quality affects overall productivity by as much as 20%
- 36% of companies surveyed estimated annual losses of more than $1m, some as much as $100m
- Project delivery times increase by 35% due to lack of Data Quality

Data Quality Process and Components
Delivering Authoritative and Trustworthy Data
1. Discover and Analyse
2. Establish Metrics and Define Targets
3. Design and Implement Data Quality Rules
4. Deploy Data Quality Services
5. Review Exceptions
6. Monitor Data Quality Versus Targets
Roles: Line of Business, Data Stewards and Analysts, IT
Outcomes:
- Decisions are made with confidence
- Risk and compliance can be managed effectively
- Costs can be kept in check
- The best possible customer service is provided

Why Data Quality?
Capabilities Additional to Those in PowerCenter
- Business focused browser based UI
- Data profiling and discovery
- Unstructured data parsing and standardization
- Global address validation
- Fuzzy matching regardless of format or correctness
- Debugging through mid-stream profiling
- Exception management
- Proactive data quality monitoring

Data Profiling and Discovery
Understanding Source Data and Identifying Anomalies
- Drill-down analysis
- Data Quality Scorecards
- Specify rule by example
Role: Data Analyst
Increase productivity and efficiency by enabling the business to proactively take responsibility for data quality and reduce their reliance on IT
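To make the idea of column profiling concrete, here is a minimal Python sketch. It is not Informatica's profiler; the function name `profile_column` and the pattern notation (9 for a digit, A for a letter) are assumptions chosen for illustration. It counts nulls, distinct values and character patterns, which is the kind of summary that exposes anomalies in a source column.

```python
from collections import Counter

def profile_column(values):
    """Return simple profile statistics for one column of raw string values."""
    total = len(values)
    # Treat None and blank strings as missing values.
    non_null = [v for v in values if v is not None and v.strip() != ""]

    def pattern(value):
        # Reduce a value to its shape: digits become 9, letters become A.
        return "".join(
            "9" if c.isdigit() else "A" if c.isalpha() else c for c in value
        )

    return {
        "count": total,
        "null_count": total - len(non_null),
        "distinct_count": len(set(non_null)),
        "patterns": Counter(pattern(v) for v in non_null),
    }

# Hypothetical sample column of ZIP codes, including blanks and a short value.
zips = ["06987", "16987", "6984", "06987-4573", "", None]
report = profile_column(zips)
```

A pattern that occurs rarely, such as the four-digit ZIP here, is a natural candidate for a data quality rule.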

Parsing & Standardization
Correct Completeness, Conformity and Consistency Problems
Before:
Product ID | Brand | Description
90017 | ipod | 4GB, Red ipod Nano //Special Edt.
After:
Product_ID | Brand | Size | Color | Description
90017 | IPOD | 4GB | Red | 4 Gigabyte Nano Special Edition (Red)
One environment to standardize and parse all data domains
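The product example above can be approximated with a few lines of Python. This is a rough sketch under stated assumptions, not the product's parser: the `COLORS` reference list and the function name `parse_product` are hypothetical, and only the size, color and brand fields are derived.

```python
import re

# Hypothetical reference list of known color names.
COLORS = {"red", "blue", "green", "black", "silver"}

def parse_product(product_id, brand, description):
    """Split a free-text product description into standardized fields."""
    # Capture a storage size such as "4GB" or "4 gb".
    size_match = re.search(r"\b(\d+)\s*GB\b", description, flags=re.IGNORECASE)
    size = f"{size_match.group(1)}GB" if size_match else None
    # Pick the first word that appears in the color reference list.
    words = re.findall(r"[A-Za-z]+", description)
    color = next((w.capitalize() for w in words if w.lower() in COLORS), None)
    return {
        "Product_ID": product_id,
        "Brand": brand.upper(),  # standardize brand casing
        "Size": size,
        "Color": color,
    }

parsed = parse_product("90017", "ipod", "4GB, Red ipod Nano //Special Edt.")
```

Keeping the reference list separate from the parsing logic mirrors the idea of managing reference data independently of the rules that use it.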

Parsing & Standardization
Natural Language Processing
Polarity

Address Validation
For over 240 countries against reference data from international postal agencies
Before:
Address1 | Address2 | Address3 | Address4 | Address5
7887 KATY FRWY | SUITE 333 | HOUSTEN | TX | 99999
After:
Street | City | County | StateCode | StateName | ZIP | ZIP4 | Latitude | Longitude
7887 Katy Freeway Suite 333 | Houston | Harris | TX | Texas | 77024 | 2005 | 29.283427 | -95.46802
Valid addresses keep costs down and help ensure compliance

Match and De-Duplicate
Regardless of format or correctness
SKU | Description | Size | Price
AP-2199 | Sailors Desk Lamp | 12 in | 27.99
AP2199 | Nautical Lamp | 12 inch | 27.99
PA-2119 | Sailors Lamp | 12 inch | 34.99
Intrinsically wrong (and potentially uncorrectable) data can still be valuable for Matching purposes
Callouts: Alternate or Nicknames, Misspellings, Invalid Data
Name | DOB | Address | City | State | Zip
W. S. Harrison II PhD | 1/33/1967 | Medical Center,117/2A #17497 Jackson | E. Hartford | NY | 16987
William Stuart Harison | 1/3/1967 | 117-2a Jacksen Rd. | Easthartford | CT | 06987
William Stewart Harison | 9/9/99 | 117 Jackson Road. Suite 2A | Hartford East | CT | 06987
Doctor Bill Harisen jr | 1/13/1967 | 117 Jacson Room 2a | HartfordCT | | 6984
Harrisen William Doctor | | 2a Jackson Rd #174978 | Hartford | CT | 06987-4573
Highly accurate matching ensures the minimum number of duplicate master records
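A simple way to see how matching can tolerate nicknames, titles, word order and misspellings is to normalize names before comparing them. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; it is not the matching algorithm used by Informatica Data Quality, and the `TITLES` and `NICKNAMES` tables are hypothetical reference data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference data: tokens to ignore and nickname expansions.
TITLES = {"doctor", "dr", "phd", "jr", "sr", "ii"}
NICKNAMES = {"bill": "william"}

def normalize_name(name):
    """Lowercase, strip punctuation and titles, expand nicknames, sort tokens."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    tokens = [NICKNAMES.get(t, t) for t in cleaned.split() if t not in TITLES]
    # Sorting makes the comparison independent of word order.
    return " ".join(sorted(tokens))

def similarity(a, b):
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

score = similarity("Doctor Bill Harisen jr", "Harrisen William Doctor")
```

Both names reduce to a surname plus "william", so the remaining difference is a single misspelled letter and the score is high. A real deployment would combine several fields (name, date of birth, address) and tune a match threshold.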

Productive Development Environment
With mid-stream profiling for IT developers
- Standardization, Parsing, Address Validation, Matching
- One click from profiling to rule configuration
- Nested mapplets
- Seamless integration with PowerCenter
Role and tool: DQ Developer, Informatica Developer
Increased development productivity reduces operational costs

Mid-Stream Profiling
Profile Data at Any Point Within a Mapping
- Any Source
- Any Transformation
- Any Rule or Mapplet

Exception Management
Bad Record and Duplicate Record Correction
- Validate, apply rule in same process
- Good: Committed
- Bad: Fix Suggested or Record Rejected
[Diagram labels: Source, Target, Data Steward, Informatica Analyst]
Empower data stewards to directly manage data quality tasks and solve problems faster
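The good/bad split described above can be sketched as a small routing function. This is an illustrative Python example, not the product's exception workflow: the rule `validate_dob`, the `route` function and the record layout are assumptions. Records that pass every rule are committed, and the rest go to an exception list with the reasons attached for a data steward to review.

```python
from datetime import datetime

def validate_dob(record):
    """Return a problem description, or None if the DOB is a valid date."""
    dob = record.get("DOB", "")
    if not dob:
        return "DOB is missing"
    try:
        datetime.strptime(dob, "%m/%d/%Y")
    except ValueError:
        return f"DOB {dob!r} is not a valid date"
    return None

def route(records, rules):
    """Split records into committed rows and exceptions needing review."""
    committed, exceptions = [], []
    for record in records:
        results = (rule(record) for rule in rules)
        problems = [message for message in results if message is not None]
        if problems:
            exceptions.append({"record": record, "problems": problems})
        else:
            committed.append(record)
    return committed, exceptions

records = [
    {"Name": "William Stuart Harison", "DOB": "1/3/1967"},
    {"Name": "W. S. Harrison II PhD", "DOB": "1/33/1967"},
    {"Name": "Harrisen William Doctor", "DOB": ""},
]
committed, exceptions = route(records, [validate_dob])
```

Attaching the reason to each rejected record is what lets a steward correct the value directly instead of sending it back to IT.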

Proactive Data Quality Monitoring
- Automated error detection based on data quality issues
- Enable quick and targeted response as Data Quality issues arise
[Diagram labels: Source (x3), Profile, Mapping, Scorecard, Error; recipients: Steward, Analyst, IT, etc.]
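Monitoring data quality against targets can be reduced to two steps: score each metric, then flag the ones below target. The Python sketch below illustrates that idea only; the metric name `zip_conformity`, the 95% target and the helper names are hypothetical and do not reflect how Informatica scorecards are configured.

```python
import re

def score(values, is_valid):
    """Return the percentage of values that satisfy a validity check."""
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)

def check_scorecard(metrics, targets):
    """Return the metrics whose score falls below the configured target."""
    return {
        name: value
        for name, value in metrics.items()
        if value < targets.get(name, 0.0)
    }

def is_us_zip(value):
    # Accept 5-digit ZIP codes with an optional 4-digit extension.
    return re.fullmatch(r"\d{5}(-\d{4})?", value) is not None

zips = ["16987", "06987", "06987", "6984", "06987-4573"]
metrics = {"zip_conformity": score(zips, is_us_zip)}
breaches = check_scorecard(metrics, {"zip_conformity": 95.0})
```

Any entry in `breaches` would be the trigger for notifying a steward, analyst or IT contact.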

Data Quality Firewall
Centralized, Reusable Rules
- For the business: Support data governance by enforcing consistent data quality rules across all applications.
- For IT: Accelerate the deployment of common data quality rules across all applications. Reduce costs through reuse.
[Diagram labels: BI Application, Customer Service Portal, Sales Automation Application; Centralized data quality rules; Customer, Order, Product, Invoice]

What are the options?
PowerCenter Data Quality Starter Kit: Pre-defined Data Quality Rules for PowerCenter
- No IDQ install required - import and run
- Common Data Quality rules
- Zero learning curve
PowerCenter DQ Developer Option: Data Quality Developer Tool for PowerCenter
- Eclipse based DQ Developer interface
- Standardization, parsing, matching, validation for all data types
- Seamless integration with PowerCenter
Full Use Data Quality and Profiling: ALL Data Quality Capabilities and Interfaces
- Role based tools
- Standardization, parsing, matching, validation for all data types
- Profiling, scorecarding, exception management

Data Quality Starter Kit for PowerCenter
[Diagram label: DQ for PC Package]
1. Extract Package
2. Import Metadata
Tackle common Data Quality issues with NO install required and zero learning curve

Data Quality Developer for PowerCenter
- Create and maintain custom Data Quality rules
- Leverage and edit pre-defined rules
1. Author/edit rules in Data Quality Developer
3. Execute natively in PowerCenter

Full Use Data Quality and Profiling
Developers (Informatica Developer)
- Integrating profiling, cleansing and data services
- Configure mappings
- Run mappings
Data Stewards, Business Analysts, Line of Business Managers (Informatica Analyst)
- Profile and discover
- Configure rules
- Validate data against rules
- Manage reference data
- Configure scorecards
Data Stewards (Exception Management)
- Correct bad records
- Review, manage and consolidate duplicate records

Call to Action
Don't wait: ensure your data is trustworthy with Informatica Data Quality. It's simple!

Questions?