Data Quality for PowerCenter Users: Expanding Beyond ETL
  Marina Grebenkova
  Principal Product Manager, Informatica
Agenda
  - Do you trust your data?
  - What is Data Quality?
  - Data Quality process
  - How it complements Data Integration
  - How does Data Quality fit into your Data Integration initiatives?
What Are Analysts Saying?
  "Data profiling, data quality, and data integration are three business practices that go together like bread, peanut butter, and jam..."
  TDWI Research
Managing Data in the Enterprise
  [Diagram: Applications, Legacy Systems, External Sources, Data Warehouse, Data Marts, MDM, Governance, Business Intelligence, Portal/Dashboard]
Do You Trust Your Data? Impact of Poor Data Quality
  - Data Integration costs increase by 100% if Data Quality is overlooked
  - Data Warehouse projects fail upwards of 50% of the time due to bad data
  - Only 19% of organizations are highly satisfied with the quality of their data
  - Data Quality affects overall productivity by as much as 20%
  - 36% of companies surveyed estimated annual losses of more than $1m, some as much as $100m
  - Project delivery times increase by 35% due to lack of Data Quality
Data Quality Process and Components: Delivering Authoritative and Trustworthy Data
  1. Discover and Analyse
  2. Establish Metrics and Define Targets
  3. Design and Implement Data Quality Rules
  4. Deploy Data Quality Services
  5. Review Exceptions
  6. Monitor Data Quality Versus Targets
  Participants: Line of Business, Data Stewards and Analysts, IT
  Outcomes:
  - Decisions are made with confidence
  - Risk and compliance can be managed effectively
  - Costs can be kept in check
  - The best possible customer service is provided
Why Data Quality? Capabilities Additional to Those in PowerCenter
  - Business focused browser based UI
  - Data profiling and discovery
  - Unstructured data parsing and standardization
  - Global address validation
  - Fuzzy matching regardless of format or correctness
  - Debugging through mid-stream profiling
  - Exception management
  - Proactive data quality monitoring
Data Profiling and Discovery: Understanding Source Data and Identifying Anomalies
  - Drill-down analysis
  - Data Quality Scorecards
  - Specify rule by example
  Data Analyst
  Increase productivity and efficiency by enabling the business to proactively take responsibility for data quality and reduce their reliance on IT
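To make "identifying anomalies" concrete, the following is a minimal sketch of a column profile in plain Python. It is not how Informatica profiling works internally; the function names `pattern_of` and `profile_column` are hypothetical, and the sample values are the ZIP codes from the matching slide later in this deck.

```python
from collections import Counter
import re

def pattern_of(value):
    """Reduce a value to its shape: every digit becomes 9, every letter becomes A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)

def profile_column(values):
    """Return basic profile statistics for one column (hypothetical helper)."""
    non_empty = [v for v in values if v and v.strip()]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "patterns": Counter(pattern_of(v) for v in non_empty),
    }

# ZIP values as they appear in the matching example, plus one empty value.
zip_profile = profile_column(["16987", "06987", "06987", "6984", "06987-4573", ""])
print(zip_profile)
```

In this sample, the pattern counts show that most values share the shape 99999, which makes the four-digit 6984 stand out as a candidate anomaly for drill-down.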
Parsing & Standardization: Correct Completeness, Conformity and Consistency Problems
  Before:
    Product ID | Brand | Description
    90017 | ipod | 4GB, Red ipod Nano //Special Edt.
  After:
    Product_ID | Brand | Size | Color | Description
    90017 | IPOD | 4GB | Red | 4 Gigabyte Nano Special Edition (Red)
  One environment to standardize and parse all data domains
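The before/after product record above can be approximated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the Informatica parser: the `COLORS` and `ABBREVIATIONS` lookups and the function `standardize_product` are hypothetical stand-ins for managed reference data and reusable rules.

```python
import re

# Hypothetical reference data; not taken from any Informatica content set.
COLORS = {"red", "blue", "green", "black", "silver"}
ABBREVIATIONS = {"edt": "Edition"}

def standardize_product(product_id, brand, description):
    """Parse a free-text description into Size, Color and a cleaned Description."""
    size, color, rest = "", "", []
    for token in re.findall(r"[A-Za-z0-9]+", description):
        if not size and re.fullmatch(r"\d+GB", token, flags=re.IGNORECASE):
            size = token.upper()
        elif not color and token.lower() in COLORS:
            color = token.capitalize()
        elif token.lower() == brand.lower():
            continue  # the brand already has its own column
        else:
            # Expand known abbreviations, keep everything else unchanged.
            rest.append(ABBREVIATIONS.get(token.lower(), token))
    words = ([f"{size[:-2]} Gigabyte"] if size else []) + rest
    clean = " ".join(words) + (f" ({color})" if color else "")
    return {
        "Product_ID": product_id,
        "Brand": brand.upper(),
        "Size": size,
        "Color": color,
        "Description": clean,
    }

row = standardize_product("90017", "ipod", "4GB, Red ipod Nano //Special Edt.")
print(row)
```

For the sample record this yields Brand IPOD, Size 4GB, Color Red and the description "4 Gigabyte Nano Special Edition (Red)", matching the slide. A real rule set would need far broader reference data than the two small lookups assumed here.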
Parsing & Standardization
  - Natural Language Processing
  - Polarity
Address Validation: For over 240 countries against reference data from international postal agencies
  Input:
    Address1 | Address2 | Address3 | Address4 | Address5
    7887 KATY FRWY | SUITE 333 | HOUSTEN | TX | 99999
  Output:
    Street | City | County | StateCode | StateName | ZIP | ZIP4 | Latitude | Longitude
    7887 Katy Freeway Suite 333 | Houston | Harris | TX | Texas | 77024 | 2005 | 29.283427 | -95.46802
  Valid addresses keep costs down and help ensure compliance
Match and De-Duplicate: Regardless of format or correctness
    SKU | Description | Size | Price
    AP-2199 | Sailors Desk Lamp | 12 in | 27.99
    AP2199 | Nautical Lamp | 12 inch | 27.99
    PA-2119 | Sailors Lamp | 12 inch | 34.99
  Intrinsically wrong (and potentially uncorrectable) data can still be valuable for matching purposes:
  - Alternate names or nicknames
  - Misspellings
  - Invalid data
    Name | DOB | Address | City | State | Zip
    W. S. Harrison II PhD | 1/33/1967 | Medical Center,117/2A #17497 Jackson | E. Hartford | NY | 16987
    William Stuart Harison | 1/3/1967 | 117-2a Jacksen Rd. | Easthartford | CT | 06987
    William Stewart Harison | 9/9/99 | 117 Jackson Road. Suite 2A | Hartford East | CT | 06987
    Doctor Bill Harisen jr | 1/13/1967 | 117 Jacson Room 2a | HartfordCT | | 6984
    Harrisen William Doctor | | 2a Jackson Rd #174978 | Hartford | CT | 06987-4573
  Highly accurate matching ensures the minimum number of duplicate master records
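As a rough illustration of matching "regardless of format", the sketch below normalizes values and scores them with `difflib.SequenceMatcher` from the Python standard library. This is a generic character-similarity technique swapped in for illustration, not Informatica's matching algorithms; `normalize`, `similarity`, `find_matches` and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher
import re

def normalize(value):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())

def similarity(a, b):
    """Return a 0.0 to 1.0 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_matches(records, key, threshold=0.85):
    """Return (i, j, score) for record pairs whose `key` values meet the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i][key], records[j][key])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

# SKU values taken from the product table above.
products = [{"SKU": "AP-2199"}, {"SKU": "AP2199"}, {"SKU": "PA-2119"}]
matches = find_matches(products, "SKU")
print(matches)
```

With these three SKUs, only the first two are paired, because they are identical once the hyphen is removed. The pairwise loop compares every record with every other, so it only suits small samples; a production matcher would also weigh several fields rather than a single key.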
Productive Development Environment: With mid-stream profiling for IT developers
  - Standardization, Parsing, Address Validation, Matching
  - One click from profiling to rule configuration
  - Nested mapplets
  - Seamless integration with PowerCenter
  DQ Developer: Informatica Developer
  Increased development productivity reduces operational costs
Mid-Stream Profiling: Profile Data at Any Point Within a Mapping
  - Any Source
  - Any Transformation
  - Any Rule or Mapplet
Exception Management: Bad Record and Duplicate Record Correction
  [Diagram: Source, then "Validate, apply rule in same process", then one of two paths]
  - Good: Committed to Target
  - Bad: Fix Suggested or Record Rejected, reviewed by the Data Steward in Informatica Analyst
  Empower data stewards to directly manage data quality tasks and solve problems faster
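The good/bad routing described above can be sketched as a small validation step in Python. This is an illustrative sketch only, not the Informatica exception workflow: the rule names, the `RULES` table and the `route` function are hypothetical, and the date rule assumes a month/day/four-digit-year format.

```python
from datetime import datetime

def valid_dob(record):
    """Rule: DOB must parse as a real month/day/four-digit-year date."""
    try:
        datetime.strptime(record.get("DOB", ""), "%m/%d/%Y")
        return True
    except ValueError:
        return False

def valid_zip(record):
    """Rule: Zip must be exactly five digits."""
    zip_code = record.get("Zip", "")
    return len(zip_code) == 5 and zip_code.isdigit()

# Hypothetical rule set applied to every record in the same pass.
RULES = {"dob_is_valid_date": valid_dob, "zip_is_five_digits": valid_zip}

def route(records):
    """Split records into those committed to the target and those needing review."""
    committed, exceptions = [], []
    for record in records:
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            exceptions.append({"record": record, "failed_rules": failed})
        else:
            committed.append(record)
    return committed, exceptions

# DOB and Zip values taken from the matching slide.
good, bad = route([
    {"Name": "William Stuart Harison", "DOB": "1/3/1967", "Zip": "06987"},
    {"Name": "W. S. Harrison II PhD", "DOB": "1/33/1967", "Zip": "16987"},
])
print(len(good), bad)
```

Keeping the list of failed rule names with each rejected record gives the data steward a starting point for the fix, instead of a bare rejection.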
Proactive Data Quality Monitoring
  [Diagram: multiple Sources feed a Profile, Mapping and Scorecard; an Error notification goes to Steward, Analyst, IT, etc.]
  - Automated error detection based on data quality issues
  - Enable quick and targeted response as Data Quality issues arise
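One way to picture a scorecard that triggers alerts is to compare a measured metric with its target. The sketch below uses completeness (the share of non-empty values) as the metric; the functions `completeness` and `scorecard` and the 0.95 targets are assumptions for illustration, not Informatica scorecard behavior.

```python
def completeness(records, column):
    """Share of records with a non-empty value in `column`, from 0.0 to 1.0."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(column) or "").strip())
    return filled / len(records)

def scorecard(records, targets):
    """Compare each column's completeness with its target and flag shortfalls."""
    results = {}
    for column, target in targets.items():
        score = completeness(records, column)
        results[column] = {"score": score, "target": target, "alert": score < target}
    return results

# Two sample records; the second is missing its DOB.
sample = [
    {"City": "Hartford", "DOB": "1/3/1967"},
    {"City": "Hartford", "DOB": ""},
]
card = scorecard(sample, {"City": 0.95, "DOB": 0.95})
for column, result in card.items():
    if result["alert"]:
        print(f"ALERT: {column} completeness {result['score']:.0%} is below target")
```

Here only the DOB column raises an alert, since half of its values are empty. In practice the alert would be routed to a steward, analyst or IT contact rather than printed.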
Data Quality Firewall: Centralized, Reusable Rules
  [Diagram: BI Application, Customer Service Portal and Sales Automation Application all draw on centralized data quality rules for Customer, Order, Product and Invoice data]
  For the business: Support data governance by enforcing consistent data quality rules across all applications.
  For IT: Accelerate the deployment of common data quality rules across all applications. Reduce costs through reuse.
What are the options?
  PowerCenter Data Quality Starter Kit: Pre-defined Data Quality Rules for PowerCenter
  - No IDQ install required - import and run
  - Common Data Quality rules
  - Zero learning curve
  PowerCenter DQ Developer Option: Data Quality Developer Tool for PowerCenter
  - Eclipse based DQ Developer interface
  - Standardization, parsing, matching, validation for all data types
  - Seamless integration with PowerCenter
  Full Use Data Quality and Profiling: ALL Data Quality Capabilities and Interfaces
  - Role based tools
  - Standardization, parsing, matching, validation for all data types
  - Profiling, scorecarding, exception management
Data Quality Starter Kit for PowerCenter
  DQ for PC Package
  1. Extract Package
  2. Import Metadata
  Tackle common Data Quality issues with NO install required and zero learning curve
Data Quality Developer for PowerCenter
  - Create and maintain custom Data Quality rules
  - Leverage and edit pre-defined rules
  1. Author/edit rules in Data Quality Developer
  3. Execute natively in PowerCenter
Full Use Data Quality and Profiling
  Developers (Informatica Developer)
  - Integrating profiling, cleansing and data services
  - Configure mappings
  - Run mappings
  Data Stewards, Business Analysts, Line of Business Managers (Informatica Analyst)
  - Profile and discover
  - Configure rules
  - Validate data against rules
  - Manage reference data
  - Configure scorecards
  Data Stewards (Exception Management)
  - Correct bad records
  - Review, manage and consolidate duplicate records
Call to Action
  Don't wait: ensure your data is trustworthy with Informatica Data Quality. It's simple!
Questions?