Data Warehousing. Metadata Management. Spring Term 2016 Dr. Andreas Geppert Spring Term 2016 Slide 1

Similar documents
After completing this course, participants will be able to:

METADATA INTERCHANGE IN SERVICE BASED ARCHITECTURE

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

CHAPTER 3 Implementation of Data warehouse in Data Mining

FedDW Global Schema Architect

Seminar Datenbanksysteme

Using SLE for creation of Data Warehouses

Information Management course

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Developing Applications with Business Intelligence Beans and Oracle9i JDeveloper: Our Experience. IOUG 2003 Paper 406

What is a Data Model?

Meta-Data Support for Data Transformations Using Microsoft Repository

CA ERwin Data Modeler r7.3

Oracle Database 11g: Data Warehousing Fundamentals

Page 1. Oracle9i OLAP. Agenda. Mary Rehus Sales Consultant Patrick Larkin Vice President, Oracle Consulting. Oracle Corporation. Business Intelligence

Chris Claterbos, Vlamis Software Solutions, Inc. David Fuston, Vlamis Software Solutions, Inc.

Microsoft End to End Business Intelligence Boot Camp

IBM Industry Model support for a data lake architecture

Handout 12 Data Warehousing and Analytics.

Creating a target user and module

Chris Claterbos, Vlamis Software Solutions, Inc.

Data Warehouse Testing. By: Rakesh Kumar Sharma

Teiid Designer User Guide 7.5.0

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

MS-55045: Microsoft End to End Business Intelligence Boot Camp

Implementing and Maintaining Microsoft SQL Server 2005 Analysis Services

to-end Solution Using OWB and JDeveloper to Analyze Your Data Warehouse

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

SAS Data Integration Studio 3.3. User s Guide

Data Integration and ETL with Oracle Warehouse Builder

SAMPLE. Preface xi 1 Introducting Microsoft Analysis Services 1

BI/DWH Test specifics

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran

Data Warehouse and Data Mining

IMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland

Enterprise Data Catalog for Microsoft Azure Tutorial

Information Integration

Using Oracle9i Warehouse Builder and Oracle 9i to create OLAP ready Warehouses

Metadata Flow in a Multi-Vendor Enterprise Toolset Focus Area Session Code: AFM55SN

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

The Data Organization

Java Metadata Interface (JMI)

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

SERES: ASEMANTICREGISTRY FOR ENTERPRISE SERVICES. Geir Jevne 9.juni 2011

Managing Metadata with Oracle Data Integrator. An Oracle Data Integrator Technical Brief Updated December 2006

Search Engines Chapter 2 Architecture Felix Naumann

Data Warehouse and Data Mining

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

A MAS Based ETL Approach for Complex Data

Full file at

Data Warehousing Introduction. Toon Calders

Take P, R or U. and solve your data quality problems Oliver Engels & Tillmann Eitelberg, OH22

CA ERwin Data Modeler

Safe Harbor Statement

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

6 SSIS Expressions SSIS Parameters Usage Control Flow Breakpoints Data Flow Data Viewers

Migrating Express Applications To Oracle 9i A Practical Guide

Question Bank. 4) It is the source of information later delivered to data marts.

Oracle BI 11g R1: Build Repositories

HUMIT Interactive Data Integration in a Data Lake System for the Life Sciences

The process of metadata modelling in industrial data warehouse environments

1. Übungsblatt. Vorlesung Embedded System Security SS 2017 Trusted Computing Konzepte. Beispiellösung

DEV-33: Get to Know Your Data Open Source Data Integration, Business Intelligence and more Marian Edu

Data Warehousing & Mining Techniques

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING

Sql Fact Constellation Schema In Data Warehouse With Example

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS. By: Dr. Tendani J. Lavhengwa

Measuring the functional size of a data warehouse application using COSMIC-FFP

1Z0-526

Data Mining. Asso. Profe. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of CS (1)

FXD A new exchange format for fault symptom descriptions

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

Metadata Based Impact and Lineage Analysis Across Heterogeneous Metadata Sources

Conceptual modeling for ETL

Data Science. Data Analyst. Data Scientist. Data Architect

Module 9: Managing Schema Objects

Implement a Data Warehouse with Microsoft SQL Server

TIM 50 - Business Information Systems

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality?

InfoSphere Warehouse V9.5 Exam.

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

NIEM. National. Information. Exchange Model. NIEM and Information Exchanges. <Insert Picture Here> Deploy. Requirements. Model Data.

USB. USB Sticks in Design und Qualität

The InfoLibrarian Metadata Appliance Automated Cataloging System for your IT infrastructure.

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model

20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server

Monitoring and Operating Cisco Prime Service Catalog Reports

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality?

Data Warehousing & Mining Techniques

Modelling in Enterprise Architecture. MSc Business Information Systems

A Standard for Representing Multidimensional Properties: The Common Warehouse Metamodel (CWM)

Das Seminar kann zur Vorbereitung auf die Zertifizierung als Microsoft Certified Solutions Developer (MCSD): SharePoint Applications genutzt werden.

Oracle BI 11g R1: Build Repositories Course OR102; 5 Days, Instructor-led

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Microsoft SQL Server Training Course Catalogue. Learning Solutions

MICROSOFT BUSINESS INTELLIGENCE (MSBI: SSIS, SSRS and SSAS)

Transcription:

Data Warehousing Metadata Management Spring Term 2016 Dr. Andreas Geppert geppert@acm.org Spring Term 2016 Slide 1

Outline of the Course Introduction DWH Architecture DWH-Design and multi-dimensional data models Extract, Transform, Load (ETL) Metadata Data Quality Analytic Applications and Business Intelligence Implementation and Performance Spring Term 2016 Slide 2

Content 1. Motivation 2. Metadata 3. Metadata Creation and Usage 4. Metadata Management System Architecture 5. Appendix: DWH Metadata Standards Spring Term 2016 Slide 3

Motivation Metadata management is the prerequisite for two major objectives: 1. Enable and minimize effort for the development and operations of a data warehouse 2. enable optimal and correct information extraction data analysis: requires known and agreed semantics of data uniform terminology data quality; quality definition, quality checks, correction rules, quality requirements Spring Term 2016 Slide 4

Metadata "data about data": data that describe other data Meaning/semantics, syntax, etc. "any kind of information that is needed for the design, development and use of an information system" (Bauer & Günzel) metadata management: gathering, storage, administration, and providing of metadata storage in an own information system -> Repository Spring Term 2016 Slide 5

Instance and Metadata Meta meta models Meta models define metadata schema concept "table", "attribute" Metadata (data / object model) represent schemas, types Type/class "Customer", attribute "Main Street" Instance data, object data represent elements of the UoD and their properties Customer "John Doe", address "Main Street" 0 1 3 2 meta meta models meta models metadata data Spring Term 2016 Slide 6

Metadata: Examples in the DWH-Architecture source data models, Data Integration, Historization Ownership Sources data models, quality Reusable rules Selection, Aggregation, Calculation Selection, Aggregation, Calculation analysis logic, calculations, Reporting, terminology OLAP, Data Mining Web/App Servers reports GUI transformations transformations Datenmodelle data models Landing Zone staging structures Staging Area Subject Matter Areas calculations Reusable Measures & Dimensions Areas Metadata Management Spring Term 2016 Slide 7

Requirements to Metadata Management Availability and existence metadata must exist (e.g., data models of sources and the DWH) Automated capture metadata must (whenever possible) be captured and updated in the course of DWH development documentation after the fact ("Nachdokumentation") development plus subsequent documentation in the metadata repository does not work, metadata and code become inconsistent very fast Integration metadata must be integrated and provide an end-to-end view relationships between metadata (of different types) must be captured, managed, and used for instance, transformation rules are not independent metadata, they describe the mapping of a schema onto another one without seamlessly integrated metadata it is impossible to support impact analysis and data lineage Spring Term 2016 Slide 8

Requirements to Metadata Management (2) User access metadata support for all users and user groups adequate user interfaces, languages, etc. tool support interoperability metadata exchange between different systems should be possible Spring Term 2016 Slide 9

Metadata: Classification: Creation and usage time design metadata schema definitions/data models, transformation rules, data quality rules, terms and definitions runtime metadata log files, statistics, quality check results usage metadata usage frequencies, access patterns Spring Term 2016 Slide 10

Metadata: Classification (2) User technical metadata used by developers, administrators (IT staff in general) data dictionaries, database schemas, transformation rule code business metadata definitions of terms, calculation formulas Type metadata about primary data metadata about data in source systems, data warehouse, data marts process metadata rules and transformations of ETL-processes logs, execution plans Spring Term 2016 Slide 11

Metadata: Classification (3) Origin tools schema designer, ETL-tool,... sources users Abstraction conceptual abstract description, implementation independent understandable for users logical description in formal language e.g., database schema, formulas physical implementation e.g. SQL-code Spring Term 2016 Slide 12

Content 1. Motivation 2. Metadata 3. Metadata Creation and Usage 4. Metadata Management System Architecture 5. Appendix: DWH Metadata Standards Spring Term 2016 Slide 13

Metadata Usage passive use: documentation of various aspects of a DWH. Used by users, developers, administrators active use: interpretation of metadata through tools (transformation rules, quality rules) metadata-driven processes semi-active use: use of metadata through tools in order to check something (e.g., schema definitions) Spring Term 2016 Slide 14

Metadata Use: Build Time analysis application development process analyze info req derive data req determine reuse potential determine technology model data mart model mappings integration application development process analyze data req analyze source def. incom. interf. extend staging area model SMA model mappings define quality checks design SMA design ETL implement SMA implement ETL def. outgoing interfaces implement interfaces design logical schema design mappings implement data mart implement ETL process develop reports Spring Term 2016 Slide 15

Metadata Use: Build Time (2) analysis of required data: requirements source analysis: requirements, source metadata (data models) definition of "incoming" interfaces: interface contracts extension of staging area: logical data model modeling of SMA-structures: conceptual data model modeling of mappings: conceptual ETL-model definition of quality rules: quality rules logical modeling of SMA-structures, SMA- and ETL-implementation: logical and physical data and ETL-models, job nets definition of "outgoing" interfaces: interface contracts Spring Term 2016 Slide 16

Metadata Use: Build Time (3) Analysis of information requirements: requirements business metadata/glossaries, ownership Derivation of required data: requirements, analysis model identification of reuse potential of dimensions and measures: application portfolio business intelligence (application) strategy data mart modeling: conceptual data model, roles, access rules, data quality requirements and rules modeling of mappings: conceptual ETL-model logical modeling and implementation of data mart and transformations: logical and physical data and process models, job nets report development: semantic layer, business metadata/glossaries Spring Term 2016 Slide 17

End-to-end Metadata Use: Build Time impact analysis determines the impact of changes in one place to downstream components it thus requires an end-to-end view of metadata in particular data models and transformations must be complete development techniques such as generation or derivation of lower-level artifacts out of higher-level (more abstract) ones foster impact analysis development techniques that do not have an integral treatment of metadata compromise impact analysis (in particular, stored procedures) Spring Term 2016 Slide 18

End-to-end Metadata Use: Impact Analysis Layered Architecture Data Sources Reference/ Master Data Federated Integration Staging Area Domain Integration and Enrichment Integration, Aggregation, Calculation Data Marts Selection, Aggregation, Calculation Reporting and Analysis Services Reporting, OLAP, Data Mining Front End GUI integration enrichment (Meta)data Management Legend: relational database multidimensional database file logic; extract, transform, load logic (no ETL) data flow Spring Term 2016 Slide 19

Metadata Use: Runtime metadata use depends on the kind of user access ad-hoc querying: user must first formulate query use metadata to determine the data to analyze and query (data models, semantic layer) possibly use metadata search engine to determine data to query reporting: users consume generated reports use metadata about reports (report metadata, semantic layer) OLAP: querying and analyzing multidimensional data structures data models, KPI definitions Spring Term 2016 Slide 20

Metadata Use: Runtime report on various aspects of the DWH (metadata reporting) DWH content: DWH data models, tables, table rows, transformations, reports,... DWH processing: report on DWH load processes (number of load jobs, number of feeder files processed, number of rows loaded, etc.) DWH usage: number of reports consumed, (number of) queries, n most complex queries, etc. DWH security: number of users, roles, privileges, number of logins, number of failed login attempts,... data quality reports: number of DQ rules, checks, and DQ-issues DWH performance: database size, response times, storage and server utilization,... Spring Term 2016 Slide 21

Content 1. Motivation 2. Metadata 3. Metadata Creation and Usage 4. Metadata Management System Architecture 5. Appendix: DWH Metadata Standards Spring Term 2016 Slide 22

Metadata Management Architecture: Centralized single, centralized metadata management (system) usually only feasible if all components are from the same vendor (vs. best of breed) Design Tools Management Tools ETL Tools Analysis Tools Metadata Manager Rep. Spring Term 2016 Slide 23

Metadata Management Architecture: decentralized multiple metadata management systems (e.g. tool-specific) bilateral metadata exchange (max. n*(n-1)/2 interfaces) Design Tools Management Tools ETL Tools Analysis Tools MDM MDM MDM MDM Rep. Rep. Rep. Rep. Spring Term 2016 Slide 24

Metadata Management Architecture: DWH-like "DWH-approach": autonomous metadata management systems and repositories are responsible for local tasks metadata are integrated in "global" metadata DWH, which is responsible for all tasks requiring integrated metadata Design Tools Management Tools ETL Tools Analysis Tools MDM MDM MDM MDM Rep. Rep. Rep. Rep. Metadata Manager Rep. Spring Term 2016 Slide 25

Metadata Management Architecture: DWH-like Metadata Integration, Historization Reusable Selection, Aggregation, Calculation Selection, Aggregation, Calculation Metadata Reporting, OLAP, Data Mining Web/App Servers GUI Landing Zone Staging Area Design, Modeling, and ETL Tools, DBS Subject Matter Areas Data models ETL Data Quality Business Terms Reusable Measures & Dimensions Areas Data Sources #data models; data model mismatches; stability Metadata Management Integration, Historization Reusable Selection, Aggregation, Calculation Selection, Aggregation, Calculation metadata reports and analysis Reporting, OLAP, Data Mining Web/App Servers GUI Landing Zone Staging Area Spring Term 2016 Slide 26

Example Source Metadata: Schema of a DB- Repository database catalog contains database metadata schema of the catalog defines metadata structures (meta meta model) DB2: schema syscat, contains ~ 70 catalog views Oracle: schema user sys, owns > 3000 views constraint table column trigger index Spring Term 2016 Slide 27

Metadata about Metadata: Schema of a Repository s. Marco 2000 Spring Term 2016 Slide 28

Summary Metadata are created and used in all phases and layers of data warehousing Complete and adequate metadata management is a success factor Integrated metadata management can be a challenge in a heterogeneous environment (i.e., best-of-breed) Spring Term 2016 Slide 29

Content 1. Motivation 2. Metadata 3. Metadata Creation and Usage 4. Metadata Management System Architecture 5. Appendix: DWH Metadata Standards Spring Term 2016 Slide 30

Metadatenstandards Open Information Model (OIM) Version 1.0 definiert in 1999 durch Metadata Coalition (MDC) Common Warehouse Model (CWM) erste Version definiert in 1999 durch Object Management Group (OMG) einfacher Austausch von DWH-Metadaten zwischen Werkzeugen und Repositorien Modularität, so dass auch nur relevante Teile des Models implementiert werden können Spring Term 2016 Slide 31

CWM: Struktur CWM erlaubt Repräsentation von Metadaten über... Quellen, Targets, und Transformationen Analysen Prozesse und Operationen, die Warehouse-Daten erzeugen und verwalten sowie Lineage der Verwendung erlauben CWM basiert auf UML weitgehende Wiederverwendung des Object Models (Teil von UML) CWM verwendet UML-Packages und eine hierarchische Package- Struktur aus Gründen der Komplexität, Verständnis und Wiederverwendbarkeit Spring Term 2016 Slide 32

CWM: Struktur jedes Paket kann Pakete der gleichen Schicht oder der unteren Schichten referenzieren Spring Term 2016 Slide 33

CWM: Layer "Foundation" "Foundation" bietet CWM-spezifische Dienste für andere Packages auf höheren Schichten Data Types: Klassen und Assoziationen für die Definition von Datentypen Expressions: K&A für die Repräsentation von Ausdrucksbäumen Keys and Indexes: K&A, die Schlüssel und Indexe repräsentieren Software Deployment: K&A, mit denen repräsentiert werden kann, wie Software in einem DWH "deployed" wird Type Mapping: K&A für die Abbildung von Datentypen zwischen verschiedenen Systemen Spring Term 2016 Slide 34

CWM: Layer "Ressourcen" Relational: Metadaten relationaler Systeme Record: Metadaten satzorientierter Systeme Multidimensional: Metadaten multidimensionaler Systeme XML: Metadaten von XML-Daten Spring Term 2016 Slide 35

CWM: Layer "Analysis" Transformation: Metadaten über Transformationen (aus Transformationswerkzeugen) OLAP: Metadaten aus OLAP-Werkzeugen Data Mining: Metadaten aus Data Mining-Werkzeugen Information Visualization: Metadaten aus Werkzeugen für die Informationsvisualisierung Business Nomenclature: Metadaten über Business-Taxonomien und -Glossare Spring Term 2016 Slide 36

CWM: Layer "Management" Warehouse Process: Metadaten über DWH-Prozesse Warehouse Operation: Metadaten über DWH-Betrieb (Ergebnisse) Spring Term 2016 Slide 37

CWM: "Multidimensional" Package generische Repräsentation einer multidimensionalen Datenbank Spring Term 2016 Slide 38

CWM: "OLAP" Package generische Repräsentation von OLAP-Konzepten Spring Term 2016 Slide 39