Generic Model Management: A Database Infrastructure for Schema Manipulation
Philip A. Bernstein, Microsoft Corporation
April 29, 2002
2002 Microsoft Corp.
The Problem
- There are 30 years of database research on meta data, but we don't have great infrastructure to offer.
- Most design tools and web services store meta data in files, not in databases.
- OODBMSs have not been a huge success.
- Most meta-data-driven tools use their own infrastructure.
Goal: a generic meta data manipulation infrastructure that reduces the amount of programming required to build meta-data-driven applications.
Proposal: define an algebra that manipulates meta data in large chunks, called models and mappings.
Outline
- Overview of Model Management
- Solutions to classical meta data problems
- Recent technical results
Models and Mappings
Model: a complex information structure, such as an XML schema, SQL schema, OO interface, UML model, web site map, or make script.
Mapping: a representation of a transformation from one model into another. Examples:
- Map between two XML schemas
- Map a SQL schema to an XML schema
- Map data sources to a data warehouse
- Map an ER diagram to a SQL schema
- Map a process definition to a workflow script
Representation
A model is a directed graph with one root.
A mapping is a model each of whose nodes connects nodes of two other models.
[Figure: map1 connects a relational schema Emp(E#, Dept#, Name) to an XSD Emp(E#, Dept#, Name), where the XSD's Name has children First and Last.]
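The following Python sketch makes the representation concrete. It is a minimal sketch under assumptions of this example only:
- A model is stored as an adjacency list from each node to its children.
- A mapping is a subclass of Model, so it is itself a model.
- Each node of a mapping carries a (left node, right node) pair.

The class names, the mapping node ids (m_emp, m_eno, ...), and the data are hypothetical and only loosely follow the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Model:
    """A model: a rooted directed graph stored as node -> list of child nodes."""
    name: str
    root: str
    edges: dict[str, list[str]] = field(default_factory=dict)

    def nodes(self) -> list[str]:
        """All nodes reachable from the root, in depth-first pre-order."""
        seen: set[str] = set()
        order: list[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # Push children reversed so they are visited in declared order.
            stack.extend(reversed(self.edges.get(node, [])))
        return order


@dataclass
class Mapping(Model):
    """A mapping is itself a model; each of its nodes links a node of `left`
    to a node of `right` (hypothetical field names)."""
    left: Model | None = None
    right: Model | None = None
    links: dict[str, tuple[str, str]] = field(default_factory=dict)


rel = Model("Relational Schema", "Emp", {"Emp": ["E#", "Dept#", "Name"]})
xsd = Model("XSD", "Emp", {"Emp": ["E#", "Dept#", "Name"], "Name": ["First", "Last"]})
map1 = Mapping(
    "map1", "m_emp", {"m_emp": ["m_eno", "m_dno", "m_name"]},
    left=rel, right=xsd,
    links={"m_emp": ("Emp", "Emp"), "m_eno": ("E#", "E#"),
           "m_dno": ("Dept#", "Dept#"), "m_name": ("Name", "Name")},
)
print(xsd.nodes())   # ['Emp', 'E#', 'Dept#', 'Name', 'First', 'Last']
print(map1.nodes())  # ['m_emp', 'm_eno', 'm_dno', 'm_name']
```

Making Mapping a subclass of Model is one way to honor "a mapping is a model". Generic operations written against Model then apply to mappings too.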
Model Management Algebra
Match, Merge, Compose, Diff, Select, Enumerate, ApplyFunction, Copy, Invert, and update operations.
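Match
Match(M1, M2) returns the best mapping between M1 and M2.
[Figure: map1 = Match(M1, M2), where M1 is Emp(E#, Dept#, Name, Addr) and M2 is Emp(E#, Dept#, Phone, Name), with M2's Name split into First and Last. map1 links E# = E#, Dept# = Dept#, and Name ≈ Name.]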
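A real Match is hard; the Cupid slide later in the deck shows one approach. The sketch below is a deliberately naive stand-in that scores element pairs by name similarity, using `difflib.SequenceMatcher` from Python's standard library. It tags a link '~' when names agree but substructure differs, as for Name in the figure.

Assumptions: the dict representation (element to children), the 0.8 threshold, and the '='/'~' tags are all hypothetical. This is not the author's Match.

```python
from difflib import SequenceMatcher

# Hypothetical representation: element name -> names of its children.
Model = dict[str, list[str]]


def name_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(m1: Model, m2: Model, threshold: float = 0.8) -> list[tuple[str, str, str]]:
    """Naive Match: link each element of m1 to the most similarly named element of m2.

    A link is tagged '=' when the names and child lists agree, and '~' when the
    elements are only similar (e.g., same name, different substructure).
    """
    links = []
    for x in m1:
        best = max(m2, key=lambda y: name_sim(x, y))
        score = name_sim(x, best)
        if score < threshold:
            continue  # nothing in m2 is similar enough; x stays unmatched
        tag = "=" if score == 1.0 and m1[x] == m2[best] else "~"
        links.append((x, best, tag))
    return links


M1 = {"Emp": ["E#", "Dept#", "Name", "Addr"], "E#": [], "Dept#": [], "Name": [], "Addr": []}
M2 = {"Emp": ["E#", "Dept#", "Phone", "Name"], "E#": [], "Dept#": [], "Phone": [],
      "Name": ["First", "Last"], "First": [], "Last": []}
print(match(M1, M2))
# [('Emp', 'Emp', '~'), ('E#', 'E#', '='), ('Dept#', 'Dept#', '='), ('Name', 'Name', '~')]
```

Addr has no counterpart above the threshold, so it is left out of the mapping.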
Merge(M1, M2, map)
Return the union of models M1 and M2, using map to guide the Merge: if elements x = y in map, then collapse them into one element.
[Figure: M1 = Emp(Addr, Name) and M2 = Emp(Name, Phone), with mapc linking Name = Name; the merged model is Emp(Addr, Name, Phone).]
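The collapse rule can be sketched in a few lines. This is a minimal sketch, assuming:
- Models are dicts from element to child names.
- The mapping is a list of (x, y) equality pairs.
- Unmapped elements have distinct names across the two models.

The function name and data are hypothetical.

```python
# Hypothetical representation: element name -> names of its children.
Model = dict[str, list[str]]


def merge(m1: Model, m2: Model, mapping: list[tuple[str, str]]) -> Model:
    """Union of m1 and m2 in which each mapped pair x = y collapses into m1's x.

    Assumption: elements that are NOT mapped have distinct names across the
    two models; otherwise they would be collapsed here by accident.
    """
    rename = {y: x for x, y in mapping}  # m2 element -> m1 element it equals
    merged: Model = {name: list(children) for name, children in m1.items()}
    for name, children in m2.items():
        slot = merged.setdefault(rename.get(name, name), [])
        for child in children:
            child = rename.get(child, child)
            if child not in slot:  # a collapsed child is listed only once
                slot.append(child)
    return merged


M1 = {"Emp": ["Addr", "Name"], "Addr": [], "Name": []}
M2 = {"Emp": ["Name", "Phone"], "Name": [], "Phone": []}
mapc = [("Emp", "Emp"), ("Name", "Name")]
print(merge(M1, M2, mapc))
# {'Emp': ['Addr', 'Name', 'Phone'], 'Addr': [], 'Name': [], 'Phone': []}
```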
Left Composition (•)
[Figure: mapa (elements a1, a2, a3) connects M1 = Emp(Addr, Street, City) to M2, and mapb (elements b1, b2, b3) connects M2 to M3 = Emp(Name, StAddr, Town). Their left composition mapc = mapa • mapb connects M1 directly to M3: c1 is attached to Addr, c2 links Street to StAddr, and c3 links City to Town.]
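The slide does not spell out what makes the composition "left". The sketch below assumes that every element of mapa yields exactly one element of the result. An element that cannot be carried through mapb keeps an empty right side, which is how c1 appears in the figure.

Further assumptions: the triple representation and the data are hypothetical, with "Addr" given no onward link on purpose.

```python
from typing import Optional

# A mapping element: (element id, node in the left model, node in the right model).
Link = tuple[str, str, Optional[str]]


def compose_left(map_a: list[Link], map_b: list[Link]) -> list[Link]:
    """Left composition: exactly one result element per element of map_a.

    Assumptions of this sketch: each middle-model node appears at most once on
    map_b's left side, and an element of map_a that cannot be continued through
    map_b is kept with an empty (None) right side.
    """
    onward = {left: right for _, left, right in map_b}  # middle node -> right node
    return [(f"c{i}", x, onward.get(y)) for i, (_, x, y) in enumerate(map_a, start=1)]


# Illustrative data loosely modeled on the figure; "Addr" has no onward link.
map_a = [("a1", "Addr", "Addr"), ("a2", "Street", "Street"), ("a3", "City", "City")]
map_b = [("b1", "Name", "Name"), ("b2", "Street", "StAddr"), ("b3", "City", "Town")]
print(compose_left(map_a, map_b))
# [('c1', 'Addr', None), ('c2', 'Street', 'StAddr'), ('c3', 'City', 'Town')]
```

A plain (non-left) composition would instead drop c1, keeping only links that reach M3.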
Model Management Algebra
- map = Match(M1, M2)
- M3 = Merge(M1, M2, map)
- map3 = Compose(map1, map2)
- M2 = Diff(M1, map)
- M2 = Select(M1, pred)
- list = Enumerate(M)
- ApplyFunction(M, f)
- M2 = Copy(M1)
- Update operations
They're generic, i.e., data-model independent. We'll implement them on an extended ER model, with an extensibility story.
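Once a representation is fixed, the remaining operators are simple set and graph operations. Below is a minimal sketch of Diff, Select, Enumerate, Copy, and ApplyFunction, assuming:
- A model is a tree-shaped dict from element to children.
- A mapping is a list of (left, right) pairs.

All function names are hypothetical; `enumerate_model` avoids shadowing Python's built-in `enumerate`.

```python
import copy

Model = dict[str, list[str]]      # element name -> child names (assumed tree-shaped)
Mapping = list[tuple[str, str]]   # (left element, right element)


def diff(m: Model, mapping: Mapping, side: int = 0) -> Model:
    """Diff(M, map): the part of m that does not participate in mapping.

    `side` is 0 if m is the mapping's left model, 1 if it is the right model.
    """
    mapped = {pair[side] for pair in mapping}
    return {e: [c for c in kids if c not in mapped] for e, kids in m.items() if e not in mapped}


def select(m: Model, pred) -> Model:
    """Select(M, pred): the sub-model of elements satisfying pred."""
    kept = {e for e in m if pred(e)}
    return {e: [c for c in m[e] if c in kept] for e in m if e in kept}


def enumerate_model(m: Model, root: str) -> list[str]:
    """Enumerate(M): list elements in depth-first pre-order starting at root."""
    out = [root]
    for child in m.get(root, []):
        out.extend(enumerate_model(m, child))
    return out


def copy_model(m: Model) -> Model:
    """Copy(M): a deep copy that shares no structure with m."""
    return copy.deepcopy(m)


def apply_function(m: Model, f) -> None:
    """ApplyFunction(M, f): call f on every element of m (for its side effects)."""
    for e in list(m):
        f(e)


M1 = {"Emp": ["E#", "Dept#", "Name", "Addr"], "E#": [], "Dept#": [], "Name": [], "Addr": []}
map1 = [("Emp", "Emp"), ("E#", "E#"), ("Dept#", "Dept#"), ("Name", "Name")]
print(diff(M1, map1))                          # {'Addr': []}
print(select(M1, lambda e: e.endswith("#")))   # {'E#': [], 'Dept#': []}
print(enumerate_model(M1, "Emp"))              # ['Emp', 'E#', 'Dept#', 'Name', 'Addr']
```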
Example
Given map1 from SQL schema rdb1 to XML schema xsd1, and an XML schema xsd2 that is similar to xsd1.
Produce a map between xsd2 and a relational schema.
1. map2 = Match(xsd1, xsd2)
2. map3 = Compose(map1, map2)
3. <map4, rdb2> = Copy(map3)
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2.
[Figure: map1 connects rdb1 to xsd1 and map2 connects xsd1 to xsd2; map3 connects rdb1 to xsd2, and its copy map4 connects rdb2 to xsd2.]
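The four steps can be run end to end on toy data. This is a sketch only, and each helper is a hypothetical stand-in:
- Match links identical names.
- Compose is a plain join.
- Copy returns the mapping plus a copy of the left-side elements it references.
- The ApplyFunction step uses an invented rule that turns an XSD path into a column name.

The element names are also hypothetical.

```python
Mapping = list[tuple[str, str]]  # (left element, right element)


def match(m1: list[str], m2: list[str]) -> Mapping:
    """Stand-in for Match: link elements with identical names."""
    return [(x, x) for x in m1 if x in m2]


def compose(map_a: Mapping, map_b: Mapping) -> Mapping:
    """Compose(map_a, map_b): x -> z whenever map_a has x -> y and map_b has y -> z."""
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


def copy_with_left_model(mapping: Mapping) -> tuple[Mapping, list[str]]:
    """Copy a mapping plus a fresh copy of the left-side elements it references."""
    return list(mapping), [x for x, _ in mapping]


def diff(model: list[str], mapping: Mapping) -> list[str]:
    """Diff(model, map): elements of model (map's right side) that map does not cover."""
    covered = {y for _, y in mapping}
    return [e for e in model if e not in covered]


def to_column(xsd_path: str) -> str:
    """Hypothetical naming rule: 'Emp/Email' -> 'EMP.EMAIL'."""
    return xsd_path.replace("/", ".").upper()


# Hypothetical inputs: map1 links columns of rdb1 to elements of xsd1.
xsd1 = ["Emp/Id", "Emp/Name"]
xsd2 = ["Emp/Id", "Emp/Name", "Emp/Email"]
map1 = [("EMP.ID", "Emp/Id"), ("EMP.NAME", "Emp/Name")]

map2 = match(xsd1, xsd2)                  # 1. map2 = Match(xsd1, xsd2)
map3 = compose(map1, map2)                # 2. map3 = Compose(map1, map2)
map4, rdb2 = copy_with_left_model(map3)   # 3. <map4, rdb2> = Copy(map3)
for x in diff(xsd2, map4):                # 4. ApplyFunction over Diff(xsd2, map4)
    column = to_column(x)
    rdb2.append(column)
    map4.append((column, x))

print(rdb2)  # ['EMP.ID', 'EMP.NAME', 'EMP.EMAIL']
```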
Theme
Classic meta data problems can be solved using model management operations:
- Schema integration
- Schema evolution
- Data migration
- Reverse engineering
Published solutions to these problems help us produce generic implementations of the model management operations.
Outline
- Overview of Model Management
- Solutions to classical meta data problems
  - Schema integration
  - Schema evolution
  - Reverse engineering
  - Data migration
- Recent technical results
Schema Integration
Given two view schemas, V1 and V2.
Produce an integrated schema, S.
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. ApplyFunction(S′) to resolve conflicts in S′, producing S.
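[Figure: schema integration example. V1 = Emp(E#, Dept#, Addr, Name); V2 = Emp(E#, Dept#, Phone, FirstName, LastName). map links E# = E#, Dept# = Dept#, and Name ≈ FirstName, LastName. The integrated schema contains Emp with E#, Dept#, Addr, and Phone, plus a conflict between the left version (Name) and the right version (FirstName, LastName) that ApplyFunction resolves.]
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. Use ApplyFunction(S′) to resolve conflicts, producing S.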
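Below is a toy run of the three steps on the Emp example. It is a minimal sketch in which everything beyond the step structure is assumed:
- The '~' rule in match (a V2 name that ends with a V1 name is "similar") is a hypothetical heuristic.
- Merge collapses only '=' links.
- The conflict-resolving function nests each similar element under its partner. This is one plausible resolution, not necessarily the one on the slide.

```python
Model = dict[str, list[str]]  # element name -> child names
Link = tuple[str, str, str]   # (V1 element, V2 element, '=' or '~')


def match(v1: Model, v2: Model) -> list[Link]:
    """Stand-in Match: '=' for identical names; '~' when a V2 name ends with a
    V1 name (e.g., FirstName ~ Name), a purely hypothetical heuristic."""
    links: list[Link] = []
    for x in v1:
        for y in v2:
            if x == y:
                links.append((x, y, "="))
            elif y.endswith(x):
                links.append((x, y, "~"))
    return links


def merge(v1: Model, v2: Model, links: list[Link]) -> Model:
    """S' = Merge(V1, V2, map): union that collapses only '=' pairs."""
    rename = {y: x for x, y, tag in links if tag == "="}
    s: Model = {e: list(kids) for e, kids in v1.items()}
    for e, kids in v2.items():
        slot = s.setdefault(rename.get(e, e), [])
        for k in kids:
            k = rename.get(k, k)
            if k not in slot:
                slot.append(k)
    return s


def resolve(s_prime: Model, links: list[Link]) -> Model:
    """ApplyFunction(S', f) with a hypothetical f: move each '~' partner y
    under the element x it resembles, producing S."""
    s = {e: list(kids) for e, kids in s_prime.items()}
    for x, y, tag in links:
        if tag != "~":
            continue
        for kids in s.values():  # detach y from its current parent(s)
            if y in kids:
                kids.remove(y)
        s[x].append(y)           # re-attach y under x
    return s


V1 = {"Emp": ["E#", "Dept#", "Addr", "Name"], "E#": [], "Dept#": [], "Addr": [], "Name": []}
V2 = {"Emp": ["E#", "Dept#", "Phone", "FirstName", "LastName"],
      "E#": [], "Dept#": [], "Phone": [], "FirstName": [], "LastName": []}

links = match(V1, V2)                     # 1. map = Match(V1, V2)
S = resolve(merge(V1, V2, links), links)  # 2-3. S' = Merge(...); ApplyFunction(S')
print(S["Emp"])   # ['E#', 'Dept#', 'Addr', 'Name', 'Phone']
print(S["Name"])  # ['FirstName', 'LastName']
```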
Schema Evolution
Given mapSV from schema S to view V, and a modified version S′ of S.
Produce a mapping mapS′V from S′ to V (i.e., a view definition for V over S′).
1. mapS′S = Match(S′, S)
2. mapS′V = Compose(mapS′S, mapSV)
3. Use ApplyFunction(V) to delete elements of V that are not derivable from S′.
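The three steps for the case where S′ drops a column that V exposed can be sketched as follows. This is a sketch under stated assumptions:
- Match is a stand-in that links identical names, so renamed columns would need a real matcher.
- Compose is a plain join.
- Step 3 keeps only the view elements that some element of S′ still reaches.

The schema contents and function names are hypothetical.

```python
Mapping = list[tuple[str, str]]  # (left element, right element)


def match(m1: list[str], m2: list[str]) -> Mapping:
    """Stand-in Match: link identically named elements."""
    return [(x, x) for x in m1 if x in m2]


def compose(map_a: Mapping, map_b: Mapping) -> Mapping:
    """Compose(map_a, map_b): x -> z whenever map_a has x -> y and map_b has y -> z."""
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


def evolve(s_new: list[str], s_old: list[str], v: list[str], map_sv: Mapping):
    map_new_old = match(s_new, s_old)          # 1. mapS'S = Match(S', S)
    map_new_v = compose(map_new_old, map_sv)   # 2. mapS'V = Compose(mapS'S, mapSV)
    derivable = {z for _, z in map_new_v}
    v_new = [e for e in v if e in derivable]   # 3. ApplyFunction(V): drop the rest
    return map_new_v, v_new


# Hypothetical example: S' drops the Phone column that view V exposed.
S = ["Emp.E#", "Emp.Name", "Emp.Phone"]
S_new = ["Emp.E#", "Emp.Name"]
V = ["E#", "Name", "Phone"]
map_sv = [("Emp.E#", "E#"), ("Emp.Name", "Name"), ("Emp.Phone", "Phone")]
print(evolve(S_new, S, V, map_sv))
# ([('Emp.E#', 'E#'), ('Emp.Name', 'Name')], ['E#', 'Name'])
```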
Reverse Engineering
Given model M (e.g., an ER model), model G (e.g., a SQL schema) generated from M via mapMG, and a modified version G′ of G.
Produce a modified version M′ of M that generates G′.
1. mapGG′ = Match(G, G′)
2. mapMG′ = Compose(mapMG, mapGG′)
3. <M′, mapM′G′> = Copy(mapMG′)
4. Use ApplyFunction(mapM′G′) to reverse engineer each g in Diff(G′, mapM′G′) into M′.
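The sketch below runs the four steps on a case where a column was added to the generated SQL schema by hand. Assumptions:
- Match links identical names, and Compose is a plain join.
- Copy returns a copy of the left-side elements plus the mapping.
- The reverse-engineering rule, which attaches a new column to whichever entity already maps to its table, is hypothetical.

All names and data are also hypothetical.

```python
Mapping = list[tuple[str, str]]  # (left element, right element)


def match(g1: list[str], g2: list[str]) -> Mapping:
    """Stand-in Match: link identically named elements."""
    return [(x, x) for x in g1 if x in g2]


def compose(map_a: Mapping, map_b: Mapping) -> Mapping:
    """Compose(map_a, map_b): x -> z whenever map_a has x -> y and map_b has y -> z."""
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


def copy_with_left_model(mapping: Mapping) -> tuple[list[str], Mapping]:
    """Copy a mapping together with a fresh copy of its left-side elements."""
    return [x for x, _ in mapping], list(mapping)


def diff(model: list[str], mapping: Mapping) -> list[str]:
    """Elements of model (the mapping's right side) that the mapping does not cover."""
    covered = {y for _, y in mapping}
    return [e for e in model if e not in covered]


def reverse_engineer(column: str, mapping: Mapping) -> str:
    """Hypothetical rule: TABLE.COL -> <entity already mapped to TABLE>.col."""
    table, col = column.split(".")
    entity = next(m.split(".")[0] for m, g in mapping if g.split(".")[0] == table)
    return f"{entity}.{col.lower()}"


# Hypothetical data: mapMG from an ER model M to its generated SQL schema G.
map_mg = [("Employee.id", "EMP.ID"), ("Employee.name", "EMP.NAME")]
G = ["EMP.ID", "EMP.NAME"]
G_new = ["EMP.ID", "EMP.NAME", "EMP.HIRE_DATE"]     # G' after a manual edit

map_gg = match(G, G_new)                            # 1. mapGG' = Match(G, G')
map_mg_new = compose(map_mg, map_gg)                # 2. mapMG' = Compose(mapMG, mapGG')
M_new, map_m_g = copy_with_left_model(map_mg_new)   # 3. <M', mapM'G'> = Copy(mapMG')
for g in diff(G_new, map_m_g):                      # 4. ApplyFunction over Diff(G', mapM'G')
    m = reverse_engineer(g, map_m_g)
    M_new.append(m)
    map_m_g.append((m, g))

print(M_new)  # ['Employee.id', 'Employee.name', 'Employee.hire_date']
```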
Data Migration
Given a schema S and its database D, and an evolved schema S′.
Produce a procedure for mapping D into a database D′ for S′.
1. mapSS′ = Match(S, S′)
2. Use Enumerate(S) to generate a data migration script, then run the script on D to produce D′.
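Step 2 can be sketched as a loop driven by Enumerate(S). The sketch walks the tables of S and emits one SQL `INSERT ... SELECT` per target table of S′. Assumptions:
- mapSS′ is written by hand here rather than computed by Match.
- Schemas are dicts from table to column names.
- Identifiers are trusted; a real script generator would quote and validate them.

All names are hypothetical.

```python
Schema = dict[str, list[str]]    # table -> column names
Mapping = list[tuple[str, str]]  # ("Table.col" in S, "Table.col" in S')


def migration_script(s: Schema, map_ss: Mapping) -> list[str]:
    """Enumerate the tables of S and emit one INSERT ... SELECT per target table of S'."""
    target_of = dict(map_ss)
    script = []
    for table, columns in s.items():  # Enumerate(S)
        # Group this table's mapped columns by the S' table they populate.
        by_target: dict[str, list[tuple[str, str]]] = {}
        for col in columns:
            dest = target_of.get(f"{table}.{col}")
            if dest is None:
                continue  # column has no counterpart in S'; its data is dropped
            dest_table, dest_col = dest.split(".")
            by_target.setdefault(dest_table, []).append((col, dest_col))
        for dest_table, pairs in by_target.items():
            src_cols = ", ".join(c for c, _ in pairs)
            dest_cols = ", ".join(d for _, d in pairs)
            script.append(f"INSERT INTO {dest_table} ({dest_cols}) SELECT {src_cols} FROM {table};")
    return script


# Hypothetical example; map_ss would come from Match(S, S').
S = {"Emp": ["eno", "name", "phone"]}
map_ss = [("Emp.eno", "Employee.id"), ("Emp.name", "Employee.full_name")]
for stmt in migration_script(S, map_ss):
    print(stmt)
# INSERT INTO Employee (id, full_name) SELECT eno, name FROM Emp;
```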
Data Translation
Like data migration, except that S and S′ are expressed in different data models.
Status Report
- Vision: [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
- Data warehouse examples: [Bernstein & Rahm, ER 00]
- Match operation
  - Survey: [Rahm & Bernstein, VLDB Journal 12/01]
  - Prototype: [Madhavan, Bernstein, & Rahm, VLDB 01]
- Merge operation: coming soon
- Theory: [Alagic & Bernstein, DBPL 01]
Schema Matching Approaches
About a dozen published algorithms. Many good ideas, but none is robust.
Taxonomy of matchers:
- Individual matchers
  - Schema-based
    - Per-element: linguistic (names, descriptions); constraint-based (types, keys)
    - Structural: constraint-based (graph matching)
  - Content-based (per-element)
    - Linguistic: IR (word frequencies, key terms)
    - Constraint-based: value patterns and ranges
- Combined matchers
  - Hybrid
  - Composite: manual composition, automatic composition
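A composite matcher combines the scores of independently executed individual matchers. The sketch below is minimal and assumes two toy individual matchers:
- A linguistic name matcher built on `difflib.SequenceMatcher`.
- A constraint-based matcher that checks data-type equality.

The scores are combined as a hand-weighted sum. The weights, the threshold, and the example schemas are hypothetical.

```python
from difflib import SequenceMatcher

Element = tuple[str, str]  # (element name, data type)


def name_matcher(a: Element, b: Element) -> float:
    """Linguistic, per-element matcher: string similarity of names."""
    return SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()


def type_matcher(a: Element, b: Element) -> float:
    """Constraint-based, per-element matcher: 1 if data types agree, else 0."""
    return 1.0 if a[1] == b[1] else 0.0


def composite_match(s1, s2, matchers, weights, threshold=0.6):
    """Composite matcher: run each individual matcher independently, combine
    their scores as a weighted sum, and keep each s1 element's best candidate."""
    result = []
    for a in s1:
        score, best = max((sum(w * m(a, b) for m, w in zip(matchers, weights)), b) for b in s2)
        if score >= threshold:
            result.append((a[0], best[0], round(score, 2)))
    return result


S1 = [("CustName", "varchar"), ("Phone", "varchar"), ("Zip", "int")]
S2 = [("CustomerName", "varchar"), ("PhoneNo", "varchar"), ("PostalCode", "int")]
for name1, name2, score in composite_match(S1, S2, [name_matcher, type_matcher], [0.7, 0.3]):
    print(name1, "->", name2, score)
```

Zip and PostalCode agree on type but not on name, so their combined score stays below the threshold. A content-based matcher over the instance values could recover that pair.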
The Cupid Algorithm
- Computes linguistic similarity of element pairs
- Computes structural similarity of element pairs
- Generates a mapping
[Figure: Cupid matches PO (children POShipTo and POBillTo, each with City and Street) against PurchaseOrder (children DeliverTo and InvoiceTo, each with an Address containing City and Street); "ssim++" marks where the structural similarity of leaf pairs is increased.]
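The sketch below is a heavily simplified rendering of the Cupid idea, not the published algorithm. It works in four parts:
- Linguistic similarity is a Jaccard overlap of CamelCase tokens, normalized through a tiny hypothetical thesaurus.
- Structural similarity of two non-leaf elements is the fraction of leaves beneath them that have a strong link to a leaf on the other side.
- When a non-leaf pair scores above a threshold, the structural similarity of every leaf pair beneath it is increased (the "ssim++" on the slide).
- Each leaf is then mapped to its best-scoring counterpart.

The thesaurus, weights, thresholds, additive increment, and initial leaf similarity are all assumptions.

```python
import re
from itertools import product

Tree = tuple[str, list["Tree"]]  # (element name, children)

# All of these are assumptions for the sketch, not Cupid's published settings.
THESAURUS = {"po": ["purchase", "order"], "ship": ["deliver"], "bill": ["invoice"]}
W_STRUCT, TH_ACCEPT, TH_HIGH, C_INC, LEAF_SSIM = 0.5, 0.5, 0.7, 0.1, 0.5


def tokens(name: str) -> set[str]:
    """Split CamelCase (POShipTo -> PO, Ship, To) and normalize via the thesaurus."""
    out: set[str] = set()
    for tok in re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+", name):
        out.update(THESAURUS.get(tok.lower(), [tok.lower()]))
    return out


def lsim(a: str, b: str) -> float:
    """Linguistic similarity: Jaccard overlap of normalized name tokens."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def walk(tree: Tree, prefix: str = "") -> list[tuple[str, str, list[str]]]:
    """Post-order list of (path, name, leaf paths beneath); children precede parents."""
    name, children = tree
    path = f"{prefix}/{name}" if prefix else name
    if not children:
        return [(path, name, [path])]
    out: list[tuple[str, str, list[str]]] = []
    below: list[str] = []
    for child in children:
        sub = walk(child, path)
        out += sub
        below += sub[-1][2]  # the child's own entry is last in its post-order
    return out + [(path, name, below)]


def cupid(t1: Tree, t2: Tree) -> dict[str, str]:
    n1, n2 = walk(t1), walk(t2)
    name1 = {p: n for p, n, _ in n1}
    name2 = {p: n for p, n, _ in n2}
    leaves1 = [p for p, _, below in n1 if below == [p]]
    leaves2 = [p for p, _, below in n2 if below == [p]]
    ssim = {pair: LEAF_SSIM for pair in product(leaves1, leaves2)}

    def wsim(a: str, b: str) -> float:  # weighted similarity of a leaf pair
        return W_STRUCT * ssim[a, b] + (1 - W_STRUCT) * lsim(name1[a], name2[b])

    for (s, s_name, ls), (t, t_name, lt) in product(n1, n2):
        if ls == [s] or lt == [t]:
            continue  # only non-leaf pairs are scored here
        # Structural similarity: fraction of leaves with a strong link to the other side.
        strong = sum(any(wsim(a, b) >= TH_ACCEPT for b in lt) for a in ls)
        strong += sum(any(wsim(a, b) >= TH_ACCEPT for a in ls) for b in lt)
        struct = strong / (len(ls) + len(lt))
        if W_STRUCT * struct + (1 - W_STRUCT) * lsim(s_name, t_name) > TH_HIGH:
            for a, b in product(ls, lt):  # "ssim++": reward leaves under matching parents
                ssim[a, b] += C_INC
    # Generate the mapping: each leaf of t1 goes to its best-scoring leaf of t2.
    return {a: max(leaves2, key=lambda b: wsim(a, b)) for a in leaves1}


PO = ("PO", [("POShipTo", [("City", []), ("Street", [])]),
             ("POBillTo", [("City", []), ("Street", [])])])
ADDRESS = ("Address", [("City", []), ("Street", [])])
PURCHASE_ORDER = ("PurchaseOrder", [("DeliverTo", [ADDRESS]), ("InvoiceTo", [ADDRESS])])
for src, dst in cupid(PO, PURCHASE_ORDER).items():
    print(src, "->", dst)
```

On this example, leaf names alone cannot tell the two City elements apart. The increments from the POShipTo/DeliverTo and POBillTo/InvoiceTo pairs are what route each City and Street to the copy under the matching parent.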
Merge(M1, map, M2, M3)
[Buneman, Davidson, Kosky, EDBT 92]
- The meta-model has aggregation and generalization only.
- Do a union and collapse objects having the same name.
- A fix-up step repairs inconsistencies created by merging.
[Figure: class X has attribute a of type Y in one schema and of type Z in another; after the fix-up, a has type W, a new class that specializes both Y and Z.]
- Successive fix-ups lead to different results; batch them at the end to produce a unique minimal result.
- Now enrich the meta-model (containment, complex mappings, …) and the merge semantics (conflicts, deletes).
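The batching idea can be sketched on a toy meta-model in which a schema maps each class to its attributes, and each attribute to a type class. Following the figure, when attribute a of X gets types Y and Z from different inputs, the fix-up introduces one implicit class that specializes all of them. The slide calls it W; this sketch names it "Y&Z", a hypothetical convention.

Because all fix-ups run once, after the union, the result does not depend on the order of the inputs. This illustrates the principle only; it is not the construction from the EDBT 92 paper.

```python
Schema = dict[str, dict[str, str]]  # class -> {attribute: type (a class name)}


def merge_all(schemas: list[Schema]) -> tuple[Schema, dict[str, set[str]]]:
    """Merge several schemas at once, deferring all fix-ups to one batch at the end.

    Returns the merged schema and the generalizations (implicit class -> superclasses)
    introduced by the fix-ups.
    """
    # 1. Union: classes and attributes with the same name collapse; keep every type seen.
    seen: dict[str, dict[str, set[str]]] = {}
    for schema in schemas:
        for cls, attrs in schema.items():
            slot = seen.setdefault(cls, {})  # keep classes even if they have no attributes
            for attr, typ in attrs.items():
                slot.setdefault(attr, set()).add(typ)
    # 2. Batched fix-up: an attribute that ended up with several types gets ONE
    #    implicit class that specializes all of them.
    merged: Schema = {}
    isa: dict[str, set[str]] = {}
    for cls, attrs in seen.items():
        merged[cls] = {}
        for attr, types in attrs.items():
            if len(types) == 1:
                merged[cls][attr] = next(iter(types))
            else:
                implicit = "&".join(sorted(types))  # hypothetical name for the new class
                isa[implicit] = set(types)
                merged[cls][attr] = implicit
    return merged, isa


S1 = {"X": {"a": "Y"}}
S2 = {"X": {"a": "Z"}}
S3 = {"X": {"a": "Q"}}
merged, isa = merge_all([S1, S2, S3])
print(merged)  # {'X': {'a': 'Q&Y&Z'}}
```

Applying fix-ups pairwise instead (merge S1 and S2, fix up, then merge in S3 and fix up again) would build a nested implicit class whose shape depends on the merge order. Batching yields a single class.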
Implementation Vision
Components shown, from top to bottom:
- Model-driven UI generator
- Generic tools: browser, import/export, scripting, editors, catalogs
- Model Manager: Match, Merge, Apply, Compose, Copy, and operation specializations
- Object-oriented repository: MM meta-model, OR mapper, inferencing engine
- SQL DBMS
[Figure: sample models displayed in the generated UI, including an order-entry process (Customer, Order, Salesperson, Product, Inventory, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing) and a cust/emp/dept schema.]
Related Work
There's a lot of it. Apply it to model management!
- Platforms: OODBs, Datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
- Inferencing on mappings: AQUV (answering queries using views), description logic
- Transitive closure and recursive query processing
- Differencing: text, trees, graphs
- Data translation: algebras, schema evolution
- Data integration: schema matching, view generation
Summary
Raise the level of abstraction of meta-data programming by using:
- models and mappings as objects
- an algebra that manipulates models and mappings, on a generic meta-model
Classical meta data problems can be expressed using this algebra.
Implementations of classical problems offer guidance on implementing the algebra.
References
http://www.research.microsoft.com/~philbe
- P. Bernstein, E. Rahm, "Data Warehouse Scenarios for Model Management," ER 2000 Conference.
- P. Bernstein, A. Levy, R. Pottinger, "A Vision for Management of Complex Models," SIGMOD Record, Dec. 2000.
- E. Rahm, P. Bernstein, "On Matching Schemas Automatically," VLDB Journal, Dec. 2001.
- J. Madhavan, P. Bernstein, E. Rahm, "Generic Schema Matching with Cupid," VLDB 2001.
- S. Alagic, P. Bernstein, "A Model Theory for Generic Schema Management," DBPL 2001.