ON SCHEMA DISCOVERY ICDM Renée J. Miller
1 ON SCHEMA DISCOVERY
ICDM 2011
Renée J. Miller
2 What are Schemas?
- Schema
  - From the Greek "σχήμα", meaning shape or, more generally, plan
  - Structure and constraints the data (should) satisfy
- Attribute structure
  - Grouping into (nested) tables
- Constraints
  - Keys, functional dependencies, rules
  - Referential constraints (foreign keys)
  - Domain constraints
  - More general assertions, exclusion constraints, etc.
- Encode some (important) data semantics
3 Where are Schemas Used?
- Enterprise Information
  - Designed, curated, and valuable
  - Used in decision making
  - A single source may be massive
  - Foundation for Business Intelligence/Analytics
- Web/Personal Information
  - Light-weight structure, little/no curation or design
  - Convenient human memory aid
  - Often small, but numerous, data sources
- Spectrum of information
4 Evolving Role of Schemas
- Old: Prescriptive Role
  - Time-invariant portion of data
  - Used to ensure data consistency
- New: Descriptive Role
  - Evolve as data semantics evolves
  - Used to describe, understand, query the data
5 Goals for Talk
- Overview some of our work on schema discovery
  - Robust and well-studied area for enterprise data
- Present some new challenges for modern information
- Unifying theme
  - Leveraging data semantics
6 Outline
- Motivation
- Structure & Constraint Discovery
  - Dependency Discovery
  - Constraint Repair
- Schema Alignment (Data Integration)
  - Schema Mapping Discovery
  - Ontology Alignment Discovery
- Data Alignment (time permitting)
  - New Challenges posed by Linked Open Data
- Conclusions and Open Problems
7 Joint Work
- Series of Papers (including but not exclusively)
  - IEEE DE Bulletin 03 Schema Discovery
  - SIGMOD 04 [Andritsos, Tsaparas, M-]
  - ICDE 06 [Andritsos, Fuxman, M-]
  - SIGMOD 07 [Udrea, Getoor, M-]
  - VLDB 08 [Chiang, M-]
  - ICDE 11 [Chiang, M-]
- Clio [Fagin, Haas, Hernandez, M-, Popa, Velegrakis]
  - Clio: VLDB 00/02 thru to 2009 book chapter
  - Data Exchange: [Fagin, Kolaitis, M-, Popa ICDT03, TCS05]
- Fei Chiang graduating Spring 2012
8 What is a Good Schema?
- Traditional Answer: minimize redundancy
  - Redundancy can lead to inconsistency
  - BCNF: the left-hand side of every nontrivial functional dependency (FD) is a (super)key
  - Every attribute shall depend on the key, the whole key, and nothing but the key ... so help me Codd
- Information theoretic characterization
  - Given any set of cells (attribute values) in a table, one should not be able to predict the value of another cell
  - If the only constraints allowed are FDs, we can show that BCNF minimizes redundancy [Arenas 06 Dissertation, Kolahi 07 Dissertation on 3NF]
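The BCNF condition on this slide can be tested mechanically with attribute closure. The following is a minimal sketch, not taken from the talk: the relation `SCHEMA`, the dependencies `FDS`, and the function names are hypothetical and chosen only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of `schema`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)                  # skip trivial FDs
            and not set(schema) <= closure(lhs, fds)]    # lhs is not a superkey


# Hypothetical relation and FDs (assumed for illustration only).
SCHEMA = ("order_id", "customer", "city")
FDS = [(("order_id",), ("customer",)), (("customer",), ("city",))]

print(bcnf_violations(SCHEMA, FDS))
```

Here `customer -> city` is reported because `customer` alone does not determine `order_id`, so the city is stored redundantly for every order of the same customer.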
9 Finding Good Schemas?
- Today, can no longer assume data was designed
  - A table may contain more than one type of entity
  - May be result of integration or information extraction
- May not have constraints enforced on it
- Violations of constraints
  - May be errors
  - May be due to evolving (unknown) data semantics
- Need to discover/maintain schema & constraints
10 Legacy Relations
OiD | Type    | Category   | Start   | End     | SalesAssc | Time
123 | Voice   | Unlimited  | 1/20/10 | 1/20/11 | Fred10    | 10:17:
    | Data    | Unlimited  | 4/20/05 | 3/20/09 | Pat11     | 23:15:
    | Data    | Weekend    | 4/20/05 | 4/20/06 | Pat11     | 01:01:
    | Voice   | 120HR      | 2/20/08 | 2/20/10 | Fred10    | 23:15:
    | Charger | Motorola   | 2/14/10 | NULL    | CRM       | 10:17:
    | Phone   | MotoRZ2    | 12/1/10 | NULL    | CRM       | 11:15:
    | Phone   | BlackBerry | 12/1/10 | NULL    | CRM       | 12:12:22
- Tables & attributes may become overloaded
  - Service orders and product orders
11 Information Theoretic Clustering
- We use LIMBO, a scalable algorithm that clusters categorical data [Andritsos, M-, Tsaparas SIGMOD04]
- Idea: Compress tuples, T, into a clustering C so that the information preserved about the attribute values is maximum [Slonim, Tishby NIPS99]
  - Naturally finds (groups) redundant values
- LIMBO builds compact summaries that represent the data
12 Finding good horizontal decompositions
OiD | Type    | Category   | Start   | End     | SalesAssc | Time
123 | Voice   | Unlimited  | 1/20/10 | 1/20/11 | Fred10    | 10:17:
    | Data    | Unlimited  | 4/20/05 | 3/20/09 | Pat11     | 23:15:
    | Data    | Weekend    | 4/20/05 | 4/20/06 | Pat11     | 01:01:
    | Voice   | 120HR      | 2/20/08 | 2/20/10 | Fred10    | 23:15:
    | Charger | Motorola   | 2/14/10 | NULL    | CRM       | 10:17:
    | Phone   | MotoRZ2    | 12/1/10 | NULL    | CRM       | 11:15:
    | Phone   | BlackBerry | 12/1/10 | NULL    | CRM       | 12:12:22
- Merge tuples to lose as little information about attribute values as possible
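To make "lose as little information as possible" concrete, here is a rough sketch of the merge cost used in agglomerative information-bottleneck clustering: the combined weight of two clusters times the Jensen-Shannon divergence of their attribute-value distributions. This is not the LIMBO implementation (which works on compact summaries); the function names and the choice of three columns (Type, Category, SalesAssc) are assumptions for illustration.

```python
import math
from collections import Counter


def value_distribution(rows):
    """p(value | cluster): relative frequency of (column, value) pairs."""
    counts = Counter((i, v) for row in rows for i, v in enumerate(row))
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence in bits; q must cover the support of p."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def merge_cost(cluster_a, cluster_b, n_total):
    """Information lost about attribute values when merging two clusters."""
    pa, pb = len(cluster_a) / n_total, len(cluster_b) / n_total
    wa, wb = pa / (pa + pb), pb / (pa + pb)
    da, db = value_distribution(cluster_a), value_distribution(cluster_b)
    mix = {k: wa * da.get(k, 0.0) + wb * db.get(k, 0.0) for k in set(da) | set(db)}
    return (pa + pb) * (wa * kl(da, mix) + wb * kl(db, mix))


# Four tuples from the slide, restricted to (Type, Category, SalesAssc).
voice = ("Voice", "Unlimited", "Fred10")
data = ("Data", "Unlimited", "Pat11")
phone1 = ("Phone", "MotoRZ2", "CRM")
phone2 = ("Phone", "BlackBerry", "CRM")

print(merge_cost([phone1], [phone2], 4))  # two product orders
print(merge_cost([phone1], [voice], 4))   # a product order and a service order
```

Merging the two phone orders costs less than merging a phone order with a voice plan, so a greedy clustering would group product orders together before mixing them with service orders.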
13 Applications
- Horizontally decompose legacy tables
  - Group tuples into semantically meaningful types [Andritsos, M-, Tsaparas SIGMOD04]
- Find potential duplicate tuples in data
  - Entity-identification in relational data [Andritsos, Fuxman, M- ICDE06]
- Create probabilistic DBs from duplicated data
  - Compute meaningful probabilities that reflect duplication in data [Hassanzadeh, M- VLDB Journal09]
14 Constraint Discovery
- Functional Dependencies (includes Keys)
  - FDEP [Flach, Savnik AICommunications99], TANE [Huhtala et al. ComputingJ.99], FastFDs [Wyss, Giannella, Robertson DaWaK01]
- Inclusion Dependencies (includes Foreign Keys)
  - General Inclusion Deps [Bauckmann et al. ICDE07, De Marchi, Lopes, Petit EDBT09] and many others
  - Foreign Keys [Zhang et al. PVLDB10]
- Mining may give large number of dependencies
  - not always intuitive
  - not all are useful for re-designing the data
  - may be accidental
- Goal: find interesting dependencies efficiently
15 Conditional Rules
- Rules may not hold over entire table
  - FDs assume table represents a single entity type
- Conditional FDs [Maher TCS97, Bohannon et al. ICDE07]
  - Applications to cleaning and understanding data
Airline   | Status | Miles | Seating   | Boarding
One World | Bronze | 1x    | Standard  | Standard
Meridan   | Silver | 1x    | Elite     | Standard
Skyway    | Silver | 2x    | Preferred | Standard
Skyway    | Gold   | 2x    | Elite     | First
One World | Gold   | 3x    | Preferred | Priority
Skyway    | Gold   | 2x    | Preferred | Priority
One World | Silver | 2x    | Preferred | Priority
- Functional Dependency: [Airline, Status] → [Miles]
- Conditional Functional Dep: [Status = Gold, Seating] → [Boarding]
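The difference between the two rules can be checked directly on the slide's table. The sketch below assumes the flattened rows form a five-column table and that the trailing "One World Silver 2x Preferred Priority" entry is its last tuple; the helper name `holds` is hypothetical.

```python
COLUMNS = ("Airline", "Status", "Miles", "Seating", "Boarding")
ROWS = [
    ("One World", "Bronze", "1x", "Standard", "Standard"),
    ("Meridan", "Silver", "1x", "Elite", "Standard"),
    ("Skyway", "Silver", "2x", "Preferred", "Standard"),
    ("Skyway", "Gold", "2x", "Elite", "First"),
    ("One World", "Gold", "3x", "Preferred", "Priority"),
    ("Skyway", "Gold", "2x", "Preferred", "Priority"),
    ("One World", "Silver", "2x", "Preferred", "Priority"),
]


def holds(rows, lhs, rhs, condition=None):
    """True if lhs -> rhs holds on the rows that match `condition`.

    `condition` maps attribute names to required constants; an empty
    condition gives an ordinary FD check over the whole table.
    """
    condition = condition or {}
    seen = {}
    for values in rows:
        row = dict(zip(COLUMNS, values))
        if any(row[a] != v for a, v in condition.items()):
            continue  # tuple is outside the scope of the conditional rule
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same left-hand side, different right-hand side
    return True


print(holds(ROWS, ["Airline", "Status"], ["Miles"]))
print(holds(ROWS, ["Status", "Seating"], ["Boarding"]))
print(holds(ROWS, ["Status", "Seating"], ["Boarding"], {"Status": "Gold"}))
```

Under this reading of the table, [Status, Seating] → [Boarding] fails globally because the two Silver/Preferred tuples board differently, yet it holds once restricted to Status = Gold.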
16 Discovery of Conditional FDs
- Modification of TANE search to consider conditions [Chiang, M- VLDB08]
- Standard measures of rule interest
  - Support, Entropy, Conviction, ...
- Find approximate conditional FDs and deviants (data that form candidates for cleaning)
17 Inconsistent Data
- Data may become inconsistent with constraints
- Options
  - Drop constraints and discover new constraints that fit
  - Assume constraints are correct and repair data [Bohannon et al. SIGMOD05], [Cong et al. VLDB07], [Kolahi, Lakshmanan ICDT09]
    - For FD: X → Y, find minimal cost changes (e.g., edit distance) to the Y values in the data
  - Repair data and constraints to fit [Chiang, M- ICDE11]
18 Data & Constraint Repair
District | Region    | Municipal | AC  | Street    | City       | Prov | PCode
Brook    | Granville | Glendale  | 412 | Roslin    | Toronto    | ON   | M4N 1Y3
Brook    | Granville | Glendale  | 412 | Roslin    | Toronto    | OH   | M4N 1Y3
Brook    | Granville | Guildwood | 553 | Sidney    | Belleville | ON   | K8P 3Y9
Brook    | Granville | Guildwood | 553 | Sidney    | Belleville | ON   | K8P 1J7
Fife     | Parkhill  | Moore     | 725 | Poth      | Dundee     | ON   | NOB 2E0
Fife     | Parkhill  | Moore     | 725 | Roseville | Dundee     | ON   | NOB 2E0
Fife     | Parkhill  | Napa      | 228 | Roslin    | Toronto    | ON   | M4N 1Y3
Municipal
- F1: [District, Region] → [AC]
- F2: [PCode] → [City, Prov]
- 1) Repair the data or the constraints?
- 2) How to find the repairs?
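One way to explore question 1 on this table is to list the groups of tuples that violate each FD and then test which single attribute, added to the left-hand side, would make the FD hold. This is only an illustrative sketch: it does not implement the variance-of-information ranking or the unified cost model of [Chiang, M- ICDE11], and the function names are hypothetical.

```python
from collections import defaultdict

COLUMNS = ("District", "Region", "Municipal", "AC", "Street", "City", "Prov", "PCode")
ROWS = [
    ("Brook", "Granville", "Glendale", "412", "Roslin", "Toronto", "ON", "M4N 1Y3"),
    ("Brook", "Granville", "Glendale", "412", "Roslin", "Toronto", "OH", "M4N 1Y3"),
    ("Brook", "Granville", "Guildwood", "553", "Sidney", "Belleville", "ON", "K8P 3Y9"),
    ("Brook", "Granville", "Guildwood", "553", "Sidney", "Belleville", "ON", "K8P 1J7"),
    ("Fife", "Parkhill", "Moore", "725", "Poth", "Dundee", "ON", "NOB 2E0"),
    ("Fife", "Parkhill", "Moore", "725", "Roseville", "Dundee", "ON", "NOB 2E0"),
    ("Fife", "Parkhill", "Napa", "228", "Roslin", "Toronto", "ON", "M4N 1Y3"),
]


def violating_groups(rows, lhs, rhs):
    """Map each left-hand-side value with more than one right-hand side to those values."""
    groups = defaultdict(set)
    for values in rows:
        row = dict(zip(COLUMNS, values))
        groups[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    return {k: v for k, v in groups.items() if len(v) > 1}


def repairing_attributes(rows, lhs, rhs):
    """Attributes whose addition to the left-hand side makes the FD hold."""
    return [a for a in COLUMNS
            if a not in lhs and a not in rhs
            and not violating_groups(rows, list(lhs) + [a], rhs)]


F1 = (["District", "Region"], ["AC"])
F2 = (["PCode"], ["City", "Prov"])

print(violating_groups(ROWS, *F1))
print(repairing_attributes(ROWS, *F1))
print(violating_groups(ROWS, *F2))
print(repairing_attributes(ROWS, *F2))
```

On this data, F1 can be made to hold by extending its left-hand side (Municipal is one such attribute), which points toward a constraint repair. No single attribute separates the conflicting tuples for F2, since the ON and OH rows agree everywhere else, which points toward a data repair.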
19 Key Ideas
- Use variance of information to select attributes for doing constraint repair
  - Attribute with no variance of information with respect to the inconsistent data is a perfect repair
- Unified cost model for comparing Constraint Repair with Data Repair
20 Summary Constraint Discovery
- Goal is to discover or repair constraints to find accurate model of data
  - Can be used to query and understand data
  - Maintain data consistency
  - Clean data or correct data entry errors
  - Semantic query optimization
21 Outline
- Motivation
- Constraint Discovery
  - Dependency Discovery
  - Constraint Repair
- Schema Alignment (Data Integration)
  - Schema Mapping Discovery
  - Ontology Alignment Discovery
- Data Alignment
  - New Challenges posed by Linked Open Data
- Conclusions and Open Problems
22 Discovery of Schema Mappings
[Figure: source schema S with instance I_S and query Q_S; target schema T with instance I_T and query Q_T; mapping M between them, with arrows labelled M + Q_T and M + I_S]
- Schema Mappings are declarative specifications that describe the relationship between a source schema S and a target schema T
- Key to both
  - Data exchange (creating I_T), aka Data Warehousing
  - Query rewriting (creating Q_S), aka Data Federation
23 Mapping example
- Source: P(A, B, C) with tuple <a, b, c>; Q(D, E, F) with tuple <d, e, f>; R(G, H, I) with tuple <g, h, i>
- Target: T(A, E, I)
- Referential Constraint
  - P(a,b,c) → ∃Y ∃Z T(a,Y,Z)
  - Q(d,e,f) → ∃X ∃U T(X,e,U)
  - R(g,h,i) → ∃V ∃W T(V,W,i)
- Solutions for T
  - J: (a, Y0, Z0), (X0, e, U0), (V0, W0, i)
  - J1: (a, e, i)
  - J2: (a, e, Z1), (V1, W1, i)
- There may be many solutions for T (J, J1, J2, etc.)
- However, J seems to be more general
  - X0, Y0, Z0 represent unknown values (or nulls)
  - Intuitively, J1 and J2 have extra information
24 Mapping example
- Source: Emp(Name, B, C) with tuple <Pat, b, c>; Worksin(D, Salary, F) with tuple <d, 100K, f>; Dept(G, H, Bonus) with tuple <g, h, .10>
- Target: Employee(A, E, I)
- Solutions
  - J: (Pat, Y0, Z0), (X0, 100K, U0), (V0, W0, .10)
  - J1: (Pat, 100K, .10)
  - J2: (Pat, 100K, Z1), (V1, W1, .10)
- J1 and J2 assert extra information: Pat's salary is 100K
  - this is not required by source or mapping
- Homomorphisms h: J → J1
  - For constants, h(c) = c
  - For each tuple (X, Y, Z) in J, (h(X), h(Y), h(Z)) is a tuple in J1
  - h1 = {Y0 → 100K, Z0 → .10} maps J to J1; h2 maps J to J2
25 Universal Solutions
- Given a data exchange setting (S, T, M, Σ_t), where Σ_t are constraints on the target T, and a source instance I, a universal solution is a target instance J such that
  - J is a solution for I
  - for all solutions J′ for I, there is a homomorphism h: J → J′
- For the example, J is a universal solution
  - there are homomorphisms h1: J → J1 and h2: J → J2
  - there are no homomorphisms from J1 or J2 to J
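The homomorphism claims for the Pat example can be verified with a small backtracking search. The sketch below assumes a single target relation, represents labeled nulls with a hypothetical `Null` class, and treats every other value as a constant that must map to itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null such as Y0; anything else is treated as a constant."""
    name: str


def extend(h, src_tuple, dst_tuple):
    """Extend mapping h so that src_tuple maps onto dst_tuple, or return None."""
    if len(src_tuple) != len(dst_tuple):
        return None
    h = dict(h)
    for x, y in zip(src_tuple, dst_tuple):
        if isinstance(x, Null):
            if h.setdefault(x, y) != y:
                return None  # null already mapped to something else
        elif x != y:
            return None      # constants must map to themselves
    return h


def find_homomorphism(src, dst, h=None):
    """Return a mapping of nulls sending every tuple of src into dst, or None."""
    h = h or {}
    if not src:
        return h
    for target in dst:
        trial = extend(h, src[0], target)
        if trial is not None:
            result = find_homomorphism(src[1:], dst, trial)
            if result is not None:
                return result
    return None


X0, Y0, Z0, U0, V0, W0 = (Null(n) for n in ("X0", "Y0", "Z0", "U0", "V0", "W0"))
Z1, V1, W1 = Null("Z1"), Null("V1"), Null("W1")

J = [("Pat", Y0, Z0), (X0, "100K", U0), (V0, W0, ".10")]
J1 = [("Pat", "100K", ".10")]
J2 = [("Pat", "100K", Z1), (V1, W1, ".10")]

print(find_homomorphism(J, J1))
print(find_homomorphism(J1, J))
```

The search finds mappings from J into both J1 and J2 but none in the reverse direction, which matches the slide's argument that J is the more general solution.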
26 Universal Solutions in Data Exchange
- We introduced the notion of universal solutions as the best solutions in data exchange
  - The most general solutions
- Foundational Results [Fagin, Kolaitis, M-, Popa, ICDT03, TCS05]
  - Universal solutions are unique up to homomorphic equivalence; they represent the entire solution space
  - The chase procedure produces a universal solution in polynomial time
  - The certain answers of target conjunctive queries can be obtained by evaluation on an arbitrary universal solution; universal solutions are the only solutions with this property
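For mappings made only of source-to-target dependencies, a naive chase is short enough to sketch. The code below is an illustration under simplifying assumptions: no target constraints Σ_t, a single head atom per rule, and labeled nulls written as strings with a hypothetical `_N` prefix. The rule sets `M` and `M2` and the source facts follow the P, Q, R mapping examples in this talk, where the source values happen to share names with the rule variables.

```python
from itertools import count


def matches(body, facts, binding=None):
    """Yield every variable binding that maps all body atoms onto source facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (rel, variables), rest = body[0], body[1:]
    for fact_rel, values in facts:
        if fact_rel != rel or len(values) != len(variables):
            continue
        trial = dict(binding)
        if all(trial.setdefault(v, x) == x for v, x in zip(variables, values)):
            yield from matches(rest, facts, trial)


def chase(facts, tgds):
    """One head tuple per body match; fresh labeled null per existential variable."""
    fresh = count()
    target = []
    for body, (head_rel, head_vars) in tgds:
        for binding in matches(body, facts):
            nulls = {}
            row = []
            for v in head_vars:
                if v in binding:
                    row.append(binding[v])
                else:
                    if v not in nulls:
                        nulls[v] = f"_N{next(fresh)}"
                    row.append(nulls[v])
            target.append((head_rel, tuple(row)))
    return target


SOURCE = [("P", ("a", "b", "c")), ("Q", ("b", "e", "f")), ("R", ("f", "h", "i"))]

# M: three independent rules; Y, Z, X, U, V, W are existential.
M = [
    ([("P", ("a", "b", "c"))], ("T", ("a", "Y", "Z"))),
    ([("Q", ("d", "e", "f"))], ("T", ("X", "e", "U"))),
    ([("R", ("g", "h", "i"))], ("T", ("V", "W", "i"))),
]
# M2: one rule joining P, Q, and R on the shared variables b and f.
M2 = [
    ([("P", ("a", "b", "c")), ("Q", ("b", "e", "f")), ("R", ("f", "h", "i"))],
     ("T", ("a", "e", "i"))),
]

print(chase(SOURCE, M))
print(chase(SOURCE, M2))
```

Chasing with M yields three tuples padded with nulls, while chasing with M2 yields the single tuple (a, e, i), which is the contrast between J and J1 drawn in the mapping examples.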
27 Mapping example
- Source: P(A, B, C) with tuple <a, b, c>; Q(D, E, F) with tuple <b, e, f>; R(G, H, I) with tuple <f, h, i>
- Target: T(A, E, I)
- Source constraints can be used to improve mapping
  - ∀d,e,f Q(d,e,f) → ∃a,c P(a,d,c)
  - ∀d,e,f Q(d,e,f) → ∃h,i R(f,h,i)
- M
  - P(a,b,c) → ∃Y ∃Z T(a,Y,Z)
  - Q(d,e,f) → ∃X ∃U T(X,e,U)
  - R(g,h,i) → ∃V ∃W T(V,W,i)
  - J is universal solution for M: (a, Y0, Z0), (X0, e, U0), (V0, W0, i)
- M2: P(a,b,c), Q(b,e,f), R(f,h,i) → T(a,e,i)
  - J1 is universal solution for M2: (a, e, i)
28 Mapping example
- Source: Emp(Name, Eid, Addr) with tuple <Pat, E1, NY>; Worksin(Eid, Salary, Did) with tuple <E1, 100K, D1>; Dept(Did, Dname, Bonus) with tuple <D1, CS, .10>
- Target: Employee(Name, Salary, Bonus)
- If meaning of Employee table coincides with WorksIn relationship then mapping should be:
  - Emp(Name,Eid,Addr), Worksin(Eid,Salary,Did), Dept(Did,Dname,Bonus) → Employee(Name,Salary,Bonus)
- J1 is best (universal) solution: (Pat, 100K, .10)
29 Schema Mapping Discovery
- Use chase (logical inference) to infer connections in source and target schemas
  - Potential semantic relationships
- Guide user in selecting correct semantic relationships to use in mapping
- Transformation code generation
  - SQL, XQuery, XSLT transforms, ...
  - Include (Skolem) terms to generate labeled nulls
- Technology in several IBM product lines including IBM's InfoSphere Data Architect
- [M-, Haas, Hernandez VLDB00] thru to [Fagin+2009]
30 Schema Mapping Summary
- Schema mapping leverages
  - Statistical inference to infer attribute matches
  - Logical inference to give matches a semantic interpretation
    - As inter-schema constraints
- Can we extend this idea to a closer integration of approaches?
31 Ontology Mapping
- (discoveredby, owl:inverseof, discoverer)
- (discoveredby, owl:type, owl:functionalproperty)
- (associatedwith, owl:type, owl:transitiveproperty)
- (resultsfrom, rdfs:subpropertyof, associatedwith)
32-34 Example OWL Lite Ontologies
- An entity can be a:
  - Class
  - Instance
  - Property
- (discoveredby, owl:inverseof, discoverer)
- (discoveredby, owl:type, owl:functionalproperty)
- (associatedwith, owl:type, owl:transitiveproperty)
- (resultsfrom, rdfs:subpropertyof, associatedwith)
35 Example OWL Lite Ontologies
- Axioms
  - (discoveredby, owl:inverseof, discoverer)
  - (discoveredby, owl:type, owl:functionalproperty)
  - (associatedwith, owl:type, owl:transitiveproperty)
  - (resultsfrom, rdfs:subpropertyof, associatedwith)
36 Computing Similarity
- sim_lexical: Jaro-Winkler and WordNet
- sim_structural: Jaccard for neighborhoods
- sim_extensional: Jaccard on extensions
- Standard used in schema/ontology matchers [COMA++ & others]
- Parameters λ_x, λ_s, λ_e differ for classes, instances, and properties
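As a rough illustration of how the three measures might be combined, the sketch below computes Jaccard similarity on neighborhoods and extensions and mixes them with a precomputed lexical score. The weighted sum, the pairing of λ_x, λ_s, λ_e with the lexical, structural, and extensional terms, the default weights, and the example sets are all assumptions; the slide names the parameters but does not state the combination formula.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(sim_lexical, neighbors_1, neighbors_2,
                        extension_1, extension_2,
                        lam_x=0.4, lam_s=0.3, lam_e=0.3):
    """Weighted sum of lexical, structural, and extensional similarity.

    sim_lexical is assumed to be computed elsewhere (e.g., Jaro-Winkler);
    the weights are illustrative, not values from the talk.
    """
    sim_structural = jaccard(neighbors_1, neighbors_2)
    sim_extensional = jaccard(extension_1, extension_2)
    return lam_x * sim_lexical + lam_s * sim_structural + lam_e * sim_extensional


# Hypothetical entity pair: neighborhoods share one of three terms,
# extensions are identical.
score = combined_similarity(
    0.8,
    {"disease", "bacterium"}, {"disease", "toxin"},
    {"e1", "e2"}, {"e1", "e2"},
)
print(score)
```

Keeping the weights as parameters mirrors the slide's point that different settings are used for classes, instances, and properties.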
37 Similarity
38 Does Similarity agree with Semantics
39 Performing logical inference
40 Performing logical inference
- Candidate: (E-ColiPoisoning, owl:sameas, E-Coli)
- Consequence: (TheodorEscherich, owl:sameas, T.S. Escherich)
- The consequence is a logical consequence of the candidate
41 Combining Evidence
- If logical consequence(s) are similar, then increase similarity of candidate by inference similarity
- Candidate: pair of entities (ci, cj) asserted to be same
  - (E-coli Poisoning same-as E-coli), similarity 0.5
- Consequence: pair of entities (ei, ej) inferred to be same
  - (Theodore same-as T. S.), similarity 0.6
- Inference Similarity
  - Greater than one if consequences are similar
  - Less than one otherwise
- Multiply candidate similarity by inference similarity
  - (E-coli Poisoning same-as E-coli): 0.5 * 1.5 = 0.75
42 Experimental Framework
- ILIADS-tailored uses best set of parameters for each pair of ontologies
- ILIADS-fixed uses one set of parameters for all pairs of ontologies
- FCA-merge [Stumme and Maedche, IJCAI 2001] uses formal concept analysis and an external document corpus
- COMA++ [Aumueller et al., SIGMOD 2005] implements multiple match strategies, including robust collection of similarity functions
- 30 pairs of real-world ontologies (up to 20,000 triples)
  - From a variety of domains: medical, geographical, economical, biological
  - Ground truth provided by human reviewers
43 Precision/recall for ontologies with substantial instance data
44 Ontology Mapping Summary
- Use logical constraints in schemas/ontologies
- Able to improve recall substantially over standard statistical inference techniques that are based on lexical, structural, semantic similarity
- Introduced notion of inference similarity [Udrea, Getoor, M- SIGMOD07]
45 Outline
- Motivation
- Constraint Discovery
  - Dependency Discovery
  - Constraint Repair
- Schema Alignment (Data Integration)
  - Schema Mapping Discovery
  - Ontology Alignment Discovery
- Data Alignment
  - New Challenges posed by Linked Open Data
- Conclusions and Open Problems
46 Vision: reconciling data
- Linked Open Data
  - Publish Web Data with useful semantic information
- LinQuer
  - Scalable, declarative (native DBMS) support for syntactic and semantic data matching
  - So far leveraging domain constraints (and semantics of domains), but can we do more?
- Applications to Linked Clinical Trials
- Work of Oktie Hassanzadeh [Hassanzadeh et al. CIKM09], starting soon at IBM Watson
47 Conclusions
- Schema Discovery
  - Need to be able to discover and maintain structural and semantic information
  - Role in prescribing and enforcing data consistency
  - Role in cleaning, querying, and understanding data
- More flexible view of schemas
  - Support what-if style analysis
  - Postulate constraints that match your model of a domain
  - DBMS gives answers that are consistent with constraints
  - If I believe that Customers are uniquely identified by their phone number, zip code, and last name, then how many customers do I have?
- [Fuxman 2008 ACM SIGMOD Jim Gray Dissertation Award] [Fuxman, Fazli, M- SIGMOD05]
The Relational Model Grant Weddell David R. Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Spring 2012 CS 348 (Intro to DB Mgmt) Relational Model
More informationConceptual Design. The Entity-Relationship (ER) Model
Conceptual Design. The Entity-Relationship (ER) Model CS430/630 Lecture 12 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Database Design Overview Conceptual design The Entity-Relationship
More informationThe Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model
The Relational Model CMU SCS 15-415 C. Faloutsos Lecture #3 R & G, Chap. 3 Outline Introduction Integrity constraints (IC) Enforcing IC Querying Relational Data ER to tables Intro to Views Destroying/altering
More informationGraph Databases. Guilherme Fetter Damasio. University of Ontario Institute of Technology and IBM Centre for Advanced Studies IBM Corporation
Graph Databases Guilherme Fetter Damasio University of Ontario Institute of Technology and IBM Centre for Advanced Studies Outline Introduction Relational Database Graph Database Our Research 2 Introduction
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More informationRelational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.
Relational Design Theory Relational Design Theory A badly designed schema can result in several anomalies Update-Anomalies: If we modify a single fact, we have to change several tuples Insert-Anomalies:
More informationRelational Database Design (II)
Relational Database Design (II) 1 Roadmap of This Lecture Algorithms for Functional Dependencies (cont d) Decomposition Using Multi-valued Dependencies More Normal Form Database-Design Process Modeling
More informationCS425 Fall 2016 Boris Glavic Chapter 1: Introduction
CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)
More informationChapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)
Chapter 10 Normalization Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant Information in Tuples and Update Anomalies 1.3 Null
More informationDistributed Database Systems By Syed Bakhtawar Shah Abid Lecturer in Computer Science
Distributed Database Systems By Syed Bakhtawar Shah Abid Lecturer in Computer Science 1 Distributed Database Systems Basic concepts and Definitions Data Collection of facts and figures concerning an object
More informationDatabase Design and Tuning
Database Design and Tuning Chapter 20 Comp 521 Files and Databases Spring 2010 1 Overview After ER design, schema refinement, and the definition of views, we have the conceptual and external schemas for
More informationThe Relational Model. Why Study the Relational Model? Relational Database: Definitions
The Relational Model Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Microsoft, Oracle, Sybase, etc. Legacy systems in
More informationDatabase Management
Database Management - 2011 Model Answers 1. a. A data model should comprise a structural part, an integrity part and a manipulative part. The relational model provides standard definitions for all three
More informationCPSC 421 Database Management Systems. Lecture 19: Physical Database Design Concurrency Control and Recovery
CPSC 421 Database Management Systems Lecture 19: Physical Database Design Concurrency Control and Recovery * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Agenda Physical
More informationA Deeper Look at Data Modeling. Shan-Hung Wu & DataLab CS, NTHU
A Deeper Look at Data Modeling Shan-Hung Wu & DataLab CS, NTHU Outline More about ER & Relational Models Weak Entities Inheritance Avoiding redundancy & inconsistency Functional Dependencies Normal Forms
More informationApplied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016
Applied Databases Lecture 5 ER Model, normal forms Sebastian Maneth University of Edinburgh - January 25 th, 2016 Outline 2 1. Entity Relationship Model 2. Normal Forms Keys and Superkeys 3 Superkey =
More informationRelational Database design. Slides By: Shree Jaswal
Relational Database design Slides By: Shree Jaswal Topics: Design guidelines for relational schema, Functional Dependencies, Definition of Normal Forms- 1NF, 2NF, 3NF, BCNF, Converting Relational Schema
More informationChapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies
Chapter 14 Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies Copyright 2012 Ramez Elmasri and Shamkant B. Navathe Chapter Outline 1 Informal Design Guidelines