Edinburgh Research Explorer

Similar documents
Consistent Query Answering: Opportunities and Limitations

Consistent Query Answering

arxiv:cs/ v1 [cs.db] 5 Apr 2002

INCONSISTENT DATABASES

Preference-Driven Querying of Inconsistent Relational Databases

Static, Incremental and Parameterized Complexity of Consistent Query Answering in Databases Under Cardinality-Based Semantics

Consistent Query Answering

Edinburgh Research Explorer

Query Answering in Peer-to-Peer Data Exchange Systems

Computing Query Answers with Consistent Support

On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases

Database Consistency: Logic-Based Approaches

Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies

Consistent Query Answering in Databases

Logic Programs for Consistently Querying Data Integration Systems

arxiv: v1 [cs.db] 23 May 2016

The Semantics of Consistency and Trust in Peer Data Exchange Systems

Minimal-Change Integrity Maintenance Using Tuple Deletions

Aalborg Universitet. Just-In-Place Information for Mobile Device Interfaces Kjeldskov, Jesper. Published in: Paterno, F. (ed.)

Data integration lecture 3

Scalar Aggregation in Inconsistent Databases

The multi-perspective process explorer

RELATIONAL STORAGE FOR XML RULES

Locally unique labeling of model elements for state-based model differences

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Optimized encodings for Consistent Query Answering via ASP from different perspectives

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

From Database Repair Programs to Consistent Query Answering in Classical Logic (extended abstract)

Inconsistency-tolerant logics

Managing Inconsistencies in Collaborative Data Management

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Declarative Semantics of Production Rules for Integrity Maintenance

Integrating decision management with UML modeling concepts and tools

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

Relational Databases

Approximation Algorithms for Computing Certain Answers over Incomplete Databases

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.

On the Hardness of Counting the Solutions of SPARQL Queries

Forgetting and Compacting data in Concept Learning

XES Software Communication Extension

Features of the architecture of decision support systems

Relational Storage for XML Rules

Published in: Mobile Wireless Middleware, Operating Systems, and Applications - Workshops

Pattern Recognition Using Graph Theory

Developing Seamless Discovery of Scholarly and Trade Journal Resources Via OAI and RSS Chumbe, Santiago Segundo; MacLeod, Roddy

Exploiting Unfounded Sets for HEX-Program Evaluation

Trajectory Compression under Network Constraints

Answer Sets and the Language of Answer Set Programming. Vladimir Lifschitz

Visual support for work assignment in YAWL

Realisation of Active Multidatabases by Extending Standard Database Interfaces

JOURNAL OF OBJECT TECHNOLOGY

Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints

Computing Query Answers with Consistent Support

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Consistent Answers from Integrated Data Sources

Process Model Consistency Measurement

Edinburgh Research Explorer

Consistent Query Answering for Atemporal Constraints over Temporal Databases

WoPeD - A "Proof-of-Concept" Platform for Experimental BPM Research Projects

Edinburgh Research Explorer

Towards Rule Learning Approaches to Instance-based Ontology Matching

ANDREAS PIERIS JOURNAL PAPERS

Multiplex: Integrating. Autonomous, Heterogeneous and Inconsistent Information Sources

Constructing Control Flow Graph for Java by Decoupling Exception Flow from Normal Flow

Clean Living: Eliminating Near-Duplicates in Lifetime Personal Storage

Building petabit/s data center network with submicroseconds latency by using fast optical switches Miao, W.; Yan, F.; Dorren, H.J.S.; Calabretta, N.

Results of Using an Efficient Algorithm to Query Disjunctive Genealogical Data

Minimal Test Cost Feature Selection with Positive Region Constraint

ACCOn: Checking Consistency of XML Write-Access Control Policies

Provable data privacy

Query Rewriting Using Views in the Presence of Inclusion Dependencies

Ontology Merging: on the confluence between theoretical and pragmatic approaches

Detecting Logical Errors in SQL Queries

Static checking of domain constraints in applications interacting with relational database by means of dependently-typed lambda calculus

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications,

Web Services for Relational Data Access

An Extension of Complexity Bounds and Dynamic Heuristics for Tree-Decompositions of CSP

RaDON Repair and Diagnosis in Ontology Networks

Rule Enforcement with Third Parties in Secure Cooperative Data Access

Online Conformance Checking for Petri Nets and Event Streams

Automatic Detection and Repair of Errors in Data Structures

Fuzzy Set-Theoretical Approach for Comparing Objects with Fuzzy Attributes

Database Repairing and Consistent Query Answering

Getting state-space models from FEM simulations

Relational terminology. Databases - Sets & Relations. Sets. Membership

Uncertainty in Databases. Lecture 2: Essential Database Foundations

An Evolution of Mathematical Tools

Horizontal Aggregations for Mining Relational Databases

Data integration supports seamless access to autonomous, heterogeneous information

Learning Probabilistic Ontologies with Distributed Parameter Learning

CS317 File and Database Systems

Scalable Web Service Composition with Partial Matches

Web Security Vulnerabilities: Challenges and Solutions

RESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury

Chapter 15: Transactions

On the Role of Integrity Constraints in Data Integration

A Framework for the Internationalization of Data-Intensive Web Applications

The Data Organization

Efficient SQL-Querying Method for Data Mining in Large Data Bases

A Tool for Storing OWL Using Database Technology

Transcription:

Edinburgh Research Explorer Hippo: A System for Computing Consistent Answers to a Class of SQL Queries Citation for published version: Chomicki, J, Marcinkowski, J & Staworko, S 2004, Hippo: A System for Computing Consistent Answers to a Class of SQL Queries. in Advances in Database Technology - EDBT 2004: 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, March 14-18, 2004, Proceedings. vol. 2992, Springer Berlin Heidelberg, pp. 841-844. DOI: 10.1007/978-3-540-24741-8_53 Digital Object Identifier (DOI): 10.1007/978-3-540-24741-8_53 Link: Link to publication record in Edinburgh Research Explorer Document Version: Peer reviewed version Published In: Advances in Database Technology - EDBT 2004 General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact openaccess@ed.ac.uk providing details, and we will remove access to the work immediately and investigate your claim. Download date: 08. Dec. 2018

Hippo: a System for Computing Consistent Answers to a Class of SQL Queries Jan Chomicki 1, Jerzy Marcinkowski 2, and Slawomir Staworko 1 1 Dept. Computer Science and Engineering, University at Buffalo {chomicki,staworko}@cse.buffalo.edu 2 Instytut Informatyki, Wroclaw Uniwersity, Poland Jerzy.Marcinkowski@ii.uni.wroc.pl 1 Motivation and Introduction Integrity constraints express important properties of data, but the task of preserving data consistency is becoming increasingly problematic with new database applications. For example, in the case of integration of several data sources, even if the sources are separately consistent, the integrated data can violate the integrity constraints. The traditional approach, removing the conflicting data, is not a good option because the sources can be autonomous. Another scenario is a long-running activity where consistency can be violated only temporarily and future updates will restore it. Finally, data consistency may be neglected because of efficiency or other reasons. In [1] Arenas, Bertossi, and Chomicki have proposed a theoretical framework for querying inconsistent databases. Consistent query answers are defined to be those query answers that are true in every repair of a given database instance. A repair is a consistent database instance obtained by changing the given instance using a minimal set of insertions/deletions. Intuitively, consistent query answers are independent of the way the inconsistencies in the data would be resolved. This conservative definition of consistent answers has one shortcoming: the number of repairs. Even for a single functional dependency, the number of repairs can be exponential in the number of tuples in the database [3]. Nevertheless, several practical mechanisms for the computation of consistent query answers without computing all repairs have been developed (see [5] for a survey): query rewriting [1], logic programs [2, 4, 9], and compact representations of repairs [6, 7]. The first is based on rewriting the input query Q into a query Q such that the evaluation of Q returns the set of consistent answers to Q. This method works only for SJD 3 queries in the presence of universal binary constraints. The second approach uses disjunctive logic programs to specify all repairs, and then with the help of a disjunctive LP system [8] finds the consistent answers to a given query. Although this approach is applicable to very general queries in the 3 When describing a query class, P stands for projection, S for selection, U for union, J for cartesian product, and D for difference.

presence of universal constraints, the complexity of evaluating disjunctive logic programs makes this method impractical for large databases. 2 The System Hippo The system Hippo is an implementation of the third approach. All information about integrity violations is stored in a conflict hypergraph. Every hyperedge connects the tuples violating together an integrity constraint. Using the conflict hypergraph, we can find if a given tuple belongs to the set of consistent answers without constructing all repairs [6]. Because the conflict hypergraph has polynomial size, this method has polynomial data complexity and allows us to efficiently deal even with large databases [7]. Currently, our application computes consistent answers to SJUD queries in the presence of denial constraints (a class containing functional dependency constraints and exclusion constraints). Allowing union in the query language is crucial for being able to extract indefinite disjunctive information from an inconsistent database. Future work includes the support for restricted foreign key constraints, universal tuplegenerating dependencies and full PSJUD 4 queries. However, because computing consistent query answers for SPJ queries is co-np-data-complete [3, 6], polynomial data complexity cannot be guaranteed once projection is allowed. The whole system is implemented in Java as an RDBMS frontend. Hippo works with any RDBMS that can execute SQL queries, and provides a JDBC access interface (we use PostgreSQL). The data stored in the RDBMS needs not be altered. The flow of data in Hippo is presented on Figure 1. Before processing any input query, the system performs Conflict Detection and creates Conflict Hypergraph for further usage. We are assuming that the number of conflicts is small enough for the hypergraph to be stored in main memory. The only output of this system is the Answer Set consisting of the consistent answers to the input Query in the database instance DB with respect to a set of integrity constraints IC. The processing of the Query starts from Enveloping. As a result of this step we get a query defining Candidates (candidate consistent query answers). This query subsequently undergoes Evaluation by the RDBMS. For every tuple from the set of candidates, the system uses Prover to check if the tuple is a consistent answer to the Query. Depending on the result of this check, the tuple is either added to the Answer Set or not. For every tuple that Prover processes, several membership checks have typically to be performed. In the base version of the system this is done by simply executing the appropriate membership queries on the database. This is a costly procedure and it has a significant influence on the overall time performance of the system. We have introduced several optimizations addressing this problem. In general, by modifying the expression defining the envelope (the set of candidates) the optimizations allow us to answer the required membership checks 4 Currently, our application supports only those cases of projection that don t introduce existential quantifiers in the corresponding relational calculus query.

Query Enveloping IC Evaluation DB Conflict Detection Candidates Conflict Hypergraph Prover Answer Set Fig. 1. Data flow in Hippo without executing any queries on the database. Also, using an expression selecting a subset of the set of consistent query answers, we can significantly reduce the number of tuples that have to be processed by Prover. A more detailed description of those techniques can be found in [7]. 3 Demonstration The presentation of the Hippo system will consist of three parts. First, we will demonstrate that using consistent query answers we can extract more information from an inconsistent database than in the approach where the input query is evaluated over the database from which the conflicting tuples have been removed. Secondly, we will show the advantages of our method over competing approaches by demonstrating the expressive power of supported queries and integrity constraints. And finally, we will compare the running times of our approach and the query rewriting approach, showing that our approach is more efficient. For

every query being tested, we will also measure the execution time of this query by the RDBMS backend (it corresponds to the approach when we ignore the fact that the database is inconsistent). This will allow us to conclude that the time overhead of our approach is acceptable. References 1. M. Arenas, L. Bertossi, and J. Chomicki. Consistent Query Answers in Inconsistent Databases. In ACM Symposium on Principles of Database Systems (PODS), pages 68 79, 1999. 2. M. Arenas, L. Bertossi, and J. Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4 5):393 424, 2003. 3. M. Arenas, L. Bertossi, J. Chomicki, X. He, V. Raghavan, and J. Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science, 296(3):405 434, 2003. 4. P. Barcelo and L. Bertossi. Logic Programs for Querying Inconsistent Databases. In International Symposium on Practical Aspects of Declarative Languages (PADL), pages 208 222. Springer-Verlag, LNCS 2562, 2003. 5. L. Bertossi and J. Chomicki. Query Answering in Inconsistent Databases. In J. Chomicki, R. van der Meyden, and G. Saake, editors, Logics for Emerging Applications of Databases. Springer-Verlag, 2003. 6. J. Chomicki and J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Technical Report cs.db/0212004, arxiv.org e-print archive, December 2002. Under journal submission. 7. J. Chomicki, J. Marcinkowski, and S. Staworko. Computing Consistent Query Answers Using Conflict Hypergraphs. In preparation. 8. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. Declarative Problem-Solving in DLV. In J. Minker, editor, Logic-Based Artificial Intelligence, pages 79 103. Kluwer, 2000. 9. G. Greco, S. Greco, and E. Zumpano. A Logical Framework for Querying and Repairing Inconsistent Databases. IEEE Transactions on Knowledge and Data Engineering, 15(6):1389 1408, 2003.